Individual Project

2019-11-17 02:51:55

Source: reposted

Contributed by: a site user

Individual Project - Word_frequency

For the assignment description, see: http://www.VEVb.com/jiel/p/3978727.html

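I. Preparation and Time Planning

1. Install Microsoft Visual Studio Ultimate 2012. I have installed it once before; estimated at about 2 hours, though other tasks can continue while the installer runs.

2. Read the assignment requirements and understand the features to implement; estimated at about 20 min.

3. Design the program framework based on the assignment; estimated at about 10 min.

4. Read the relevant documentation and learn the namespaces, properties, and methods of the classes needed; estimated at about 1 hour.

5. Write a first working version of the program; estimated at about 3 hours.

6. Design test data and fix the bugs that turn up; estimated at about 2 hours.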

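II. Actual Time Spent and Detailed Process

1. Install Microsoft Visual Studio Ultimate 2012 from the following address:

　　http://www.microsoft.com/zh-cn/download/confirmation.aspx?id=30678

The connection was fast, so this took 1 hour.

2. Read the assignment requirements and understand the features to implement: build a text word-frequency counter that counts single words, two-word phrases, and three-word phrases. Word comparison ignores case; output is sorted by frequency in descending order, with ties broken in dictionary order. Time spent: 15 min.

3. Design the program framework based on the assignment and flesh out the code (about 4 hours):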

　　(1) Get the list of files in the target directory:

    String[] files = Directory.GetFiles(path);

　　(2) Read the contents of each file:

    foreach (String i in files)
    {
        // Take the text after the last '.' as the file extension.
        String extension = i.Substring(i.LastIndexOf(".") + 1, i.Length - i.LastIndexOf(".") - 1);
        if (!extension.Equals("txt") &&
            !extension.Equals("cpp") &&
            !extension.Equals("h") &&
            !extension.Equals("cs")) continue; // only process files of these types
        if (i.Equals(path + "\\" + "12061162.txt")) continue; // skip this one file by name
        String[] text = rgxwords.Split(File.ReadAllText(i)); // split into sentences or lines
    }
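The extension check in step (2) can also be written with `Path.GetExtension`. The following is only a sketch, not the code used in this project: the class name `SourceFileFilter` and the method `IsAccepted` are hypothetical, and the comparison here ignores case, which is an assumption (the loop above compares case-sensitively).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class SourceFileFilter
{
    // Extensions accepted in step (2). Ignoring case is an assumption of
    // this sketch; the original comparison is case-sensitive.
    static readonly HashSet<string> Allowed =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".txt", ".cpp", ".h", ".cs" };

    // Returns true when the file name ends in one of the accepted extensions.
    // Path.GetExtension includes the leading dot and returns "" when there is none.
    public static bool IsAccepted(string fileName)
    {
        return Allowed.Contains(Path.GetExtension(fileName));
    }
}
```

A call such as `SourceFileFilter.IsAccepted(i)` could then replace the chain of `Equals` tests inside the `foreach` loop.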

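　　(3) Process the extracted text: first split the text into lines or sentences, then set up three Regex objects that match, respectively, a single word, two words separated by a single space, and three words separated by single spaces. Call Regex.Match() and Match.NextMatch() to find every match:

　　　　The data containers are defined as follows:

    ArrayList data = new ArrayList();            // single-word data for the current text
    ArrayList word_word = new ArrayList();       // two-word phrase data (e2 mode)
    ArrayList word_word_word = new ArrayList();  // three-word phrase data (e3 mode)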

　　　　The patterns are defined as follows:

    // Splits the text into one sentence or line per element.
    Regex rgxwords = new Regex("[\n\r,.\\(\\)\\{\\}\\[\\]:\"!;]+");
    // Simple mode: at least three letters, then any letters or digits.
    Regex regword1 = new Regex("[a-zA-Z]{3}[0-9a-zA-Z]*");
    // Extended mode 2: two words separated by a single space.
    Regex regword2 = new Regex("[a-zA-Z]{3}[0-9a-zA-Z]* [a-zA-Z]{3}[0-9a-zA-Z]*");
    // Extended mode 3: three words separated by single spaces.
    Regex regword3 = new Regex("[a-zA-Z]{3}[0-9a-zA-Z]* [a-zA-Z]{3}[0-9a-zA-Z]* [a-zA-Z]{3}[0-9a-zA-Z]*");
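To see what the single-word pattern accepts, here is a small sketch that runs the same expression as `regword1` over one line. The class name `WordPatternDemo` and the method `Words` are hypothetical and exist only for this illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class WordPatternDemo
{
    // Same expression as regword1: three letters, then letters or digits.
    static readonly Regex Word = new Regex("[a-zA-Z]{3}[0-9a-zA-Z]*");

    // Collects every non-overlapping match of the word pattern, in order.
    public static List<string> Words(string line)
    {
        var result = new List<string>();
        for (Match m = Word.Match(line); m.Success; m = m.NextMatch())
            result.Add(m.Value);
        return result;
    }
}
```

For the input `"go file123 ab1 hello"` this yields `file123` and `hello`: `go` is too short, and `ab1` does not start with three letters. Because the pattern has no word-boundary anchor, a token such as `123abc` still contributes the match `abc`; whether that is acceptable depends on how the assignment defines a word.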

　　　　The procedure is as follows:

    // text[j] is one sentence or line produced by rgxwords.Split.
    Match match = regword1.Match(text[j], 0);
    while (match.Success)
    {
        data.Add(new Data(match.Value, 1)); // update simple-mode data
        match = match.NextMatch();
    }
    // Extended mode 2: restart each search just past the first word of the
    // previous match, so overlapping phrases are found and the search
    // never restarts in the middle of that word.
    Match match2 = regword2.Match(text[j], 0);
    while (match2.Success)
    {
        word_word.Add(new Data(match2.Value, 1));
        match2 = regword2.Match(text[j], match2.Index + match2.Value.IndexOf(' '));
    }
    // Extended mode 3: the same approach with the three-word pattern.
    Match match3 = regword3.Match(text[j], 0);
    while (match3.Success)
    {
        word_word_word.Add(new Data(match3.Value, 1));
        match3 = regword3.Match(text[j], match3.Index + match3.Value.IndexOf(' '));
    }
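An alternative way to collect overlapping phrases is to match single words once and then join neighbours that are separated by exactly one space. This is a sketch under that assumption and is not the approach used above; `PhraseDemo` and `Phrases` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class PhraseDemo
{
    // Same single-word expression as regword1.
    static readonly Regex Word = new Regex("[a-zA-Z]{3}[0-9a-zA-Z]*");

    // Returns every run of n consecutive words whose neighbours are
    // separated by exactly one space, including overlapping runs.
    public static List<string> Phrases(string line, int n)
    {
        var words = new List<Match>();
        for (Match m = Word.Match(line); m.Success; m = m.NextMatch())
            words.Add(m);

        var result = new List<string>();
        for (int i = 0; i + n <= words.Count; i++)
        {
            bool adjacent = true;
            for (int k = i; k < i + n - 1 && adjacent; k++)
            {
                int gapStart = words[k].Index + words[k].Length;
                // The next word must start one character later, and that
                // character must be a space.
                adjacent = words[k + 1].Index == gapStart + 1 && line[gapStart] == ' ';
            }
            if (adjacent)
            {
                int start = words[i].Index;
                Match last = words[i + n - 1];
                result.Add(line.Substring(start, last.Index + last.Length - start));
            }
        }
        return result;
    }
}
```

For `"hello world foo bar"`, `Phrases(line, 2)` gives `hello world`, `world foo`, `foo bar`, and `Phrases(line, 3)` gives `hello world foo`, `world foo bar`. A double space breaks a phrase, so `"aaa bbb  ccc ddd"` produces only `aaa bbb` and `ccc ddd` for n = 2.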

　　　(4) Sort the ArrayList instances that hold the data: this mainly uses the ArrayList.Sort(IComparer) method, which requires implementing the IComparer interface yourself.

    using System;
    using System.Collections;

    // Custom comparer used for dictionary-order sorting.
    class myReverserClass1 : IComparer
    {
        // Custom string comparison, shared with myReverserClass2.
        // Words are compared ignoring case first; a word that is a prefix of
        // the other sorts first. Words that differ only by case are ordered
        // by their first differing character, with the operands reversed.
        // Stated intent: keep case variants adjacent with the uppercase form
        // first, e.g. hello, world, World, zoo -> hello, World, world, zoo.
        // Called directly, y[i] - x[i] actually puts the lowercase form first.
        internal static int MyStringCompare(String x, String y)
        {
            int lx = x.Length, ly = y.Length, i;
            String xx = x.ToUpper();
            String yy = y.ToUpper();
            for (i = 0; i < lx && i < ly; i++)
                if (xx[i] != yy[i]) return xx[i] - yy[i];
            if (i == lx && i < ly) return -1; // x is a proper prefix of y
            if (i < lx && i == ly) return 1;  // y is a proper prefix of x
            // Same letters ignoring case: break the tie on case.
            for (i = 0; i < lx; i++)
                if (x[i] != y[i]) return y[i] - x[i];
            return 0;
        }

        int IComparer.Compare(Object x, Object y)
        {
            // The arguments are swapped, so the resulting order is descending.
            return MyStringCompare(((Data)y).word, ((Data)x).word);
        }
    }

    // Custom comparer used for frequency sorting: higher counts first,
    // ties broken by MyStringCompare.
    class myReverserClass2 : IComparer
    {
        int IComparer.Compare(Object x, Object y)
        {
            if (((Data)x).num > ((Data)y).num) return -1;
            else if (((Data)x).num < ((Data)y).num) return 1;
            else return myReverserClass1.MyStringCompare(((Data)x).word, ((Data)y).word);
        }
    }
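The same ordering rule (frequency descending, then dictionary order ignoring case) can be sketched with the generic collections instead of `ArrayList` and `IComparer`. This is an illustration only: `FrequencyOrder` is a hypothetical name, `KeyValuePair<string, int>` stands in for the `Data` class, and the tie-break uses `StringComparer.OrdinalIgnoreCase`, which treats words differing only by case as equal rather than ordering them.

```csharp
using System;
using System.Collections.Generic;

static class FrequencyOrder
{
    // Higher counts first; equal counts fall back to case-insensitive
    // ordinal comparison of the words.
    public static int Compare(KeyValuePair<string, int> x, KeyValuePair<string, int> y)
    {
        int byCount = y.Value.CompareTo(x.Value);
        return byCount != 0
            ? byCount
            : StringComparer.OrdinalIgnoreCase.Compare(x.Key, y.Key);
    }

    // Sorts the list in place using the comparison above.
    public static void Sort(List<KeyValuePair<string, int>> entries)
    {
        entries.Sort(Compare);
    }
}
```

Sorting the entries (zoo, 2), (apple, 2), (the, 5) this way puts `the` first, followed by `apple` and then `zoo`.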

　　　Finally, design methods that remove duplicate words and record how many times each one occurs.
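As a minimal sketch of this step (not the original implementation), duplicates can be merged and counted in one pass with a dictionary keyed case-insensitively. `WordCounter` and `Count` are hypothetical names introduced for the example.

```csharp
using System;
using System.Collections.Generic;

static class WordCounter
{
    // Counts occurrences, treating words that differ only by case as the
    // same key. The first spelling encountered is the one kept as the key.
    public static Dictionary<string, int> Count(IEnumerable<string> words)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (string w in words)
        {
            int n;
            counts.TryGetValue(w, out n); // n stays 0 when the word is new
            counts[w] = n + 1;
        }
        return counts;
    }
}
```

Counting `The`, `the`, `THE`, `cat` gives two entries, with a count of 3 for `the` and 1 for `cat`. If the assignment requires a particular spelling to be reported for each word, the key-selection rule would need to change accordingly.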
