Computer Applications and Software (计算机应用与软件), 2012, Vol. 29, Issue 12: 27-29, 3. DOI: 10.3969/j.issn.1000-386x.2012.12.008
基于Naive Bayes的维吾尔文文本分类算法及其性能分析
UYGHUR TEXT CLASSIFICATION BASED ON NAIVE BAYES AND ITS PERFORMANCE ANALYSIS
Abstract
Taking the automatic classification of large-scale Uyghur text collected from the web as the research background, we designed a modular Uyghur text classification system. After a thorough investigation, we chose the Naive Bayes algorithm as the classification engine and implemented the system in C#. In the preprocessing stage, we exploit the lexical characteristics of the Uyghur language and introduce stem extraction, which greatly reduces the overall feature dimensionality. We report classification results on a large-scale corpus of more than 3000 documents belonging to 10 categories, together with results for different numbers of features selected with the χ² statistic. The results show that only 1% to 3% of the features in the Uyghur feature space are critical, so it is possible to identify the best features or to further reduce the dimensionality of the feature space.
Key words
Uyghur/Text classification/Naive Bayes/Stem extraction/Stop words
Classification
Information Technology and Security Science
Cite this article
艾海麦提江·阿布来提, 吐尔地·托合提, 艾斯卡尔·艾木都拉. Uyghur text classification based on Naive Bayes and its performance analysis [J]. Computer Applications and Software, 2012, 29(12): 27-29, 3.
Funding
National Natural Science Foundation of China (61063022, 61163033).
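Illustrative sketch
The abstract describes χ² feature selection followed by Naive Bayes classification over stemmed tokens. The Python sketch below shows one way such a pipeline could look; it is not the authors' C# implementation, which this record does not include. It assumes documents are already stemmed and stripped of stop words, uses a multinomial model with Laplace smoothing, and scores each term by its maximum χ² over classes, all of which are assumptions rather than details from the paper. The function names and the toy English tokens are hypothetical.

```python
import math
from collections import Counter, defaultdict


def chi2_scores(docs, labels):
    """Score each term by its maximum chi-square statistic over all classes."""
    n = len(docs)
    class_count = Counter(labels)
    term_class = defaultdict(Counter)  # term -> class -> docs containing term
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            term_class[term][label] += 1
    scores = {}
    for term, per_class in term_class.items():
        df = sum(per_class.values())  # document frequency of the term
        best = 0.0
        for cls, n_cls in class_count.items():
            a = per_class[cls]      # in class, contains term
            b = df - a              # other classes, contains term
            c = n_cls - a           # in class, lacks term
            d = n - df - c          # other classes, lacks term
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores


def select_features(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


def train_nb(docs, labels, vocab):
    """Multinomial Naive Bayes with Laplace smoothing over the selected vocab."""
    prior = Counter(labels)
    counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        counts[label].update(t for t in tokens if t in vocab)
    model = {}
    for cls in prior:
        total = sum(counts[cls].values()) + len(vocab)
        log_like = {t: math.log((counts[cls][t] + 1) / total) for t in vocab}
        model[cls] = (math.log(prior[cls] / len(docs)), log_like)
    return model


def classify(model, tokens):
    """Return the class with the highest log posterior; unknown terms are ignored."""
    def score(cls):
        log_prior, log_like = model[cls]
        return log_prior + sum(log_like[t] for t in tokens if t in log_like)
    return max(model, key=score)


# Toy corpus: hypothetical stems standing in for Uyghur word stems.
docs = [
    ["match", "team", "goal"],
    ["team", "coach", "goal"],
    ["market", "bank", "price"],
    ["bank", "price", "team"],
]
labels = ["sport", "sport", "economy", "economy"]

vocab = select_features(chi2_scores(docs, labels), k=3)
model = train_nb(docs, labels, vocab)
print(sorted(vocab))
print(classify(model, ["goal", "team"]))
```

On this toy corpus only the three terms that perfectly separate the two classes survive selection, which mirrors, at a very small scale, the abstract's observation that a small fraction of features carries most of the discriminative information.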