Uyghur Text Classification Based on Naive Bayes and Its Performance Analysis

艾海麦提江·阿布来提, 吐尔地·托合提, 艾斯卡尔·艾木都拉

Computer Applications and Software, 2012, Vol. 29, Issue 12: 27-29, 3.

DOI: 10.3969/j.issn.1000-386x.2012.12.008

UYGHUR TEXT CLASSIFICATION BASED ON NAIVE BAYES AND ITS PERFORMANCE ANALYSIS

艾海麦提江·阿布来提¹, 吐尔地·托合提¹, 艾斯卡尔·艾木都拉¹

Author information

1. School of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046

Abstract

This paper takes the automatic classification of large-scale Uyghur text collected from the web as its research background. We designed a modular Uyghur text classification system, chose the Naive Bayes algorithm as the classification engine after a thorough investigation, and implemented the system in C#. In the preprocessing stage, we exploit the lexical characteristics of the Uyghur language and introduce stem extraction, which greatly reduces the overall feature dimension. We report classification results on a large-scale corpus of more than 3,000 documents belonging to 10 categories, together with results for different numbers of features selected with the χ² statistic. The results show that only 1% to 3% of the features in the Uyghur feature space are critical, so it is possible to identify the best features or to reduce the dimension of the feature space further.
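The pipeline the abstract outlines (χ² feature selection followed by Naive Bayes classification) might look roughly like the sketch below. It is written in Python for brevity and is not the authors' C# implementation. It assumes documents are already tokenized and stemmed, uses a multinomial model with add-one smoothing (the abstract does not say which Naive Bayes variant was used), and scores each term by its maximum χ² value over all classes. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def chi_square(n11, n10, n01, n00):
    """Chi-square score of a term/class pair from a 2x2 document-count table.

    n11: docs in the class that contain the term
    n10: docs outside the class that contain the term
    n01: docs in the class that lack the term
    n00: docs outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    return 0.0 if denom == 0 else n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_features(docs, labels, k):
    """Keep the k terms with the highest chi-square score over any class."""
    class_sizes = Counter(labels)
    df = Counter()                      # document frequency per term
    df_in_class = defaultdict(Counter)  # document frequency per class and term
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_in_class[label][term] += 1
    n_docs = len(docs)
    scores = {}
    for term, total in df.items():
        best = 0.0
        for label, size in class_sizes.items():
            n11 = df_in_class[label][term]
            n10 = total - n11
            n01 = size - n11
            n00 = n_docs - size - n10
            best = max(best, chi_square(n11, n10, n01, n00))
        scores[term] = best
    # Ties are broken alphabetically so the selection is deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


def train_nb(docs, labels, vocab):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""
    class_sizes = Counter(labels)
    term_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        term_counts[label].update(t for t in tokens if t in vocab)
    model = {}
    for label, size in class_sizes.items():
        total = sum(term_counts[label].values())
        log_prior = math.log(size / len(docs))
        log_like = {
            t: math.log((term_counts[label][t] + 1) / (total + len(vocab)))
            for t in vocab
        }
        model[label] = (log_prior, log_like)
    return model


def classify(model, tokens):
    """Return the class with the highest posterior log-probability."""
    def score(label):
        log_prior, log_like = model[label]
        # Terms outside the selected vocabulary are ignored.
        return log_prior + sum(log_like[t] for t in tokens if t in log_like)
    return max(sorted(model), key=score)


# Hypothetical toy corpus standing in for stemmed Uyghur documents.
docs = [
    ["goal", "team", "match"],
    ["team", "score", "match"],
    ["bank", "market", "price"],
    ["market", "price", "trade"],
]
labels = ["sports", "sports", "economy", "economy"]

vocab = select_features(docs, labels, k=4)
model = train_nb(docs, labels, vocab)
predicted = classify(model, ["team", "match", "news"])
```

Shrinking `k` relative to the full vocabulary is the knob that corresponds to the abstract's observation that a small fraction of the features carries most of the discriminative information.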

Key words

Uyghur / text classification / Naive Bayes / stem extraction / stop words

Classification

Information Technology and Security Science

Cite this article

艾海麦提江·阿布来提, 吐尔地·托合提, 艾斯卡尔·艾木都拉. Uyghur text classification based on Naive Bayes and its performance analysis [J]. Computer Applications and Software, 2012, 29(12): 27-29, 3.

Funding

National Natural Science Foundation of China (61063022, 61163033).

Computer Applications and Software

OA / Peking University Core Journals / CSCD / CSTPCD

ISSN: 1000-386X
