Computer Technology and Development (计算机技术与发展), 2011, Vol. 21, Issue 7: 29-31, 35, 4.
Chinese Word Segmentation Research Based on Classification of Words (基于字分类的中文分词的研究)
Abstract
Chinese word segmentation is a prerequisite and foundation of natural language processing. This paper implements it using the mutual information principle. Segmentation is treated as a character classification process: once a character is placed in a given context, its category can be identified. Based on mutual information, characters are divided into four categories: a character that connects to the one on its left, a character that connects to the one on its right, a character in the middle of two others, and an independent character. A t-test algorithm is applied during segmentation to resolve some ambiguities. Using the People's Daily corpus for training and testing, the experiment shows that ambiguities are resolved more effectively and that segmentation accuracy reaches 90.3%, which the authors report as a significant improvement.
Keywords
Chinese word segmentation / mutual information / t-test / categorization
Classification
Information Technology and Security Science
Cite this article
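Han Yueyang (韩月阳), Deng Shikun (邓世昆), Jia Shiyin (贾时银), Li Yuanfang (李远方). Chinese Word Segmentation Research Based on Classification of Words [J]. Computer Technology and Development, 2011, 21(7): 29-31, 35, 4.
Funding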
Natural Science Foundation of Yunnan Province (2007F174M)
Yunnan University Graduate Research Project Funding (ynny200928)
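Illustrative sketch
The four-way character classification described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes pointwise mutual information estimated from raw adjacent-character counts, a single hand-picked threshold, and the label names `L`, `R`, `M`, `S`, all of which are hypothetical choices made for illustration. The t-test step used in the paper to resolve ambiguities is not reproduced here.

```python
import math
from collections import Counter


def train_counts(corpus):
    """Count single characters and adjacent character pairs in raw text lines."""
    unigrams, bigrams = Counter(), Counter()
    for line in corpus:
        unigrams.update(line)
        bigrams.update(zip(line, line[1:]))
    return unigrams, bigrams


def mutual_information(a, b, unigrams, bigrams):
    """Pointwise mutual information log2(P(ab) / (P(a) * P(b))); -inf if unseen."""
    if bigrams[(a, b)] == 0 or unigrams[a] == 0 or unigrams[b] == 0:
        return float("-inf")
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    p_ab = bigrams[(a, b)] / n_bi
    return math.log2(p_ab / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))


def classify(sentence, unigrams, bigrams, threshold):
    """Label each character (label names are assumptions of this sketch):
    L = connects with the left one, R = connects with the right one,
    M = in the middle of two others, S = independent character."""
    labels = []
    for i, ch in enumerate(sentence):
        left = i > 0 and mutual_information(
            sentence[i - 1], ch, unigrams, bigrams) > threshold
        right = i + 1 < len(sentence) and mutual_information(
            ch, sentence[i + 1], unigrams, bigrams) > threshold
        if left and right:
            labels.append("M")
        elif left:
            labels.append("L")
        elif right:
            labels.append("R")
        else:
            labels.append("S")
    return labels


def segment(sentence, labels):
    """Cut after every character that has no bond to its right neighbour."""
    words, current = [], ""
    for ch, label in zip(sentence, labels):
        current += ch
        if label in ("L", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


# Toy corpus invented for this example; it is not the People's Daily corpus.
toy_corpus = ["北京大", "北京好", "看北京", "大看好", "好大看"]
unigrams, bigrams = train_counts(toy_corpus)
labels = classify("看北京好", unigrams, bigrams, threshold=2.5)
print(labels, segment("看北京好", labels))
# prints ['S', 'R', 'L', 'S'] ['看', '北京', '好']
```

On this toy data only the pair 北京 exceeds the threshold, so it is kept together while the other characters are treated as independent. A realistic setup would estimate the counts from a large corpus and tune the threshold on held-out data.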