Research on Chinese Word Segmentation Based on Character Classification

Han Yueyang (韩月阳), Deng Shikun (邓世昆), Jia Shiyin (贾时银), Li Yuanfang (李远方)

Computer Technology and Development (计算机技术与发展), 2011, Vol. 21, Issue 7: 29-31, 35 (4 pages).

Chinese Word Segmentation Research Based on Classification of Words

Han Yueyang¹, Deng Shikun¹, Jia Shiyin¹, Li Yuanfang¹

Author information

1. School of Information, Yunnan University, Kunming, Yunnan 650091

Abstract

Chinese word segmentation is a prerequisite and foundation for natural language processing. This paper implements it using the principle of mutual information. Segmentation is treated as a character classification process: placing a character in a given context allows its category to be identified. Based on mutual information, characters are classified into four categories: a character that attaches to its left neighbor, a character that attaches to its right neighbor, a character in the middle of two others, and an independent character. Applying the t-test during segmentation resolves some ambiguity problems. Using the People's Daily as the training and test corpus, the experiment shows that ambiguities are handled better and that segmentation accuracy reaches 90.3%, a significant improvement.
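The abstract does not give the formulas the paper uses, so the following Python sketch is only an illustration of the general idea under stated assumptions. It assumes the usual pointwise mutual information between adjacent characters, log2(p(xy) / (p(x) p(y))), and a common t-test style score that compares p(z|y) with p(y|x) for a character y in the context x y z. The function names, the four labels, the threshold parameter, and the toy corpus are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def train(sentences):
    """Count single characters and adjacent character pairs in the corpus."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        unigrams.update(s)
        bigrams.update(zip(s, s[1:]))
    return unigrams, bigrams


def mutual_information(x, y, unigrams, bigrams):
    """Pointwise mutual information of the adjacent pair xy (assumed definition).

    Returns -inf when the pair never occurs in the corpus.
    """
    n_xy = bigrams[(x, y)]
    if n_xy == 0:
        return float("-inf")
    n_chars = sum(unigrams.values())
    n_pairs = sum(bigrams.values())
    p_xy = n_xy / n_pairs
    p_x = unigrams[x] / n_chars
    p_y = unigrams[y] / n_chars
    return math.log2(p_xy / (p_x * p_y))


def t_score(x, y, z, unigrams, bigrams):
    """t-test style score for y in the context x y z (assumed definition).

    A positive value suggests y leans toward z (right), a negative value
    suggests y leans toward x (left). Returns 0.0 when counts are missing.
    """
    if unigrams[x] == 0 or unigrams[y] == 0:
        return 0.0
    p_z_given_y = bigrams[(y, z)] / unigrams[y]
    p_y_given_x = bigrams[(x, y)] / unigrams[x]
    variance = bigrams[(y, z)] / unigrams[y] ** 2 + bigrams[(x, y)] / unigrams[x] ** 2
    if variance == 0:
        return 0.0
    return (p_z_given_y - p_y_given_x) / math.sqrt(variance)


def classify(sentence, i, unigrams, bigrams, threshold=0.0):
    """Assign one of four hypothetical labels to the character at position i."""
    binds_left = i > 0 and (
        mutual_information(sentence[i - 1], sentence[i], unigrams, bigrams) > threshold
    )
    binds_right = i < len(sentence) - 1 and (
        mutual_information(sentence[i], sentence[i + 1], unigrams, bigrams) > threshold
    )
    if binds_left and binds_right:
        return "middle"
    if binds_left:
        return "left"
    if binds_right:
        return "right"
    return "independent"


# Toy corpus, invented for illustration only.
unigrams, bigrams = train(["中国人民", "中国人", "人民", "我爱中国"])
labels = [classify("中国人民", i, unigrams, bigrams, threshold=2.5) for i in range(4)]
```

On this toy corpus with the threshold set to 2.5, the labels for 中国人民 come out as right, left, right, left, which corresponds to the split 中国 | 人民. The t-test style score for 国 between 中 and 人 is negative, which agrees with attaching 国 to the left. A real system would estimate the counts from a large corpus and tune the threshold on held-out data.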

Key words

Chinese word segmentation / mutual information / t-test / categorization

Classification

Information technology and security science

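Cite this article

Han Yueyang, Deng Shikun, Jia Shiyin, Li Yuanfang. Research on Chinese Word Segmentation Based on Character Classification [J]. Computer Technology and Development, 2011, 21(7): 29-31, 35 (4 pages).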

Funding

Natural Science Foundation of Yunnan Province (2007F174M)

Yunnan University Graduate Research Project Funding (ynny200928)

Computer Technology and Development

OA, CSTPCD

ISSN: 1673-629X
