Analysis of the Chinese Word Segmentation Algorithms of Lucene.net
Lucene.net implements Chinese word segmentation through the Analyzer class. An analysis of its five built-in analyzers (KeywordAnalyzer, StandardAnalyzer, StopAnalyzer, SimpleAnalyzer and WhitespaceAnalyzer) shows that almost all of them split Chinese text character by character. To process Chinese information properly, an externally developed Chinese word segmentation package must be introduced. In tests of the three typical packages ChineseAnalyzer, CJKAnalyzer and IKAnalyzer, it was found that IKAnalyzer …
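To make the abstract's point concrete, the sketch below contrasts two of the analyzers it names on a short Chinese phrase. It is a minimal illustration, assuming the Java Lucene 3.x API (which Lucene.net mirrors version for version) with the contrib CJKAnalyzer on the classpath; the class name, field name "f" and sample phrase are illustrative choices, not taken from the paper.

    import java.io.StringReader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.cjk.CJKAnalyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.util.Version;

    public class SegmentationDemo {
        // Print every token the analyzer produces for the given text.
        static void dump(String label, Analyzer analyzer, String text) throws Exception {
            TokenStream ts = analyzer.tokenStream("f", new StringReader(text));
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            StringBuilder out = new StringBuilder(label + ":");
            while (ts.incrementToken()) {
                out.append(" [").append(term.toString()).append(']');
            }
            ts.end();
            ts.close();
            System.out.println(out);
        }

        public static void main(String[] args) throws Exception {
            String text = "中文分词";  // "Chinese word segmentation"
            // Built-in StandardAnalyzer emits one token per CJK character:
            //   Standard: [中] [文] [分] [词]
            dump("Standard", new StandardAnalyzer(Version.LUCENE_36), text);
            // Contrib CJKAnalyzer emits overlapping character bigrams:
            //   CJK: [中文] [文分] [分词]
            dump("CJK", new CJKAnalyzer(Version.LUCENE_36), text);
        }
    }

Both outputs miss real word boundaries (中文 and 分词 are words, 文分 is not), which is why the paper looks to dictionary-based packages such as IKAnalyzer for word-level segmentation.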
Zhou Shuanlong (周拴龙)
Department of Information Management, Zhengzhou University, Zhengzhou 450001, Henan, China
Information Technology and Security Science
Lucene; Chinese word segmentation; Analyzer class
Journal of Zhengzhou University (Natural Science Edition), 2011 (3): 73-77 (5 pages)