Journal of Zhengzhou University (Natural Science Edition), 2011, Vol. 43, Issue 3: 73-77, 5.
Analysis on Chinese Segmentation Algorithm of Lucene.net
ZHOU Shuanlong¹
Author information
- 1. Department of Information Management, Zhengzhou University, Zhengzhou 450001, Henan, China
Abstract
Chinese word segmentation in Lucene.Net relies on the Analyzer class. An analysis of the five built-in analyzers (KeywordAnalyzer, StandardAnalyzer, StopAnalyzer, SimpleAnalyzer and WhitespaceAnalyzer) shows that they all segment Chinese text character by character. For better Chinese information processing, an external segmentation kit can be imported. Tests of three typical kits, ChineseAnalyzer, CJKAnalyzer and IKAnalyzer, show that IKAnalyzer, which uses dictionary-based segmentation with forward and backward bidirectional matching, performs best.
Keywords: Lucene; Chinese word segmentation; Analyzer class
Classification: Information Technology and Security Science
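The abstract attributes IKAnalyzer's better results to dictionary-based segmentation combined with forward and backward (bidirectional) maximum matching. A minimal Python sketch of that general strategy follows; the toy dictionary, the `bidirectional_match` tie-breaking heuristic, and all function names are illustrative assumptions, not the paper's or IKAnalyzer's actual implementation:

```python
# Toy dictionary of known words; a real segmenter loads a large lexicon.
DICT = {"郑州", "郑州大学", "大学", "信息", "管理", "中文", "分词", "算法"}
MAX_LEN = max(len(w) for w in DICT)  # longest dictionary entry

def forward_match(text):
    """Forward maximum matching: scan left to right, always taking
    the longest dictionary word starting at the current position."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + MAX_LEN), i, -1):
            # Fall back to a single character when no word matches.
            if text[i:j] in DICT or j - i == 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens

def backward_match(text):
    """Backward maximum matching: scan right to left, always taking
    the longest dictionary word ending at the current position."""
    tokens, j = [], len(text)
    while j > 0:
        for i in range(max(0, j - MAX_LEN), j):
            if text[i:j] in DICT or j - i == 1:
                tokens.append(text[i:j])
                j = i
                break
    return tokens[::-1]

def bidirectional_match(text):
    """Run both passes and keep the better result. The tie-breaking
    rule here (fewer tokens, then fewer single-character tokens) is a
    common heuristic, assumed for illustration."""
    def score(toks):
        return (len(toks), sum(1 for t in toks if len(t) == 1))
    return min(forward_match(text), backward_match(text), key=score)
```

For example, `bidirectional_match("郑州大学中文分词")` yields `["郑州大学", "中文", "分词"]` with this toy dictionary, whereas a character-based analyzer such as StandardAnalyzer would emit eight single-character tokens.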
Citation: ZHOU Shuanlong. Analysis on Chinese Segmentation Algorithm of Lucene.net [J]. Journal of Zhengzhou University (Natural Science Edition), 2011, 43(3): 73-77, 5.