
Lucene.net中文分词算法分析

Analysis on Chinese Segmentation Algorithm of Lucene.net

Chinese Abstract (translated)

Lucene.net relies on the Analyzer class for Chinese word segmentation, but an analysis of its five built-in analyzer classes, KeywordAnalyzer, StandardAnalyzer, StopAnalyzer, SimpleAnalyzer and WhitespaceAnalyzer, shows that almost all of them segment text character by character. To handle Chinese information well, an independently developed external Chinese segmentation package must be brought in. Testing the three typical Chinese segmentation packages ChineseAnalyzer, CJKAnalyzer and IKAnalyzer …

English Abstract

Chinese word segmentation in Lucene.net relies on the Analyzer class. By analyzing the five built-in analyzers of Lucene.net, KeywordAnalyzer, StandardAnalyzer, StopAnalyzer, SimpleAnalyzer and WhitespaceAnalyzer, it was found that their segmentation is based on single characters. An imported segmentation kit was added for better Chinese information processing. By testing the three typical kits, ChineseAnalyzer, CJKAnalyzer and IKAnalyzer, it was found that IKAnalyzer w…
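The behavior the abstract describes is easy to reproduce. The sketch below is a minimal Java illustration, not the paper's own test code: the paper targets Lucene.net (C#), whose analyzer classes mirror Java Lucene. It assumes a recent Lucene release (5.x or later) with lucene-core and lucene-analyzers-common on the classpath; the field name "content" and the sample sentence are arbitrary. StandardAnalyzer falls back to single-character tokens on CJK input, while CJKAnalyzer emits overlapping two-character bigrams.

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ChineseTokenDemo {
    // Print every token the given analyzer produces for the text.
    static void printTokens(String label, Analyzer analyzer, String text) throws Exception {
        try (TokenStream ts = analyzer.tokenStream("content", new StringReader(text))) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            StringBuilder out = new StringBuilder(label + ": ");
            while (ts.incrementToken()) {
                out.append('[').append(term).append("] ");
            }
            ts.end();
            System.out.println(out);
        }
    }

    public static void main(String[] args) throws Exception {
        String text = "中文分词算法"; // "Chinese word segmentation algorithm"
        // StandardAnalyzer splits CJK text into single characters:
        // [中] [文] [分] [词] [算] [法]
        printTokens("Standard", new StandardAnalyzer(), text);
        // CJKAnalyzer emits overlapping bigrams instead:
        // [中文] [文分] [分词] [词算] [算法]
        printTokens("CJK", new CJKAnalyzer(), text);
    }
}

IKAnalyzer, the package the abstract singles out, is dictionary-based and would typically segment the same input into the actual words [中文] [分词] [算法]; it is distributed separately from Lucene and is not shown above.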

周拴龙 (Zhou Shuanlong)

Department of Information Management, Zhengzhou University, Zhengzhou 450001, Henan, China

Information Technology and Security Science

Lucene; Chinese word segmentation; Analyzer class

《郑州大学学报(理学版)》 (Journal of Zhengzhou University, Natural Science Edition), 2011, No. 3

Pages 73-77 (5 pages)
