
Lucene.net中文分词算法分析

Analysis on Chinese Segmentation Algorithm of Lucene.net

Chinese Abstract (translated)

Lucene.net relies on the Analyzer class for Chinese word segmentation, but an analysis of its five built-in analyzer classes, KeywordAnalyzer, StandardAnalyzer, StopAnalyzer, SimpleAnalyzer and WhitespaceAnalyzer, shows that almost all of them segment text character by character. To handle Chinese information well, an independently developed external Chinese segmentation package must be brought in. Testing the three typical Chinese segmentation packages ChineseAnalyzer, CJKAnalyzer and IKAnalyzer …

English Abstract

Chinese word segmentation in Lucene.net relies on the Analyzer class. By analyzing the five built-in analyzers of Lucene.net, KeywordAnalyzer, StandardAnalyzer, StopAnalyzer, SimpleAnalyzer and WhitespaceAnalyzer, it was found that their segmentation is based on single characters. An imported segmentation kit was added for better Chinese information processing. By testing the three typical kits, ChineseAnalyzer, CJKAnalyzer and IKAnalyzer, it was found that IKAnalyzer w…
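The behavior the abstract describes is easy to reproduce. The sketch below is a minimal Java illustration, not the paper's own test code: the paper targets Lucene.net (C#), whose analyzer classes mirror Java Lucene. It assumes a recent Lucene release (5.x or later) with lucene-core and lucene-analyzers-common on the classpath; the field name "content" and the sample sentence are arbitrary. StandardAnalyzer falls back to single-character tokens on CJK input, while CJKAnalyzer emits overlapping two-character bigrams.

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ChineseTokenDemo {
    // Print every token the given analyzer produces for the text.
    static void printTokens(String label, Analyzer analyzer, String text) throws Exception {
        try (TokenStream ts = analyzer.tokenStream("content", new StringReader(text))) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            StringBuilder out = new StringBuilder(label + ": ");
            while (ts.incrementToken()) {
                out.append('[').append(term).append("] ");
            }
            ts.end();
            System.out.println(out);
        }
    }

    public static void main(String[] args) throws Exception {
        String text = "中文分词算法"; // "Chinese word segmentation algorithm"
        // StandardAnalyzer splits CJK text into single characters:
        // [中] [文] [分] [词] [算] [法]
        printTokens("Standard", new StandardAnalyzer(), text);
        // CJKAnalyzer emits overlapping bigrams instead:
        // [中文] [文分] [分词] [词算] [算法]
        printTokens("CJK", new CJKAnalyzer(), text);
    }
}

IKAnalyzer, the package the abstract singles out, is dictionary-based and would typically segment the same input into the actual words [中文] [分词] [算法]; it is distributed separately from Lucene and is not shown above.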

周拴龙 (Zhou Shuanlong)

Department of Information Management, Zhengzhou University, Zhengzhou 450001, Henan, China

Information Technology and Security Science

Lucene; Chinese word segmentation; Analyzer class

《郑州大学学报(理学版)》 (Journal of Zhengzhou University, Natural Science Edition), 2011, No. 3

Pages 73-77 (5 pages)
