Computer Engineering (计算机工程), 2012, Vol. 38, Issue 22: 279-282, 4.
Comparison Research of Segmentation Performance for Chinese Analyzers Based on Lucene
Abstract
The Chinese analyzers built into Lucene offer insufficient segmentation performance, and it is difficult to choose among the third-party analyzers. To address this problem, this paper introduces several analyzers that support Lucene and, through experiments, compares them on sentence segmentation, segmentation speed, index space and time, retrieval results, and retrieval speed. The results show that, within the Lucene framework, the dictionary-based Paoding analyzer has the best overall performance, Lucene's one-word analyzer has the highest segmentation speed, and the imdict and ICTCLAS4J analyzers have considerable room for improvement in algorithmic efficiency.
Key words
Lucene framework / search engine / Chinese segmentation / analyzer / segmentation speed / index / retrieval
Classification
Information Technology and Security Science
Cite this article
Yi Tianpeng (义天鹏), Chen Qi'an (陈启安). Comparison Research of Segmentation Performance for Chinese Analyzers Based on Lucene [J]. Computer Engineering, 2012, 38(22): 279-282, 4.
Funding
Aeronautical Science Foundation of China (20085568013)