Comparison Research of Segmentation Performance for Chinese Analyzers Based on Lucene

Yi Tianpeng (义天鹏), Chen Qi'an (陈启安)

Computer Engineering (计算机工程), 2012, Vol. 38, Issue 22: 279-282, 4.
Author information

  • 1. Department of Computer Science, Xiamen University, Xiamen 361005, Fujian, China

Abstract

The segmentation performance of Lucene's built-in Chinese analyzers is inadequate, and it is difficult to choose among the available third-party analyzers. To address this problem, this paper introduces several analyzers that support Lucene and compares them experimentally in terms of sentence segmentation, segmentation speed, index size and indexing time, retrieval results, and retrieval speed. The results show that, within the Lucene framework, the dictionary-based Paoding analyzer has the best overall performance, Lucene's built-in one-word analyzer has the highest segmentation speed, and the imdict and ICTCLAS4J analyzers have considerable room for improvement in algorithmic efficiency.

Key words

Lucene framework / search engine / Chinese segmentation / analyzer / segmentation speed / index / retrieval

Classification

Information Technology and Security Science

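Cite this article

Yi Tianpeng, Chen Qi'an. Comparison Research of Segmentation Performance for Chinese Analyzers Based on Lucene [J]. Computer Engineering, 2012, 38(22): 279-282, 4.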

Funding

Supported by the Aeronautical Science Foundation (航空科学基金) (20085568013)

Computer Engineering (计算机工程)

OA / CSCD / CSTPCD

ISSN 1000-3428
