
Research on Approach of Thesaurus Construction Based on PMI-IR

Zhang Zewei (张泽伟), Jiao Jian (矫健), Zhang Yangsen (张仰森)

Computer Technology and Development (计算机技术与发展), 2014, Issue (6): pp. 140-144 (5 pages). DOI:10.3969/j.issn.1673-629X.2014.06.035

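Author information

  • 1. Institute of Intelligent Information Processing, School of Computer Science, Beijing Information Science and Technology University, Beijing 100192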

Abstract

Improving retrieval accuracy by mining and analyzing large-scale query logs has become an active topic in information retrieval. This paper proposes a PMI-IR-based method for constructing a thesaurus. The method scans the user query log with the PrefixSpan algorithm to obtain words whose co-occurrence frequency exceeds a given threshold, and builds a candidate synonym set from them by clustering. It then computes, in turn, the similarity between a given word A and each candidate word, and selects the candidates whose similarity exceeds a given threshold to construct the synonym thesaurus. Experimental results show that using the resulting thesaurus for query expansion improves retrieval accuracy.
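The abstract outlines a pipeline: mine co-occurring words from the query log, form a candidate set, score each candidate against a word A, and keep those above a threshold. The following is a minimal Python sketch of the scoring and selection steps only, under assumptions the abstract does not state: queries are already word-segmented into lists of terms; plain pairwise co-occurrence counting within a single query stands in for the paper's PrefixSpan mining and clustering; and PMI is estimated from query-level counts in the log, PMI(A, B) = log2(n · c(A, B) / (c(A) · c(B))), where n is the number of queries and c(·) counts the queries containing the given terms, rather than from search-engine hit counts as in Turney's original PMI-IR. The function names, the thresholds `min_cooc` and `min_pmi`, and the toy query log are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def count_terms(queries):
    """Count how many queries contain each term and each unordered term pair."""
    term_freq = Counter()
    pair_freq = Counter()
    for query in queries:
        terms = sorted(set(query))  # each term counted at most once per query
        term_freq.update(terms)
        # combinations() over a sorted list yields pairs in sorted order,
        # so (a, b) and (b, a) share one key.
        pair_freq.update(combinations(terms, 2))
    return term_freq, pair_freq


def pmi(a, b, term_freq, pair_freq, n_queries):
    """Base-2 PMI of terms a and b, estimated from query-level counts."""
    joint = pair_freq[tuple(sorted((a, b)))]
    if joint == 0:
        return float("-inf")  # log2(0) diverges, so report -inf
    return math.log2(joint * n_queries / (term_freq[a] * term_freq[b]))


def associated_words(word, queries, min_cooc=2, min_pmi=0.0):
    """Hypothetical helper: candidates for `word` passing both thresholds, best first."""
    term_freq, pair_freq = count_terms(queries)
    n = len(queries)
    # Step 1 (stand-in for PrefixSpan + clustering): keep words that co-occur
    # with `word` in at least `min_cooc` queries.
    candidates = [
        y if x == word else x
        for (x, y), count in pair_freq.items()
        if count >= min_cooc and word in (x, y)
    ]
    # Step 2: score each candidate by PMI and keep those above `min_pmi`.
    scored = [(c, pmi(word, c, term_freq, pair_freq, n)) for c in candidates]
    kept = [(c, s) for c, s in scored if s > min_pmi]
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy, made-up query log; each query is a list of already-segmented terms.
    query_log = [
        ["laptop", "notebook"],
        ["laptop", "notebook", "price"],
        ["laptop", "price"],
        ["phone", "price"],
        ["tablet", "price"],
        ["camera", "price"],
        ["laptop", "notebook", "review"],
    ]
    result = associated_words("laptop", query_log, min_cooc=2, min_pmi=0.0)
    print([w for w, _ in result])  # ['notebook']
```

In the toy log, "price" co-occurs with "laptop" as often as the threshold requires, but because it also appears in many unrelated queries its PMI is negative, so only "notebook" survives. This illustrates why a frequency threshold alone is not enough and a PMI-style similarity is applied on top of it.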

Key words

PMI-IR/thesaurus/query logs

Classification

Information Technology and Security Science

Cite this article

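Zhang Zewei, Jiao Jian, Zhang Yangsen. Research on Approach of Thesaurus Construction Based on PMI-IR[J]. Computer Technology and Development, 2014, (6): 140-144, 5.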

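Funding

National Natural Science Foundation of China (61070119)

Beijing Municipal Universities Innovation Team Building and Teacher Career Development Program (IDHT20130519)

Special Fund of the Beijing Municipal Education Commission (PXM2012-014224-000020)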

Computer Technology and Development

OA, CSTPCD

ISSN 1673-629X
