Computer Technology and Development, Issue (6): 140-144, 5. DOI: 10.3969/j.issn.1673-629X.2014.06.035
Research on Approach of Thesaurus Construction Based on PMI-IR
Abstract
Improving retrieval accuracy by mining and analyzing large-scale query logs has been an active topic in information retrieval. This paper proposes a thesaurus construction method based on PMI-IR. The method uses the PrefixSpan algorithm to scan the user query log and extract words whose co-occurrence frequency exceeds a given threshold, then clusters them into a synonym candidate set. The similarity between a word A and each candidate word is then calculated in turn, and the candidates whose similarity exceeds a given threshold are selected to build the synonym thesaurus. Experimental results show that expanding queries with the resulting thesaurus improves retrieval accuracy.
Keywords
PMI-IR / thesaurus / query logs
Classification
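Information Technology and Security Science
Cite this article
ZHANG Zewei, JIAO Jian, ZHANG Yangsen. Research on Approach of Thesaurus Construction Based on PMI-IR [J]. Computer Technology and Development, 2014, (6): 140-144, 5.
Funding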
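National Natural Science Foundation of China (61070119)
Beijing Municipal Universities Innovation Team Building and Teacher Professional Development Program (IDHT20130519)
Beijing Municipal Education Commission Special Fund (PXM2012-014224-000020)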
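Illustrative sketch
The pipeline summarized in the abstract (co-occurrence filtering followed by PMI-based selection) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It makes several assumptions: queries are already tokenized, plain pairwise co-occurrence counting stands in for the PrefixSpan scan, the clustering step is omitted, and probabilities are estimated from per-query counts in the log. The function name `build_thesaurus`, the thresholds `min_cooccur` and `min_pmi`, and the sample log are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def build_thesaurus(queries, min_cooccur=2, min_pmi=0.0):
    """Map each word to its associated words, ranked by PMI (illustrative only)."""
    n = len(queries)
    word_count = Counter()  # number of queries containing each word
    pair_count = Counter()  # number of queries containing both words of a pair
    for query in queries:
        words = sorted(set(query))  # deduplicate; sorting gives a canonical pair order
        word_count.update(words)
        pair_count.update(combinations(words, 2))

    thesaurus = {}
    for (a, b), c_ab in pair_count.items():
        if c_ab < min_cooccur:  # assumption: simple co-occurrence frequency filter
            continue
        # PMI(a, b) = log2( p(a, b) / (p(a) * p(b)) ), with p estimated per query
        pmi = math.log2(c_ab * n / (word_count[a] * word_count[b]))
        if pmi > min_pmi:  # keep only candidates above the similarity threshold
            thesaurus.setdefault(a, []).append((b, pmi))
            thesaurus.setdefault(b, []).append((a, pmi))
    for word in thesaurus:
        thesaurus[word].sort(key=lambda item: -item[1])  # strongest association first
    return thesaurus


# Hypothetical tokenized query log, for demonstration only
log = [
    ["cheap", "flight", "beijing"],
    ["cheap", "airfare", "beijing"],
    ["flight", "airfare"],
    ["flight", "airfare", "deal"],
    ["weather", "beijing"],
    ["weather", "forecast"],
]
thesaurus = build_thesaurus(log)
```

In a query-expansion setting, the entries of `thesaurus[word]` would be appended to a query containing `word`. Both thresholds trade coverage against noise and would need tuning on real log data.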