Computer Engineering, 2018, Vol. 44, Issue 3: 189-194, 6. DOI: 10.3969/j.issn.1000-3428.2018.03.032
基于查询意图识别与主题建模的文档检索算法
Document Retrieval Algorithm Based on Query Intent Identification and Topic Modeling
Abstract
Conventional search engines retrieve documents that merely contain the keywords in a query, without considering the user's underlying intent. To address this problem, this paper treats document retrieval as a personalized recommendation problem and proposes a personalized retrieval algorithm based on query intent identification and topic modeling. First, a Latent Dirichlet Allocation (LDA) topic model is built from the user's historical search data. When a new query arrives, its latent topic is identified using the topic model of the user's search history, and suitable documents are recommended according to topic correlation. Finally, the KL distance between the query and the document set is calculated, and the documents returned to the user are ranked by that distance. Experimental results show that the proposed algorithm outperforms both the method based on collaborative similarity calculation and the method based on user interest clustering in efficiency.
Key words
search engine / query intent / document retrieval / personalized recommendation / topic model / Latent Dirichlet Allocation (LDA) / KL distance
Classification
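Information Technology and Security Science
Cite this article
严锐 (Yan Rui), 李石君 (Li Shijun). Document Retrieval Algorithm Based on Query Intent Identification and Topic Modeling [J]. Computer Engineering, 2018, 44(3): 189-194, 6.
Funding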
National Natural Science Foundation of China (61272109)
Youth Fund of the National Natural Science Foundation of China (61502350)
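Illustrative sketch
The final ranking step described in the abstract (computing a KL distance between the query and each document, then sorting by it) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that topic distributions for the query and for each document have already been inferred by some LDA model, and the function names `kl_divergence` and `rank_by_kl`, the direction KL(query || document), the smoothing constant `eps`, and the sample topic mixtures are all hypothetical choices made for this example.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return KL(p || q) for two discrete distributions over the same topics.

    eps is an assumed smoothing floor that avoids division by zero when a
    document assigns zero probability to a topic.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same number of topics")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:  # terms with p_i = 0 contribute nothing
            total += pi * math.log(pi / max(qi, eps))
    return total


def rank_by_kl(query_topics, doc_topics):
    """Return (doc_id, distance) pairs sorted by increasing KL distance."""
    scored = [
        (doc_id, kl_divergence(query_topics, dist))
        for doc_id, dist in doc_topics.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical topic mixtures over three topics (made up for illustration).
query = [0.7, 0.2, 0.1]
docs = {
    "doc_a": [0.1, 0.1, 0.8],
    "doc_b": [0.6, 0.3, 0.1],
    "doc_c": [0.3, 0.4, 0.3],
}

ranking = rank_by_kl(query, docs)
for doc_id, distance in ranking:
    print(f"{doc_id}: {distance:.4f}")
```

In this sketch a smaller distance means the document's topic mixture is closer to the query's, so the closest document comes first. KL divergence is asymmetric, so a real system would need to fix the direction (or use a symmetrized variant) to match the definition used in the paper.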