Computer Engineering, 2019, Vol. 45, Issue (3): 26-31, 6. DOI: 10.19678/j.issn.1000-3428.0049976
Design and Implementation of Correlation Weight Algorithm Based on Hadoop Platform
Abstract
The traditional TF-IDF algorithm calculates the correlation weight between a keyword and a document only from term frequency and inverse document frequency, ignoring the influence of user interest on the weight calculation. To better serve users' information retrieval goals, a correlation weight algorithm based on log association is proposed. Taking a user-oriented view of correlation, the algorithm builds a user interest model by analyzing the user's search logs and, following the idea of distributed computing, uses the MapReduce programming framework to process the computing tasks in parallel. Experimental results show that the algorithm not only improves efficiency when handling massive data but also dynamically adjusts the weights of retrieval terms according to the user's historical retrieval records, thereby enhancing the interaction between users and the system.
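The abstract does not give the paper's exact weighting formula. As a rough illustration only, the sketch below simulates in plain Python the two MapReduce passes that a parallel TF-IDF computation commonly uses (term counts per document, then document frequency per term), using length-normalized term frequency and a natural-log IDF. It then applies a hypothetical interest boost derived from a user's search log. The functions `map_term_counts`, `shuffle`, `reduce_sum`, `tf_idf`, `interest_weights` and `personalize`, the frequency-based interest score, and the boost factor `alpha` are all assumptions made for illustration; they are not the authors' method and not part of the Hadoop API.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (term, doc_id)


def map_term_counts(doc_id: str, text: str) -> Iterable[Tuple[Key, int]]:
    """Map phase of pass 1: emit ((term, doc_id), 1) for every token."""
    for term in text.lower().split():
        yield (term, doc_id), 1


def shuffle(pairs):
    """Group values by key, standing in for Hadoop's shuffle/sort step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_sum(key, values):
    """Reduce phase: sum the counts collected for one key."""
    return key, sum(values)


def tf_idf(docs: Dict[str, str]) -> Dict[Key, float]:
    # Pass 1: raw count of each term in each document.
    pairs = [kv for doc_id, text in docs.items() for kv in map_term_counts(doc_id, text)]
    counts = dict(reduce_sum(k, vs) for k, vs in shuffle(pairs).items())

    # Document length (total tokens), used to normalize term frequency.
    doc_len: Dict[str, int] = defaultdict(int)
    for (_term, doc_id), c in counts.items():
        doc_len[doc_id] += c

    # Pass 2: map each (term, doc) key to (term, 1), reduce to document frequency.
    df_pairs = [(term, 1) for (term, _doc_id) in counts]
    df = dict(reduce_sum(k, vs) for k, vs in shuffle(df_pairs).items())

    n_docs = len(docs)
    return {
        (term, doc_id): (c / doc_len[doc_id]) * math.log(n_docs / df[term])
        for (term, doc_id), c in counts.items()
    }


def interest_weights(search_log: List[str]) -> Dict[str, float]:
    """Hypothetical interest score: query-log frequency scaled to [0, 1]."""
    freq: Dict[str, int] = defaultdict(int)
    for query in search_log:
        for term in query.lower().split():
            freq[term] += 1
    if not freq:
        return {}
    top = max(freq.values())
    return {term: c / top for term, c in freq.items()}


def personalize(weights: Dict[Key, float], interest: Dict[str, float],
                alpha: float = 0.5) -> Dict[Key, float]:
    """Hypothetical boost: scale TF-IDF by (1 + alpha * interest[term])."""
    return {
        (term, doc_id): w * (1.0 + alpha * interest.get(term, 0.0))
        for (term, doc_id), w in weights.items()
    }


if __name__ == "__main__":
    docs = {"d1": "hadoop mapreduce hadoop", "d2": "tf idf weight", "d3": "hadoop weight"}
    base = tf_idf(docs)
    boosted = personalize(base, interest_weights(["hadoop", "hadoop mapreduce"]))
    for key in sorted(base):
        print(key, round(base[key], 4), round(boosted[key], 4))
```

On a real Hadoop cluster each map and reduce function would run as separate tasks over partitioned input; here the shuffle is just a local dictionary grouping. Because the assumed boost is multiplicative, a term whose TF-IDF is zero (it occurs in every document) stays at zero regardless of user interest, which is one design choice among several.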
Keywords
distributed computing/TF-IDF algorithm/log/interest model/information retrieval
Classification
Information Technology and Security Science
Cite this article
Gao Jun, Huang Xiance. Design and Implementation of Correlation Weight Algorithm Based on Hadoop Platform[J]. Computer Engineering, 2019, 45(3): 26-31, 6.
Funding
National Natural Science Foundation of China (41701523)
Shanghai Maritime University Graduate Innovation Fund (YXR2017032)