Journal of Integration Technology, 2016, Vol. 5, Issue (2): 97-108, 12.
Geeking: A Sports News Search Engine System Based on Champion List
Abstract
This paper introduces Geeking, a sports news search engine with four functional modules: web crawling, champion list building, search processing and user interface. Geeking provides query correction, query auto-completion, search result ranking, news clustering, keyword highlighting and snapshot visualization. Given a query, the system automatically completes it according to the search logs and the hot keywords in the news. If a query returns no results, the system corrects the query and recommends alternative query terms. Related documents are retrieved quickly by means of the champion list. Document relevance is calculated from tf-idf values together with other factors such as news headlines and release time. To cluster similar news, the longest common subsequence and the Levenshtein distance are used to measure the similarity between news headlines, and the headline similarity is taken as the similarity between documents. Test results show that Geeking is fast and stable.
Keywords
search engine/sports news/champion list/Levenshtein distance/clustering/query term correction
Classification
Information Technology and Security Science
Cite this article
林裕杰, 陈新荃, 高妍, 肖卡飞, 胡红祥, 花强. Geeking: A Sports News Search Engine System Based on Champion List[J]. Journal of Integration Technology, 2016, 5(2): 97-108, 12.
Funding
National Natural Science Foundation of China (61433012, U1435215, 11171086); Natural Science Foundation of Hebei Province (F2013201064)
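
The headline-similarity step mentioned in the abstract can be illustrated with a minimal sketch. The abstract names the two measures (longest common subsequence and Levenshtein distance) but does not say how they are normalized or combined, so the function names, the normalization by the longer headline and the equal-weight average below are all assumptions made for illustration, not the authors' method.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def headline_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two headlines.

    Assumption: both measures are normalized by the longer headline
    and averaged with equal weights; the source does not specify this.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty headlines are treated as identical
    lcs_score = lcs_length(a, b) / longest
    edit_score = 1.0 - levenshtein(a, b) / longest
    return (lcs_score + edit_score) / 2
```

Under these assumptions, two news items could be placed in the same cluster when `headline_similarity` exceeds some threshold; the threshold value would have to be chosen empirically.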