
Geeking: a Sports News Search Engine System Based on Champion List

Lin Yujie (林裕杰)¹, Chen Xinquan (陈新荃)², Gao Yan (高妍)³, Xiao Kafei (肖卡飞)⁴, Hu Hongxiang (胡红祥)⁵, Hua Qiang (花强)¹

集成技术 (Journal of Integration Technology), 2016, Vol. 5, Issue 2: 97-108, 12.

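Author information

  • 1. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055
  • 2. Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen 518055
  • 3. Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210
  • 4. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
  • 5. Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168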

Abstract

This paper introduces Geeking, a sports news search engine consisting of four functional modules: web crawling, champion list building, search processing, and user interface. Geeking provides query correction, query auto-completion, search result ranking, news clustering, keyword highlighting, and snapshot visualization. Given a query, the system automatically completes it according to the search logs and hot news keywords. If a query returns no results, the system corrects it and recommends alternative query terms. Relevant documents are retrieved quickly using the champion list, and their relevance is calculated from tf-idf values together with other factors such as news headlines and release time. To cluster similar news, the longest common subsequence and the Levenshtein distance are used to measure the similarity between news headlines, and this headline similarity is taken as the similarity between documents. Test results show that Geeking is fast and stable.
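The abstract names two techniques without giving formulas: retrieval through a champion list and headline similarity based on the longest common subsequence (LCS) and the Levenshtein distance. The following Python sketch illustrates both under stated assumptions and is not the authors' implementation. The function names, whitespace tokenization, the log-scaled tf-idf weighting, the list size `r`, and the equal-weight average of the two headline scores are all hypothetical choices made for illustration. The headline and release-time factors that the abstract mentions for ranking are omitted.

```python
import math
from collections import Counter, defaultdict


def build_champion_lists(docs, r=2):
    """Map each term to its r highest-weighted documents (its champion list).

    Assumption: weight = (1 + ln tf) * ln(N / df); the paper may use another variant.
    """
    n_docs = len(docs)
    term_counts = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}
    doc_freq = Counter(term for counts in term_counts.values() for term in counts)
    postings = defaultdict(list)
    for doc_id, counts in term_counts.items():
        for term, tf in counts.items():
            weight = (1 + math.log(tf)) * math.log(n_docs / doc_freq[term])
            postings[term].append((weight, doc_id))
    # Keep only the top r postings per term; ties are broken by document id.
    return {t: sorted(p, key=lambda x: (-x[0], x[1]))[:r] for t, p in postings.items()}


def search(query, champions):
    """Score only documents that appear on the champion lists of the query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        for weight, doc_id in champions.get(term, []):
            scores[doc_id] += weight
    return sorted(scores, key=lambda d: (-scores[d], d))


def lcs_length(a, b):
    """Length of the longest common subsequence, one DP row at a time."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0]
        for j, ch_b in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        cur = [i]
        for j, ch_b in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ch_a != ch_b)))
        prev = cur
    return prev[-1]


def headline_similarity(a, b):
    """Similarity in [0, 1]; the equal-weight average is an assumed combination rule."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    lcs_score = lcs_length(a, b) / longest
    edit_score = 1 - levenshtein(a, b) / longest
    return (lcs_score + edit_score) / 2
```

A query is answered by calling `build_champion_lists` once at indexing time and `search` per query, so each query term touches at most `r` postings. Two news items could then be grouped into one cluster when `headline_similarity` of their headlines exceeds a chosen threshold; the threshold value is not given in the abstract.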

Key words

search engine / sports news / champion list / Levenshtein distance / clustering / query term correction

Classification

Information Technology and Security Science

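Cite this article

Lin Yujie, Chen Xinquan, Gao Yan, Xiao Kafei, Hu Hongxiang, Hua Qiang. Geeking: a Sports News Search Engine System Based on Champion List [J]. 集成技术 (Journal of Integration Technology), 2016, 5(2): 97-108, 12.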

Funding

National Natural Science Foundation of China (61433012, U1435215, 11171086); Natural Science Foundation of Hebei Province (F2013201064)

集成技术 (Journal of Integration Technology)

ISSN 2095-3135
