Chinese Page Keyword Extraction Method Based on Query Log Analysis

Wang Xiaoyan, Wang Zhenzhen

Journal of Guangxi Normal University (Natural Science Edition), Issue (2): 42-48, 7. DOI: 10.16088/j.issn.1001-6600.2015.02.007

Chinese Page Keyword Extraction Method Based on Query Log Analysis

Wang Xiaoyan ¹, Wang Zhenzhen ²

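Author information

1. Concord University College, Fujian Normal University, Fuzhou, Fujian 350117
2. School of Economics, Fujian Normal University, Fuzhou, Fujian 350108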

Abstract

Webpage search engines based on full-text indexing return results with low relevance. To address this problem, this paper proposes a keyword extraction method for Chinese web pages based on query log analysis. The method selects keywords according to users' judgments of the relevance between a page and the query words. To quantify these relevance judgments, three indexes are proposed and then combined by comprehensive weighting: residence time per unit length, inverted click rate, and rank compensation factor. Query string segmentation, synonym recognition, polysemy disambiguation, and keyphrase matching receive special treatment. The experimental results show that the precision rate is high and that the overall performance is better than that of the TF-IDF method and the SVM method. The proposed method extracts keywords with satisfactory results.

Key words

query log / keyword extraction / keyphrase matching / synonym recognition / polysemy disambiguation

Classification

Social sciences

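Cite this article

Wang Xiaoyan, Wang Zhenzhen. Chinese Page Keyword Extraction Method Based on Query Log Analysis [J]. Journal of Guangxi Normal University (Natural Science Edition), 2015, (2): 42-48, 7.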

Funding

Supported by the National Social Science Fund of China (14CJL001)

Journal of Guangxi Normal University (Natural Science Edition)

OA / Peking University Core Journal / CSTPCD

ISSN: 1001-6600
