Fusion of Statistics and TextRank for Keyphrase Extraction in Biomedical Literature

WEI Yun, SUN Xianpeng

Computer Applications and Software, 2017, Vol. 34, Issue 6: 27-30, 4.

DOI: 10.3969/j.issn.1000-386x.2017.06.006

FUSION OF STATISTICS AND TEXTRANK FOR KEYPHRASE EXTRACTION IN BIOMEDICAL LITERATURE

WEI Yun¹, SUN Xianpeng¹

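Author information

1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093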

Abstract

Keyphrase extraction plays a significant role in tasks such as text clustering, classification, and retrieval. This paper uses the classic TF-IDF algorithm to improve the quality of keyphrase extraction. A study of TF-IDF shows that it extracts keywords by using information from both the single text and the text collection. On this basis, the paper proposes a keyphrase extraction method that combines TF-IDF, TextRank, statistical knowledge, and sorting of candidate keyphrases by inverse document frequency. Building on TextRank, the method first uses TF-IDF to weight the words and obtain word scores. It then uses statistical knowledge together with the results of the previous step to select candidate keyphrases from the phrases. Finally, the candidate keyphrases are sorted following the idea of inverse document frequency. Experiments show that, compared with the classical TextRank model, the accuracy of this model is 2% higher, the recall is 4.5% higher, and the F-measure is 3.4% higher.
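
The abstract does not give the formulas, so the following is only a minimal sketch of one way the three stages could fit together, not the authors' implementation. It assumes that TF-IDF enters TextRank as a teleport prior (a personalized PageRank), that candidates are n-grams inside runs of non-stopwords, and that a phrase score is the sum of its word scores multiplied by a smoothed phrase-level IDF. All function names, the stopword list, and the parameters `window`, `damping`, `max_len`, and `top_k` are hypothetical.

```python
import math
import re
from collections import defaultdict

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "is", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase word tokens plus punctuation marks (kept as phrase boundaries)."""
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower())


def is_break(token):
    """Stopwords and punctuation end a candidate phrase."""
    return token in STOPWORDS or not token.isalnum()


def smoothed_idf(doc_freq, n_docs):
    """Smoothed inverse document frequency; always positive."""
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1.0


def tfidf_weights(words, corpus_words):
    """TF-IDF weight of each distinct word; corpus_words is a list of token sets."""
    weights = {}
    for w in set(words):
        tf = words.count(w) / len(words)
        df = sum(1 for doc in corpus_words if w in doc)
        weights[w] = tf * smoothed_idf(df, len(corpus_words))
    return weights


def textrank_with_prior(words, prior, window=2, damping=0.85, iterations=50):
    """TextRank on a co-occurrence graph, teleporting according to `prior`.

    Assumption: this personalized-PageRank form is one possible way to
    inject TF-IDF into TextRank; the paper may combine them differently.
    """
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for u in words[i + 1 : i + 1 + window]:
            if u != w:
                neighbors[w].add(u)
                neighbors[u].add(w)
    total = sum(prior.values())
    teleport = {w: prior[w] / total for w in prior}
    score = dict(teleport)
    for _ in range(iterations):
        score = {
            w: (1 - damping) * teleport[w]
            + damping * sum(score[u] / len(neighbors[u]) for u in neighbors[w])
            for w in teleport
        }
    return score


def candidate_phrases(tokens, max_len=3):
    """All n-grams (n <= max_len) inside runs of consecutive non-break tokens."""
    phrases, run = set(), []
    for tok in tokens + ["."]:  # sentinel closes the last run
        if not is_break(tok):
            run.append(tok)
            continue
        for n in range(1, max_len + 1):
            for i in range(len(run) - n + 1):
                phrases.add(tuple(run[i : i + n]))
        run = []
    return phrases


def extract_keyphrases(document, corpus, top_k=5):
    """Return the top_k (phrase, score) pairs, highest score first."""
    tokens = tokenize(document)
    words = [t for t in tokens if not is_break(t)]
    corpus_tokens = [tokenize(d) for d in corpus]
    corpus_words = [set(toks) for toks in corpus_tokens]
    corpus_text = [" " + " ".join(toks) + " " for toks in corpus_tokens]

    word_score = textrank_with_prior(words, tfidf_weights(words, corpus_words))
    ranked = []
    for phrase in candidate_phrases(tokens):
        text = " ".join(phrase)
        # Phrase-level document frequency: documents containing the exact word sequence.
        df = sum(1 for doc in corpus_text if f" {text} " in doc)
        score = sum(word_score[w] for w in phrase) * smoothed_idf(df, len(corpus))
        ranked.append((text, score))
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]
```

Calling `extract_keyphrases(doc, corpus)` with `doc` being one of the strings in `corpus` returns a short ranked list. Summing word scores favors longer phrases; averaging them instead is an equally plausible choice that this sketch does not explore.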

Key words

TextRank/Keyphrase extraction/TF-IDF/Inverse document frequency

Classification

Information Technology and Security Science

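Cite this article

WEI Yun, SUN Xianpeng. Fusion of statistics and TextRank for keyphrase extraction in biomedical literature [J]. Computer Applications and Software, 2017, 34(6): 27-30, 4.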

Funding

National Natural Science Foundation of China (61170277)

Scientific Research Innovation Fund of the Shanghai Municipal Education Commission (12YZ094)

Computer Applications and Software

OA / Peking University Core Journal / CSTPCD

ISSN: 1000-386X
