计算机应用与软件 (Computer Applications and Software), 2017, Vol. 34, Issue 6: 27-30, 4. DOI: 10.3969/j.issn.1000-386x.2017.06.006
Fusion of Statistics and TextRank for Keyphrase Extraction in Biomedical Literature
Abstract
Keyphrase extraction plays a significant role in text clustering, classification, retrieval, and related tasks. This paper uses the classic TF-IDF algorithm to improve the quality of keyphrase extraction from text. A study of TF-IDF shows that it extracts keywords using information from both the individual document and the document collection. On this basis, the paper proposes a keyphrase extraction method that combines TF-IDF, TextRank, statistical knowledge, and inverse-document-frequency ranking of candidate keyphrases. Building on TextRank, the method weights words by TF-IDF to obtain word scores. It then uses statistical knowledge, together with the word scores from the previous step, to select candidate keyphrases from the phrases. Finally, the candidate keyphrases are ranked using the idea of inverse document frequency. Experiments show that, compared with the classical TextRank model, the accuracy of this model is 2% higher, the recall rate is 4.5% higher, and the F-measure is 3.4% higher.
Keywords
TextRank/Keyphrase extraction/TF-IDF/Inverse document frequency
Classification
Information technology and security science
Cite this article
魏赟 (Wei Yun), 孙先朋 (Sun Xianpeng). Fusion of statistics and TextRank for keyphrase extraction in biomedical literature [J]. 计算机应用与软件 (Computer Applications and Software), 2017, 34(6): 27-30, 4.
Funding
National Natural Science Foundation of China (61170277)
Shanghai Municipal Education Commission Scientific Research Innovation Fund (12YZ094)
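The three stages named in the abstract (TF-IDF-weighted TextRank word scores, candidate phrase selection, inverse-document-frequency ranking) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no formulas, so the smoothed IDF, the way TF-IDF biases the TextRank transitions, the n-gram candidate generator standing in for the paper's statistical selection step, and all function names (`tfidf`, `weighted_textrank`, `candidate_phrases`, `rank_phrases`) are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def smoothed_idf(df, n_docs):
    # Assumed smoothing: ln((1 + N) / (1 + df)) + 1, always >= 1.
    return math.log((n_docs + 1) / (df + 1)) + 1.0


def tfidf(doc_tokens, corpus):
    """TF-IDF weight for each word in doc_tokens; corpus is a list of token lists."""
    doc_sets = [set(d) for d in corpus]
    tf = Counter(doc_tokens)
    weights = {}
    for word, count in tf.items():
        df = sum(1 for s in doc_sets if word in s)
        weights[word] = (count / len(doc_tokens)) * smoothed_idf(df, len(corpus))
    return weights


def weighted_textrank(doc_tokens, weights, window=2, damping=0.85, iters=50):
    """TextRank over a co-occurrence graph; a node passes its score to each
    neighbor in proportion to that neighbor's TF-IDF weight (assumed scheme)."""
    neighbors = defaultdict(set)
    for i, w in enumerate(doc_tokens):
        for j in range(i + 1, min(i + window + 1, len(doc_tokens))):
            if doc_tokens[j] != w:
                neighbors[w].add(doc_tokens[j])
                neighbors[doc_tokens[j]].add(w)
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        updated = {}
        for w in neighbors:
            rank = 0.0
            for u in neighbors[w]:
                out_weight = sum(weights[v] for v in neighbors[u])
                rank += scores[u] * weights[w] / out_weight
            updated[w] = (1 - damping) + damping * rank
        scores = updated
    return scores


def candidate_phrases(doc_tokens, max_len=2):
    """Placeholder for the statistical selection step: all contiguous n-grams."""
    found = set()
    for n in range(1, max_len + 1):
        for i in range(len(doc_tokens) - n + 1):
            found.add(tuple(doc_tokens[i:i + n]))
    return sorted(found)


def contains_phrase(tokens, phrase):
    n = len(phrase)
    return any(tuple(tokens[i:i + n]) == phrase for i in range(len(tokens) - n + 1))


def rank_phrases(candidates, word_scores, corpus):
    """Score = sum of member word scores times a phrase-level IDF; sort descending."""
    ranked = []
    for phrase in candidates:
        df = sum(1 for d in corpus if contains_phrase(d, phrase))
        score = sum(word_scores.get(w, 0.0) for w in phrase) * smoothed_idf(df, len(corpus))
        ranked.append((phrase, score))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy corpus, invented for this sketch.
    corpus = [
        ["gene", "expression", "regulates", "protein", "binding"],
        ["protein", "binding", "site", "analysis"],
        ["clinical", "trial", "analysis", "results"],
    ]
    doc = corpus[0]
    word_scores = weighted_textrank(doc, tfidf(doc, corpus))
    for phrase, score in rank_phrases(candidate_phrases(doc), word_scores, corpus)[:3]:
        print(" ".join(phrase), round(score, 3))
```

In this sketch, summing word scores favors longer phrases, and the phrase-level IDF then demotes phrases that appear in many documents of the collection; the paper may combine these quantities differently.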