Text Feature Extraction from Agricultural Science and Technology Literature Based on an Improved TF-IDF-CHI Algorithm

Du Ruopeng (杜若鹏), Xian Guojian (鲜国建), Kou Yuantao (寇远涛)

Digital Library Forum (数字图书馆论坛), Issue (8): 18-24, 7. DOI: 10.3772/j.issn.1673-2286.2019.08.003

Improvement and Application of TF-IDF-CHI in Agricultural Science Text Feature Extraction

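Du Ruopeng¹, Xian Guojian¹, Kou Yuantao¹

Author information

  • 1. Agricultural Information Institute, Chinese Academy of Agricultural Sciences / Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing 100081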

Abstract

This paper aims to address the shortcomings of the traditional TF-IDF method and to verify the improvement through text classification tests in the agricultural domain. The improved method, called ImpTF-IDF-CHI, reconstructs the feature-word weighting function by adding chi-square test values and weight correction factors. First, we extract feature words with the ImpTF-IDF-CHI method, the document frequency method, the information gain method, and TF-IDF. We then use the extracted feature words for text classification and compare the methods on the classification results. Across all tests, the ImpTF-IDF-CHI method gave the best results: naive Bayes text classification using ImpTF-IDF-CHI features reached an accuracy of 94% and an F1 value of 0.844. The experiments demonstrate the effectiveness and advantages of the ImpTF-IDF-CHI method, which shows high accuracy, good stability, and strong topical representativeness in text feature extraction. The method can be applied to tasks such as text categorization, feature representation, and theme extraction.
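The abstract states only that the weighting function combines TF-IDF with chi-square test values and weight correction factors; it does not give the formula. As a rough illustration of the general idea, the sketch below assumes a per-class weight of TF × IDF × χ², uses a smoothed IDF, and leaves out the correction factors entirely. The function names `chi_square` and `tfidf_chi_weights` and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def chi_square(term, docs, labels, target):
    """Chi-square statistic of `term` against class `target` (2x2 contingency table)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        if present and label == target:
            a += 1  # term present, document in target class
        elif present:
            b += 1  # term present, document in another class
        elif label == target:
            c += 1  # term absent, document in target class
        else:
            d += 1  # term absent, document in another class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    # A zero denominator means the term cannot discriminate at all.
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def tfidf_chi_weights(docs, labels, target):
    """Assumed combination: class-level TF * smoothed IDF * chi-square.

    This is an illustrative stand-in, not the paper's ImpTF-IDF-CHI formula;
    the paper's weight correction factors are not modelled here.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for tokens in docs for t in set(tokens))
    # Term counts restricted to documents of the target class.
    tf = Counter(
        t for tokens, label in zip(docs, labels) if label == target for t in tokens
    )
    total = sum(tf.values())
    weights = {}
    for term, count in tf.items():
        idf = math.log((1 + n_docs) / (1 + df[term])) + 1  # smoothed IDF (assumption)
        weights[term] = (count / total) * idf * chi_square(term, docs, labels, target)
    return weights


# Toy corpus: two "crop" documents and two "livestock" documents.
docs = [
    ["rice", "yield", "soil"],
    ["rice", "pest", "soil"],
    ["cattle", "feed", "soil"],
    ["cattle", "milk", "soil"],
]
labels = ["crop", "crop", "livestock", "livestock"]
weights = tfidf_chi_weights(docs, labels, "crop")
top_terms = sorted(weights, key=weights.get, reverse=True)
```

In this toy corpus, a word such as "soil" that appears in every document receives a chi-square value of zero and therefore a zero weight, whereas plain TF-IDF with a smoothed IDF would still give it a positive score. That is the kind of class-discrimination signal the chi-square term is meant to contribute.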

Key words

Feature Extraction/TF-IDF/Chi-Square Statistics/Text Categorization/Agricultural Science

Classification

Information Technology and Security Science

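Cite this article

Du Ruopeng, Xian Guojian, Kou Yuantao. Text Feature Extraction from Agricultural Science and Technology Literature Based on an Improved TF-IDF-CHI Algorithm [J]. Digital Library Forum, 2019, (8): 18-24, 7.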

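Funding

This research was supported by the National Social Science Fund of China project "Construction and Application of a Panoramic Abstract Knowledge Graph for Scientific Papers" (No. 19BTQ61), the Science and Technology Innovation Program of the Chinese Academy of Agricultural Sciences (No. CAAS-ASTIP-2016-AII), and the China Knowledge Centre for Engineering Sciences and Technology construction project (No. CKCEST-2018-1-15).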

Digital Library Forum (数字图书馆论坛) | OA | CSSCI | CSTPCD | ISSN 1673-2286
