Research on Improvement of Feature Weights in Text Classification

Li Pengpeng (李鹏鹏), Fan Huimin (范会敏)

Computer and Modernization (计算机与现代化), 2018, Issue (2): 66-70, 5. DOI: 10.3969/j.issn.1006-2475.2018.02.014

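Author information

  • 1. School of Computer Science and Engineering, Xi'an Technological University, Xi'an, Shaanxi 710021, China (both authors)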

Abstract

To overcome the shortcomings of the traditional TF-IDF (Term Frequency-Inverse Document Frequency) algorithm, an improved algorithm, TF-IDF-dist, is proposed that makes use of the distribution of feature words. Experimental results show that the improved algorithm raises the F1 value by an average of 3.2% across different feature dimensions. Across different feature selection algorithms, the F1 value increases by 2.75%, and TF-IDF-dist adapts better to imbalanced datasets. These results demonstrate the effectiveness of the algorithm for text classification.
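For readers unfamiliar with the baseline, the following is a minimal sketch of classic TF-IDF weighting only. The abstract does not give the TF-IDF-dist formula, so that variant is not reproduced here. The function name `tf_idf`, the toy corpus, and the specific variant used (length-normalized term frequency times `log(N / df)`) are assumptions made for illustration; other TF-IDF variants are equally common.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classic TF-IDF weights for a list of tokenized documents.

    Assumed variant: tf = count / document length, idf = log(N / df).
    This is the traditional baseline, not the paper's TF-IDF-dist.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus (hypothetical data, for illustration only).
corpus = [
    ["text", "classification", "feature"],
    ["text", "feature", "weight", "weight"],
    ["machine", "learning", "text"],
]
weights = tf_idf(corpus)
```

In this toy corpus, "text" occurs in every document and therefore receives a weight of zero, while "weight" is frequent in a single document and receives the largest weight there. According to the abstract, TF-IDF-dist additionally uses the distribution of feature words.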

Key words

machine learning/text classification/feature weights/TF-IDF

Classification

Information Technology and Security Science

Cite this article

Li Pengpeng, Fan Huimin. Research on Improvement of Feature Weights in Text Classification[J]. Computer and Modernization, 2018, (2): 66-70, 5.

Funding

Industrial Key Research Project of the Shaanxi Provincial Department of Science and Technology (2017GY-070)

Computer and Modernization (计算机与现代化)

OA, CSTPCD

ISSN 1006-2475