计算机与现代化 (Computer and Modernization), Issue (2): 66-70, 5. DOI: 10.3969/j.issn.1006-2475.2018.02.014
文本分类中特征权重算法改进研究
Research on Improvement of Feature Weights in Text Classification
Abstract
To overcome the shortcomings of the traditional TF-IDF (Term Frequency-Inverse Document Frequency) algorithm, an improved algorithm, TF-IDF-dist, is proposed that makes use of the distribution of feature words. Experimental results show that the improved algorithm raises the F1 value by 3.2% on average across different feature dimensions. Across different feature selection algorithms, the F1 value increases by 2.75%, and TF-IDF-dist adapts better to imbalanced datasets. These results demonstrate the validity of the algorithm for text classification.
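The abstract does not give the TF-IDF-dist formula, so the following is only a minimal sketch of the traditional TF-IDF baseline it sets out to improve, assuming the common tf × log(N/df) form. The function name `tf_idf` and the toy corpus are hypothetical illustrations, not taken from the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Baseline weighting only: tf(t, d) * log(N / df(t)).
    This is NOT the TF-IDF-dist variant described in the abstract.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # Relative term frequency times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, already tokenized.
corpus = [
    ["text", "classification", "feature"],
    ["feature", "weight", "feature"],
    ["text", "mining"],
]
weights = tf_idf(corpus)
```

Note that this baseline looks only at how many documents contain a term and ignores how the term is spread across categories.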
Key words
machine learning / text classification / feature weights / TF-IDF
Classification
Information Technology and Security Science
Cite this article
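Li Pengpeng (李鹏鹏), Fan Huimin (范会敏). Research on Improvement of Feature Weights in Text Classification [J]. 计算机与现代化 (Computer and Modernization), 2018, (2): 66-70, 5.
Funding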
Industrial Research Project of the Shaanxi Provincial Department of Science and Technology (2017GY-070)