Computer and Modernization (计算机与现代化), Issue (9): 6-9, 14, 5. DOI: 10.3969/j.issn.1006-2475.2014.09.002
Research on Text Categorization Based on Improved TFIDF Algorithm
郑霖 (Zheng Lin) 1, 徐德华 (Xu Dehua) 1
Author information
- 1. School of Economics and Management, Tongji University, Shanghai 200092
Abstract
Text categorization is widely applied in information retrieval, email filtering, Web page classification, personalized recommendation, and other fields, and it has attracted extensive attention from scholars since the concept was introduced. Researchers have adopted many methods in text classification, and TFIDF is one of the most commonly used algorithms for calculating the weight of feature items. However, the traditional TFIDF algorithm ignores how feature items are distributed within classes and among classes, so many items with little discriminative power receive high weights. To improve the traditional TFIDF algorithm, this paper modifies the calculation of IDF by adding factors that reflect the distribution of feature items within and among classes. In the experiment, the improved TFIDF algorithm is applied to text categorization, and the resulting classification performance verifies that the improved algorithm is valid.
Keywords
TFIDF algorithm/feature selection/text categorization
Classification
Information technology and security science
Cite this article
郑霖 (Zheng Lin), 徐德华 (Xu Dehua). Research on Text Categorization Based on Improved TFIDF Algorithm [J]. Computer and Modernization, 2014, (9): 6-9, 14, 5.
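The limitation of traditional TFIDF described in the abstract above can be illustrated with a minimal Python sketch. It assumes the common textbook form of the weight, tf × log(N / n_t). The toy corpus, the class labels, and all function names are hypothetical and do not come from the paper. The paper's improved IDF factors are not given in this abstract, so they are not reproduced here.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (class label, tokenized document).
corpus = [
    ("sports", ["match", "team", "report"]),
    ("sports", ["match", "goal", "team"]),
    ("finance", ["stock", "market", "report"]),
    ("finance", ["stock", "bank", "price"]),
]

def idf(term, docs):
    """Classic IDF: log(N / n_t), where n_t is the number of documents
    containing the term. Assumes the term occurs in at least one document."""
    n_docs = len(docs)
    n_containing = sum(1 for _, tokens in docs if term in tokens)
    return math.log(n_docs / n_containing)

def tfidf(term, tokens, docs):
    """Relative term frequency in one document times classic IDF."""
    tf = Counter(tokens)[term] / len(tokens)
    return tf * idf(term, docs)

def class_doc_counts(term, docs):
    """Per-class count of documents containing the term.
    Classic IDF never looks at this information."""
    counts = Counter()
    for label, tokens in docs:
        if term in tokens:
            counts[label] += 1
    return dict(counts)

if __name__ == "__main__":
    for term in ("match", "report"):
        print(term, round(idf(term, corpus), 4), class_doc_counts(term, corpus))
```

In this toy corpus, "match" occurs only in the sports class while "report" is spread evenly over both classes, yet both appear in two of four documents and therefore receive the same classic IDF, ln 2. A class-aware weighting of the kind the abstract describes would presumably separate these two cases.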