Application Research of Computers, 2011, Vol. 28, Issue 11: 4092-4096, 5. DOI: 10.3969/j.issn.1001-3695.2011.11.024
Research on dynamic self-adaptive term weighting for multi-class text classification algorithm
Abstract
Text classification plays an important role in text data mining and information retrieval, and computing and assigning term weights is the key step in classifying text. This paper therefore proposes a dynamic self-adaptive term weighting (DATW) algorithm for multi-class text classification, which overcomes the disadvantages of the traditional term weighting algorithm TF-IDF. DATW considers not only the frequency of a term within a text and the number of texts in the whole training set that contain the term, but also the distribution coefficient and the gradient descent of the term, so that it adapts itself to dynamic text classification. Validation shows that classification performance with DATW is superior to that with TF-IDF.
Key words
text classification / term weighting / TF-IDF / distribution coefficient / gradient descent
Classification
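Information Technology and Security Science
Cite this article
PEI Songwen, WU Baifeng. Research on dynamic self-adaptive term weighting for multi-class text classification algorithm [J]. Application Research of Computers, 2011, 28(11): 4092-4096, 5.
Funding
Shanghai Municipal Education Commission Research Fund for Excellent Young Teachers (SLG10005)
University of Shanghai for Science and Technology Research Innovation Fund (GDCX-Y-102)
AMD University Cooperation Program Special Fund (BOW-02)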