Journal of Data Acquisition and Processing (数据采集与处理), 2017, Vol. 32, Issue 3, pp. 516-522 (7 pages). DOI: 10.16337/j.1004-9037.2017.03.010
Feature Transfer Learning for Text Categorization (一种面向文本分类的特征迁移方法)
Abstract
Traditional text classification methods assume that the feature words in the training set and the test set follow the same probability distribution. In practical applications, however, the two distributions deviate from each other, which can degrade the final classification results. To address this problem, a feature transfer learning algorithm for text categorization is proposed. By calculating the transfer volume and amending the vector space model of the training set, the method reconciles the probability distributions of feature words between the training set and the test set. Experiments on Chinese spam filtering and web page classification data sets show that the proposed method eliminates the dissimilarity between the feature word distributions and clearly improves the classification metrics on the test set.
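The abstract does not give the formula for the transfer volume. As a purely illustrative sketch, assume the transfer volume of a feature word is the smoothed ratio of its document frequency in the test set to its document frequency in the training set, and that "amending the vector space model" means rescaling each training document's term-frequency vector by these per-word weights. Both the definition and the function names `doc_freq`, `transfer_weights`, and `amend_vsm` (with the `smoothing` parameter) are hypothetical and are not taken from the paper.

```python
from collections import Counter


def doc_freq(docs):
    """Fraction of documents in which each feature word appears."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc))  # count each word at most once per document
    n = len(docs)
    return {w: c / n for w, c in counts.items()}


def transfer_weights(train_docs, test_docs, smoothing=1e-3):
    """Hypothetical 'transfer volume' per training feature word.

    Assumed definition: smoothed ratio of test-set document frequency to
    training-set document frequency. Words more common in the test set get
    a weight above 1; words rare or absent in the test set get a weight below 1.
    """
    p_train = doc_freq(train_docs)
    p_test = doc_freq(test_docs)
    return {
        w: (p_test.get(w, 0.0) + smoothing) / (p + smoothing)
        for w, p in p_train.items()
    }


def amend_vsm(doc, weights):
    """Term-frequency vector of a training document, rescaled by the weights."""
    tf = Counter(doc)
    return {w: c * weights.get(w, 1.0) for w, c in tf.items()}


if __name__ == "__main__":
    train = [["free", "money", "offer"], ["meeting", "schedule"],
             ["free", "meeting"], ["offer", "discount"]]
    test = [["free", "offer"], ["free", "money"], ["schedule", "meeting"]]
    w = transfer_weights(train, test)
    print(amend_vsm(["free", "free", "discount"], w))
```

In this toy data, "free" appears in a larger share of test documents than training documents, so its training weight is boosted, while "discount" never appears in the test set and is nearly suppressed. A classifier trained on the amended vectors would then see feature statistics closer to those of the test set, which is the goal the abstract describes.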
Key words
text categorization / transfer learning / transfer volume / vector space model
Classification
Information Technology and Security Science
Cite this article
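Zhao Shichen (赵世琛), Wang Wenjian (王文剑). Feature Transfer Learning for Text Categorization [J]. Journal of Data Acquisition and Processing, 2017, 32(3): 516-522.
Funding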
Supported by the National Natural Science Foundation of China (60975035, 61273291).
Supported by the Shanxi Province Research Fund for Returned Overseas Scholars (2012008).