Computer Engineering & Science (计算机工程与科学), 2017, Vol. 39, Issue 2: 399-404 (6 pages). DOI: 10.3969/j.issn.1007-130X.2017.02.027
融合词语类别特征和语义的短文本分类方法
A short text classification method combining lexical category features and semantics
Abstract
Classifying short texts is challenging because their feature representations are typically severely sparse and high-dimensional. We propose a novel approach that classifies short texts by combining lexical and semantic features. To construct the term dictionary, we first select as lexical features the words most distinctive of each category, and then extract the optimal topic distribution from a background knowledge repository using Latent Dirichlet Allocation (LDA) to construct new features for the short texts. Experiments on classifying Sohu news titles, which are typical short texts, with SVM and K-NN classifiers show that our method can greatly improve the classification results.
Keywords
short text classification / Latent Dirichlet Allocation / lexical features / semantic features / feature selection
Classification
Information technology and security science
Cite this article
Ma Huifang (马慧芳), Zhou Runan (周汝南), Ji Yugang (吉余岗), Lu Xiaoyong (鲁小勇). A short text classification method combining lexical category features and semantics [J]. Computer Engineering & Science, 2017, 39(2): 399-404.
Funding
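National Natural Science Foundation of China (61163039, 61363058)
Gansu Province Youth Science and Technology Fund (145RJYA259)
Gansu Province Natural Science Research Fund (145RJZA232)
Northwest Normal University 2013 Young Teachers' Research Capability Improvement Program (NWNU-LKQN-12-23)
Open Fund of the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (IIP2014-4)
2016 Gansu Province Undergraduate Innovation and Entrepreneurship Training Program (201610736041, 201610736040)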
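
Illustrative sketch
The abstract describes a two-part feature construction: lexical features from the most distinctive words of each category, plus topic features drawn from a background corpus, with the combined vector passed to a classifier. The following is a minimal sketch of that general idea, not the authors' implementation. Several pieces are assumptions made for illustration: the distinctiveness score in `distinctive_words`, the hand-written `WORD_TOPICS` table (a toy stand-in for word-topic probabilities that would come from an LDA model trained on background knowledge), the toy English headlines, and the cosine-similarity k-NN classifier.

```python
from collections import Counter, defaultdict
import math


def distinctive_words(docs, labels, top_n=2):
    """Pick, per category, the words most concentrated in that category.

    Score = P(word | category) * P(category | word). This scoring rule is an
    assumption for illustration; the abstract does not state the measure used.
    """
    cat_counts = defaultdict(Counter)  # category -> word counts
    for tokens, label in zip(docs, labels):
        cat_counts[label].update(tokens)
    total = Counter()  # word counts over all categories
    for counts in cat_counts.values():
        total.update(counts)
    vocab = []
    for label in sorted(cat_counts):
        counts = cat_counts[label]
        n = sum(counts.values())
        ranked = sorted(
            counts,
            key=lambda w: (-(counts[w] / n) * (counts[w] / total[w]), w),
        )
        for w in ranked[:top_n]:
            if w not in vocab:
                vocab.append(w)
    return vocab


def featurize(tokens, vocab, word_topics, n_topics):
    """Concatenate lexical indicators with an averaged topic distribution."""
    lexical = [1.0 if w in tokens else 0.0 for w in vocab]
    topic = [0.0] * n_topics
    known = [w for w in tokens if w in word_topics]
    for w in known:
        for k, p in enumerate(word_topics[w]):
            topic[k] += p / len(known)
    return lexical + topic


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(x, train_vecs, train_labels, k=3):
    """Majority vote among the k training vectors most similar to x."""
    ranked = sorted(zip(train_vecs, train_labels), key=lambda p: -cosine(x, p[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training headlines (hypothetical data, already tokenized).
train_docs = [
    ["team", "wins", "final"],
    ["striker", "scores", "goal"],
    ["team", "signs", "striker"],
    ["bank", "raises", "rates"],
    ["stocks", "fall", "bank"],
    ["market", "rates", "climb"],
]
train_labels = ["sports"] * 3 + ["finance"] * 3

# Hypothetical P(topic | word) table standing in for LDA output on a
# background corpus; topic 0 is sports-like, topic 1 is finance-like.
WORD_TOPICS = {
    "team": [0.9, 0.1], "striker": [0.9, 0.1], "goal": [0.8, 0.2],
    "coach": [0.9, 0.1], "match": [0.85, 0.15],
    "bank": [0.1, 0.9], "rates": [0.1, 0.9], "stocks": [0.1, 0.9],
    "market": [0.1, 0.9], "investors": [0.05, 0.95],
}
N_TOPICS = 2

vocab = distinctive_words(train_docs, train_labels)
train_vecs = [featurize(d, vocab, WORD_TOPICS, N_TOPICS) for d in train_docs]


def classify(tokens):
    x = featurize(tokens, vocab, WORD_TOPICS, N_TOPICS)
    return knn_predict(x, train_vecs, train_labels)


# A headline that shares no words with the lexical vocabulary can still be
# classified through its topic features.
print(classify(["coach", "praises", "match"]))
```

The point of the sketch is the sparsity argument: the query headline shares no term with the selected lexical vocabulary, so its lexical features are all zero, yet the topic features borrowed from the background table still place it near the right category. In practice the topic table would be learned rather than written by hand, and the classifier could equally be an SVM, as in the experiments the abstract mentions.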