计算机应用与软件 (Computer Applications and Software), 2016, Vol. 33, Issue (10): 28-31, 56, 5. DOI: 10.3969/j.issn.1000-386x.2016.10.007
一种融合词项关联关系和统计信息的短文本建模方法
A SHORT TEXT MODELLING METHOD FUSING CORRELATION OF LEXICAL ITEMS AND STATISTIC INFORMATION
Abstract
Traditional text representation methods are usually based on the bag-of-words model, which assumes that the lexical items in a text are independent of one another. Statistical analysis methods proposed more recently obtain relations between lexical items from word co-occurrence, but they ignore the implied semantics between lexical items. To address the bag-of-words model's neglect of text semantics, this paper presents a short text modelling method that fuses lexical item correlation with statistical information. The method obtains term correlation by coupling the intra-relation and inter-relation between terms, thereby exploiting both explicit and implied semantic information. It then uses this correlation as the initial term similarity and iteratively computes the similarities between terms and between texts, which improves the representation of short texts. Experiments show that the method significantly improves the performance of short text clustering.
Key words
Intra-relation/Inter-relation/Term similarity/Text similarity/Short text similarity
Classification
Information Technology and Security Science
Cite this article
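马慧芳, 曾宪桃, 李晓红, 贠宁. A short text modelling method fusing correlation of lexical items and statistic information [J]. 计算机应用与软件 (Computer Applications and Software), 2016, 33(10): 28-31, 56, 5.
Funding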
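National Natural Science Foundation of China (61363058, 61163039); Youth Science and Technology Fund of the Natural Science Foundation of Gansu Province (145RJZA232); Open Fund of the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (IIP2014-4).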
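Illustrative sketch

The abstract names three steps without giving formulas: an intra-relation and an inter-relation between terms, their coupling into a term correlation, and an iterative computation of term and text similarities seeded with that correlation. The Python sketch below is one possible reading of those steps, not the authors' formulation. Every concrete choice in it is an assumption made for illustration: Jaccard co-occurrence as the intra-relation, the strongest link through a third term as the inter-relation, the mixing weight `alpha`, the weighted-average update rule, the fixed number of `rounds`, and all function names.

```python
from collections import Counter


def cooccurrence_relation(docs, vocab):
    """Assumed intra-relation: Jaccard overlap of the document sets of two terms."""
    postings = {t: {i for i, d in enumerate(docs) if t in d} for t in vocab}
    rel = {a: {b: 0.0 for b in vocab} for a in vocab}
    for a in vocab:
        for b in vocab:
            if a != b:
                shared = postings[a] & postings[b]
                rel[a][b] = len(shared) / len(postings[a] | postings[b])
    return rel


def linked_relation(intra, vocab):
    """Assumed inter-relation: strongest link a -> c -> b through a third term c."""
    rel = {a: {b: 0.0 for b in vocab} for a in vocab}
    for a in vocab:
        for b in vocab:
            if a != b:
                links = [min(intra[a][c], intra[c][b])
                         for c in vocab if c not in (a, b)]
                rel[a][b] = max(links, default=0.0)
    return rel


def coupled_relation(docs, alpha=0.5):
    """Couple both relations; alpha is a hypothetical mixing weight."""
    vocab = sorted({t for d in docs for t in d})
    intra = cooccurrence_relation(docs, vocab)
    inter = linked_relation(intra, vocab)
    coupled = {
        a: {b: 1.0 if a == b
            else alpha * intra[a][b] + (1 - alpha) * inter[a][b]
            for b in vocab}
        for a in vocab
    }
    return vocab, coupled


def iterate_similarities(docs, vocab, term_sim, rounds=3):
    """Alternate text and term similarity updates, seeded with term_sim."""
    n = len(docs)
    # Normalised term weights inside each document.
    doc_w = []
    for d in docs:
        counts = Counter(d)
        total = sum(counts.values())
        doc_w.append({t: k / total for t, k in counts.items()})
    # Normalised document weights for each term.
    term_w = {}
    for t in vocab:
        counts = {i: docs[i].count(t) for i in range(n) if t in docs[i]}
        total = sum(counts.values())
        term_w[t] = {i: k / total for i, k in counts.items()}
    text_sim = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(rounds):
        # Two texts are similar when their terms are similar.
        text_sim = [
            [1.0 if i == j else sum(wa * term_sim[a][b] * wb
                                    for a, wa in doc_w[i].items()
                                    for b, wb in doc_w[j].items())
             for j in range(n)]
            for i in range(n)
        ]
        # Two terms are similar when the texts they occur in are similar.
        term_sim = {
            a: {b: 1.0 if a == b else sum(wi * text_sim[i][j] * wj
                                          for i, wi in term_w[a].items()
                                          for j, wj in term_w[b].items())
                for b in vocab}
            for a in vocab
        }
    return term_sim, text_sim


# Toy corpus of tokenised short texts (made up for this sketch).
docs = [
    ["apple", "fruit", "juice"],
    ["orange", "fruit", "juice"],
    ["python", "code", "bug"],
    ["java", "code", "bug"],
]
vocab, coupled = coupled_relation(docs)
term_sim, text_sim = iterate_similarities(docs, vocab, coupled)
```

In this toy corpus "apple" and "orange" never co-occur, so a purely co-occurrence-based relation scores them at zero, while the coupled relation links them through the shared terms "fruit" and "juice". The resulting `text_sim` matrix could then be passed to any clustering routine that accepts pairwise similarities.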