
A Short Text Modelling Method Fusing Term Correlation and Statistical Information

Ma Huifang (马慧芳), Zeng Xiantao (曾宪桃), Li Xiaohong (李晓红), Yun Ning (贠宁)

Computer Applications and Software (计算机应用与软件), 2016, Vol. 33, Issue 10: pp. 28-31, 56 (5 pages). DOI: 10.3969/j.issn.1000-386x.2016.10.007

A SHORT TEXT MODELLING METHOD FUSING CORRELATION OF LEXICAL ITEMS AND STATISTIC INFORMATION

Ma Huifang¹, Zeng Xiantao¹, Li Xiaohong¹, Yun Ning¹

Author information

1. College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, Gansu, China

Abstract

Traditional text representation methods are usually based on the bag-of-words model, which assumes that the lexical items in a text are independent of one another. Recent statistical analysis methods obtain the relations between lexical items from word co-occurrences, but ignore the implied semantics between them. To overcome the bag-of-words model's neglect of text semantics, this paper presents a short text modelling method that fuses lexical item correlation with statistical information. The method obtains term correlation by coupling the intra-relation and the inter-relation between terms, which takes both explicit and implied semantic information into account. It then uses this correlation as the initial term similarity and iteratively calculates term similarities and text similarities, thereby improving the representation of short texts. Experiments show that the method significantly improves the performance of short text clustering.
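The pipeline outlined in the abstract can be illustrated with a small sketch. The abstract gives no formulas, so every concrete choice below is an assumption made for illustration and is not the authors' definition: the intra-relation is taken to be the Jaccard overlap of the documents containing two terms, the inter-relation is the average strength of paths through shared link terms, `alpha` is a hypothetical coupling weight, and the iteration alternates a best-match text similarity with a term similarity averaged over the texts that contain each term. All function names are hypothetical.

```python
from itertools import combinations


def cooccurrence_index(docs):
    """Map each term to the set of document indices that contain it."""
    index = {}
    for i, doc in enumerate(docs):
        for term in set(doc):
            index.setdefault(term, set()).add(i)
    return index


def initial_term_similarity(docs, alpha=0.5):
    """Couple an explicit (intra) and an implied (inter) relation. Assumed formulas."""
    index = cooccurrence_index(docs)
    vocab = sorted(index)
    # Intra-relation (assumption): Jaccard overlap of the documents containing each term.
    intra = {a: {b: 0.0 for b in vocab} for a in vocab}
    for a, b in combinations(vocab, 2):
        score = len(index[a] & index[b]) / len(index[a] | index[b])
        intra[a][b] = intra[b][a] = score
    sim = {a: {a: 1.0} for a in vocab}
    for a, b in combinations(vocab, 2):
        # Inter-relation (assumption): mean strength of paths a-k-b through link terms k.
        links = [min(intra[a][k], intra[k][b]) for k in vocab
                 if k != a and k != b and intra[a][k] > 0 and intra[k][b] > 0]
        inter = sum(links) / len(links) if links else 0.0
        sim[a][b] = sim[b][a] = alpha * intra[a][b] + (1 - alpha) * inter
    return index, sim


def text_similarity(docs, term_sim):
    """Symmetric best-match similarity between every pair of non-empty texts."""
    def directed(x, y):
        return sum(max(term_sim[t][u] for u in y) for t in x) / len(x)
    n = len(docs)
    return [[0.5 * (directed(docs[i], docs[j]) + directed(docs[j], docs[i]))
             for j in range(n)] for i in range(n)]


def update_term_similarity(index, text_sim):
    """Re-estimate term similarity from the similarity of the texts containing the terms."""
    sim = {a: {a: 1.0} for a in index}
    for a, b in combinations(sorted(index), 2):
        pairs = [text_sim[i][j] for i in index[a] for j in index[b]]
        sim[a][b] = sim[b][a] = sum(pairs) / len(pairs)
    return sim


def model_short_texts(docs, alpha=0.5, rounds=2):
    """Alternate between text and term similarities for a fixed number of rounds."""
    index, term_sim = initial_term_similarity(docs, alpha)
    text_sim = text_similarity(docs, term_sim)
    for _ in range(rounds - 1):
        term_sim = update_term_similarity(index, text_sim)
        text_sim = text_similarity(docs, term_sim)
    return term_sim, text_sim


docs = [["apple", "fruit"], ["banana", "fruit"],
        ["car", "engine"], ["truck", "engine"]]
term_sim, text_sim = model_short_texts(docs)
```

In this toy corpus, "apple" and "banana" never co-occur, so a pure co-occurrence measure scores them as unrelated. The coupled relation links them through the shared term "fruit", and the first two texts end up more similar to each other than to the two vehicle texts. The round count is kept small here because this simple update has no damping term.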

Key words

Intra-relation/Inter-relation/Term similarity/Text similarity/Short text similarity

Classification

Information Technology and Security Science

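Cite this article

Ma Huifang, Zeng Xiantao, Li Xiaohong, Yun Ning. A short text modelling method fusing correlation of lexical items and statistic information [J]. Computer Applications and Software, 2016, 33(10): 28-31, 56.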

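Funding

National Natural Science Foundation of China (61363058, 61163039); Youth Science and Technology Fund of the Natural Science Foundation of Gansu Province (145RJZA232); Open Fund of the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (IIP2014-4).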

Computer Applications and Software

OA / CSTPCD

ISSN: 1000-386X
