
Combining lexical features and LDA for semantic relatedness measure

Xiao Bao (肖宝), Li Pu (李璞), Jiang Yuncheng (蒋运承)

Computer Engineering and Applications (计算机工程与应用), 2017, Vol. 53, Issue (12): 152-157, 165, 7. DOI: 10.3778/j.issn.1002-8331.1606-0088

Combining lexical features and LDA for semantic relatedness measure

Xiao Bao¹, Li Pu², Jiang Yuncheng³

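Author information

1. School of Electronics and Information Engineering, Qinzhou University, Qinzhou, Guangxi 535011
2. School of Computer Science, South China Normal University, Guangzhou 510631
3. Software Engineering College, Zhengzhou University of Light Industry, Zhengzhou 450000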

Abstract

Computing semantic relatedness between text documents is a key problem in many domains, for example Natural Language Processing (NLP) and Semantic Information Retrieval (SIR). Explicit Semantic Analysis (ESA) based on Wikipedia has received wide attention and been widely applied, mainly because of its simplicity and effectiveness. However, ESA is inefficient for semantic relatedness computation because of its redundant concepts and high dimensionality. This paper presents a new technique based on Latent Dirichlet Allocation (LDA) and Jensen-Shannon Divergence (JSD) to compute semantic relatedness between text documents. LDA is employed to reduce dimensionality and improve efficiency: it builds topic-model probability vectors from the high-dimensional document matrix. Instead of cosine distance, JSD is used to compute semantic relatedness between documents. Results on several benchmark tests compare the proposed technique with other methods and show that it is more effective than ESA. In the experiments, the proposed technique improves the Pearson correlation coefficient by more than 3% and 9% over ESA and LDA, respectively.
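To illustrate the JSD step mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes each document has already been reduced to a topic probability vector (for example by an LDA model); the vectors `doc_a`, `doc_b`, `doc_c` are made-up values, and the mapping `1 - JSD` from divergence to a relatedness score is a hypothetical choice made here for illustration, since the abstract does not state how the paper converts divergence into a score.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits. Terms with p_i == 0 contribute nothing.

    Only safe when q_i > 0 wherever p_i > 0, which always holds
    for the mixture distribution used in js_divergence below.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so it lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def jsd_relatedness(p, q):
    """Hypothetical relatedness score: 1 - JSD (an assumption, not from the paper)."""
    return 1.0 - js_divergence(p, q)


# Made-up topic distributions over four topics (each sums to 1).
doc_a = [0.70, 0.20, 0.05, 0.05]
doc_b = [0.65, 0.25, 0.05, 0.05]
doc_c = [0.05, 0.05, 0.20, 0.70]

print(jsd_relatedness(doc_a, doc_b))  # similar topic mixes: score near 1
print(jsd_relatedness(doc_a, doc_c))  # different topic mixes: lower score
```

Unlike cosine similarity, JSD treats the vectors as probability distributions, and unlike plain KL divergence it is symmetric and always finite, which is one plausible reason to prefer it for comparing topic distributions.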

Key words

topic model / lexical features / Explicit Semantic Analysis (ESA) / Latent Dirichlet Allocation (LDA) / semantic relatedness measure

Classification

Information Technology and Security Science

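Cite this article

Xiao Bao, Li Pu, Jiang Yuncheng. Combining lexical features and LDA for semantic relatedness measure[J]. Computer Engineering and Applications, 2017, 53(12): 152-157, 165, 7.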

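Funding

National Natural Science Foundation of China (No. 61272066)

Guangzhou Science and Technology Program (No. 2014J4100031)

Basic Ability Improvement Project for Young and Middle-aged Teachers in Guangxi Universities (No. KY2016LX431)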

Computer Engineering and Applications

Indexing: OA, Peking University Core Journals, CSCD, CSTPCD

ISSN: 1002-8331
