A Study of a Cross-Lingual Text Similarity Calculation Method Based on Bilingual LDA

Cheng Wei, Xian Yantuan, Zhou Lanjiang, Yu Zhengtao, Wang Hongbin

Computer Engineering & Science, 2017, Vol. 39, Issue 5: 978-983, 6. DOI: 10.3969/j.issn.1007-130X.2017.05.024

A cross-lingual document similarity calculation method based on bilingual LDA

Cheng Wei (1), Xian Yantuan (2), Zhou Lanjiang (1), Yu Zhengtao (2), Wang Hongbin (1)

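Author information

  • 1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500
  • 2. Key Laboratory of Intelligent Information Processing, Kunming University of Science and Technology, Kunming, Yunnan 650500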

Abstract

Drawing on bilingual topic models, we analyze the similarity of bilingual documents and propose a cross-lingual document similarity calculation method based on bilingual LDA. First, we train a bilingual LDA model on parallel bilingual documents and use the trained model to predict the topic distributions of a new corpus, so that the bilingual documents of the new corpus are mapped into a shared topic vector space. We then compute the similarity of these bilingual documents by applying cosine similarity to their topic distributions. We also improve the topic frequency-inverse document frequency method by accounting for the dispersion of topic distributions within and between categories, and use the improved method to compute feature topic weights. Experimental results show that the improved weighting method raises recall, keeps the LDA-based similarity calculation from being limited to particular categories, and is reliable.
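The similarity step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the two topic distributions were already inferred by the same trained bilingual LDA model, so that index k denotes the same topic in both languages. The function name `topic_cosine_similarity`, the optional `topic_weights` argument, and the sample vectors are hypothetical; the paper's improved weighting formula is not given on this page and is not reproduced here.

```python
import math
from typing import Optional, Sequence


def topic_cosine_similarity(
    theta_src: Sequence[float],
    theta_tgt: Sequence[float],
    topic_weights: Optional[Sequence[float]] = None,
) -> float:
    """Cosine similarity between two document-topic distributions.

    theta_src and theta_tgt are assumed to share one topic space
    (same model, same number of topics). topic_weights is a
    hypothetical per-topic weight vector; it defaults to all ones,
    which gives plain cosine similarity.
    """
    if len(theta_src) != len(theta_tgt):
        raise ValueError("topic distributions must have the same length")
    if topic_weights is None:
        topic_weights = [1.0] * len(theta_src)
    if len(topic_weights) != len(theta_src):
        raise ValueError("topic_weights must have one entry per topic")

    # Scale each topic proportion by its weight before comparing.
    a = [w * p for w, p in zip(topic_weights, theta_src)]
    b = [w * p for w, p in zip(topic_weights, theta_tgt)]

    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


# Made-up topic distributions over three topics, for illustration only.
doc_lang_a = [0.7, 0.2, 0.1]
doc_lang_b = [0.6, 0.3, 0.1]
doc_unrelated = [0.1, 0.1, 0.8]

print(topic_cosine_similarity(doc_lang_a, doc_lang_b))
print(topic_cosine_similarity(doc_lang_a, doc_unrelated))
```

Because both documents are represented in the same topic space, no translation step is needed at comparison time; documents that concentrate on the same topics score closer to 1 regardless of language.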

Key words

bilingual LDA/cross-lingual document similarity calculation/cosine similarity/topic frequency-inverse document frequency

Classification

Information Technology and Security Science

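Cite this article

Cheng Wei, Xian Yantuan, Zhou Lanjiang, Yu Zhengtao, Wang Hongbin. A cross-lingual document similarity calculation method based on bilingual LDA [J]. Computer Engineering & Science, 2017, 39(5): 978-983, 6.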

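Funding

National Natural Science Foundation of China (61363044, 61462054)

General Program of the Yunnan Provincial Department of Science and Technology (2015FB135)

Scientific Research Fund of the Yunnan Provincial Department of Education (2014Z021)

Provincial Talent Training Program of Kunming University of Science and Technology (KKSY201403028)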

Journal: Computer Engineering & Science

Indexing: OA, Peking University Core Journals, CSCD, CSTPCD

ISSN: 1007-130X
