
Optimal Selection Method for LDA Topics Based on Degree of Overlap and Completeness

BAI Zhi'an¹, ZENG Jianping²

Computer Engineering and Applications, 2019, Vol. 55, Issue (12): 155-161 (7 pages). DOI: 10.3778/j.issn.1002-8331.1803-0259


Author information

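  • 1. Computer Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025
  • 2. School of Computer Science and Technology, Fudan University, Shanghai 200433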

Abstract

Many LDA-based topic modeling methods can infer the number of topics and the topic descriptions from large text datasets; however, several problems remain, such as determining the number of topics and selecting topic words. This paper proposes a new method that selects the optimal topic description based on an Overlap-Completeness score. It combines LDA with TF-IDF and takes into account both the completeness of the topic words and the independence between words. Starting from the LDA result, TF-IDF is used to select distinctive words for each topic; the degree of overlap between the vocabularies of different topics and the degree of completeness of each topic description are then defined, and finally the optimal selection method is presented. The method yields not only the best number of topics but also the best descriptive words for each topic. Experiments on news articles about information security show that, compared with the traditional LDA model, this method produces distinctive topics and representative words.
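The abstract does not give the paper's formulas, so the following is only a minimal sketch of the pipeline it describes, under stated assumptions rather than the authors' definitions. It assumes each LDA topic is available as a dict of word probabilities P(w | t), treats every topic as one "document" so that a word's IDF is log(K / df(w)), and keeps the m words with the highest P(w | t) * idf(w) per topic. The overlap measure (mean pairwise Jaccard similarity of the selected word sets) and the completeness measure (share of a topic's probability mass covered by its selected words) are hypothetical stand-ins, and the function names distinctive_words, jaccard, overlap and completeness are illustrative only.

```python
import math
from itertools import combinations


def distinctive_words(topics, m):
    """Select m words per topic, scoring each word as P(w | t) * idf(w).

    topics: dict topic_id -> dict word -> P(word | topic), assumed LDA output.
    idf treats every topic as one "document": idf(w) = log(K / df(w)),
    where K is the number of topics and df(w) counts topics containing w.
    """
    k = len(topics)
    df = {}
    for words in topics.values():
        for w in words:
            df[w] = df.get(w, 0) + 1
    selected = {}
    for t, words in topics.items():
        # Words present in every topic get idf = log(1) = 0 and sink to the bottom.
        ranked = sorted(words, key=lambda w: words[w] * math.log(k / df[w]), reverse=True)
        selected[t] = ranked[:m]
    return selected


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap(selected):
    """Hypothetical overlap measure: mean pairwise Jaccard similarity of the
    topics' selected word sets (0 = fully distinct vocabularies)."""
    pairs = list(combinations(selected.values(), 2))
    if not pairs:
        return 0.0
    return sum(jaccard(set(a), set(b)) for a, b in pairs) / len(pairs)


def completeness(topics, selected):
    """Hypothetical completeness measure: average share of each topic's
    probability mass covered by its selected words."""
    shares = [
        sum(topics[t][w] for w in words) / sum(topics[t].values())
        for t, words in selected.items()
    ]
    return sum(shares) / len(shares)


if __name__ == "__main__":
    # Toy topics (made-up numbers, not data from the paper).
    topics = {
        0: {"malware": 0.4, "attack": 0.3, "data": 0.3},
        1: {"privacy": 0.5, "data": 0.3, "leak": 0.2},
    }
    sel = distinctive_words(topics, m=2)
    print(sel, overlap(sel), completeness(topics, sel))
```

In the toy example the word "data" appears in both topics, so its IDF is log(2/2) = 0 and it is dropped from both selections; the remaining word sets are disjoint and the overlap is 0.0. Sweeping the number of topics and m, and scoring each configuration on overlap and completeness, would be one way to mimic the selection step, but how the paper combines the two into its Overlap-Completeness score is not stated in the abstract.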

Key words

LDA model / TF-IDF / topic detection / degree of overlap / degree of completeness

Classification

Information Technology and Security Science

Cite this article

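BAI Zhi'an, ZENG Jianping. Optimal Selection Method for LDA Topics Based on Degree of Overlap and Completeness[J]. Computer Engineering and Applications, 2019, 55(12): 155-161.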

Funding

Natural Science Foundation of Shanghai (No. 15ZR1403700).

Computer Engineering and Applications

OA / PKU Core Journal / CSCD / CSTPCD

ISSN 1002-8331
