
Research on Supervised Multi-Label Text Classification Based on HDP

Xie Chenyang (谢晨阳), Lu Yanxin (卢焱鑫)

Computer Engineering and Applications, 2017, Vol. 53, Issue 23: 18-23, 46, 7. DOI: 10.3778/j.issn.1002-8331.1709-0162

Supervised multi-label text classification based on hierarchical Dirichlet process

Xie Chenyang 1, Lu Yanxin 2

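Author information

  • 1. School of Computer Science, Wuhan University, Wuhan 430000
  • 2. State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430000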

Abstract

With the development of the Internet and information technology, large volumes of multi-label text data are being generated rapidly. In text classification, determining the appropriate number of categories and identifying the labels of a text more accurately are pressing problems. The HL_LDA model proposed in this paper determines the number of categories automatically through the hierarchical Dirichlet process, and improves classification quality by discovering the hierarchical information among the labels of multi-label documents. Experimental results on different types of data sets show that HL_LDA outperforms existing LDA-based and SVM-based methods in precision and F1-score.
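
The abstract reports precision and F1-score but does not say how they are averaged over documents and labels. As a minimal sketch, assuming micro-averaging over per-document label sets (an assumption, not stated in the source), the two metrics could be computed as follows. The function name `micro_precision_f1` and the toy label data are hypothetical and purely illustrative.

```python
from typing import List, Set, Tuple


def micro_precision_f1(
    true_labels: List[Set[str]], pred_labels: List[Set[str]]
) -> Tuple[float, float]:
    """Micro-averaged precision and F1 over multi-label documents.

    Assumption: counts are pooled over all documents before the ratios
    are taken; the paper may use a different averaging scheme.
    """
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, pred_labels):
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # labels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, f1


# Toy data, invented for illustration only.
truth = [{"sports", "health"}, {"politics"}, {"tech", "business"}]
pred = [{"sports"}, {"politics", "business"}, {"tech", "business"}]
precision, f1 = micro_precision_f1(truth, pred)
```

Pooling counts before dividing weights frequent labels more heavily; a macro average (per-label scores averaged afterwards) would treat rare labels equally and can give noticeably different numbers on skewed label distributions.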

Key words

multi-label/text classification/label dependence/hierarchical Dirichlet process

Classification

Information Technology and Security Science

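Cite this article

Xie Chenyang, Lu Yanxin. Supervised multi-label text classification based on hierarchical Dirichlet process[J]. Computer Engineering and Applications, 2017, 53(23): 18-23, 46, 7.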

Funding

Young Scientists Fund (No. 60903035)

National Natural Science Foundation of China (No. 61572373)

National Key Research and Development Program (No. 2017YFC0803808)

Computer Engineering and Applications

OA/Peking University Core Journal/CSCD/CSTPCD

ISSN 1002-8331
