
Research on a Headline Classification Algorithm Based on Ensemble Learning

Gao Yuan, Liu Baisong

Application Research of Computers, 2017, Vol. 34, Issue 4: 1004-1007 (4 pages). DOI: 10.3969/j.issn.1001-3695.2017.04.010

Headlines classification method based on ensemble learning

Gao Yuan 1, Liu Baisong 1

Author information

  • 1. College of Information Science and Engineering, Ningbo University, Ningbo, Zhejiang 315211, China

Abstract

Headline classification assigns a category to a headline, a concise summary statement of no more than 20 words. This paper proposes a headline classification method based on an improved random forest, which introduces the multinomial naive Bayes model into the construction of the base classifiers to address the poor classification performance caused by the sparse and uncertain features of headline text. It also proposes a two-dimensional weighted voting mechanism that uses the out-of-bag (OOB) data of the random forest. Finally, experiments on real library data compare the method with an SVM algorithm based on LDA topic expansion. The experimental results show that the approach performs stably and achieves better results under certain conditions.
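
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea and should not be read as the authors' algorithm. It assumes plain bagging (no random feature subsampling), a small hand-written multinomial naive Bayes as the base classifier, and one possible reading of "two-dimensional" weights: one weight per (base classifier, predicted class), estimated from that classifier's out-of-bag accuracy. All names (`MultinomialNB`, `fit_ensemble`, `predict_ensemble`) and the toy headlines are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


class MultinomialNB:
    """Tiny multinomial naive Bayes over tokenized headlines (illustrative only)."""

    def fit(self, docs, labels, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant
        self.n_docs = len(labels)
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.doc_counts = Counter(labels)
        self.tok_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.tok_counts[label].update(doc)
        self.totals = {c: sum(self.tok_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        def log_score(c):
            denom = self.totals[c] + self.alpha * len(self.vocab)
            score = math.log(self.doc_counts[c] / self.n_docs)
            for tok in doc:
                if tok in self.vocab:  # skip words unseen in this bootstrap sample
                    score += math.log((self.tok_counts[c][tok] + self.alpha) / denom)
            return score
        return max(self.classes, key=log_score)


def fit_ensemble(docs, labels, n_estimators=25, seed=0):
    """Bagged naive Bayes models, each paired with per-class OOB weights."""
    rng = random.Random(seed)
    n = len(docs)
    ensemble = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]  # out-of-bag indices
        model = MultinomialNB().fit([docs[i] for i in idx], [labels[i] for i in idx])
        hits, seen = Counter(), Counter()
        for i in oob:
            pred = model.predict(docs[i])
            seen[pred] += 1
            hits[pred] += pred == labels[i]
        # Assumed weighting: smoothed OOB precision of this model for each class.
        weights = {c: (hits[c] + 1) / (seen[c] + 2) for c in model.classes}
        ensemble.append((model, weights))
    return ensemble


def predict_ensemble(ensemble, doc):
    """Weighted vote: each model adds its OOB weight for the class it predicts."""
    votes = defaultdict(float)
    for model, weights in ensemble:
        pred = model.predict(doc)
        votes[pred] += weights[pred]
    return max(votes, key=votes.get)


# Hypothetical toy headlines, already tokenized.
DOCS = [
    ["deep", "learning", "image", "model"],
    ["learning", "model", "text", "classification"],
    ["ensemble", "learning", "model", "voting"],
    ["learning", "model", "random", "forest"],
    ["tang", "dynasty", "history", "poetry"],
    ["ming", "dynasty", "history", "trade"],
    ["qing", "dynasty", "history", "records"],
    ["song", "dynasty", "history", "painting"],
]
LABELS = ["cs"] * 4 + ["history"] * 4

ensemble = fit_ensemble(DOCS, LABELS)
print(predict_ensemble(ensemble, ["learning", "model"]))
```

Weighting each vote by out-of-bag performance lets base classifiers that generalize poorly for a given class contribute less to the final decision, without holding out a separate validation set.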

Key words

natural language processing/headline classification/ensemble learning/improved random forest/OOB two-dimensional weight distribution

Classification

Information Technology and Security Science

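Cite this article

Gao Yuan, Liu Baisong. Research on a headline classification algorithm based on ensemble learning [J]. Application Research of Computers, 2017, 34(4): 1004-1007.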

Funding

Supported by the National Social Science Fund of China (15FTQ002)

Application Research of Computers

OA / Peking University Core Journal / CSCD / CSTPCD

ISSN 1001-3695
