
Study of text sentiment analysis based on IDSSL

王刚 (Wang Gang), 李宁宁 (Li Ningning), 杨善林 (Yang Shanlin)

管理工程学报 (Journal of Industrial Engineering and Engineering Management), 2018, Vol. 32, Issue 3: 126-133 (8 pages). DOI: 10.13587/j.cnki.jieem.2018.03.015

Study of text sentiment analysis based on IDSSL

王刚 (1), 李宁宁 (2), 杨善林 (1)

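Author information

  • 1. School of Management, Hefei University of Technology, Hefei, Anhui 230009
  • 2. Key Laboratory of Process Optimization and Intelligent Decision-Making, Ministry of Education, Hefei, Anhui 230009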

Abstract

With the growing popularity of social media, a large amount of user-generated content is posted on the Internet. These texts express users' views, opinions, and attitudes, which are valuable to other Internet users, and researchers are paying increasing attention to them. Many supervised text sentiment analysis methods have been proposed to exploit this kind of data. However, much of the data available for sentiment analysis is unlabeled, and how to use a large amount of unlabeled data together with a small amount of labeled data has become one of the pressing research problems in the field. This paper therefore proposes an Improved Disagreement-based Semi-Supervised Learning (IDSSL) method for text sentiment analysis, built on the framework of disagreement-based semi-supervised learning. First, a sentiment analysis model based on disagreement-based semi-supervised learning was constructed. Disagreement-based semi-supervised learning was analyzed theoretically, and the analysis found that the multiple-classifier approach outperforms the original disagreement-based semi-supervised learning method and that diversity among the classifiers is the key factor in the multiple-classifier approach. Moreover, the Random Subspace method can produce diverse classifiers in sentiment analysis. We therefore constructed a sentiment analysis model, IDSSL, by combining the multiple-classifier approach with the Random Subspace method. IDSSL consists of three steps: (1) multiple initial classifiers are built using the Random Subspace method; (2) the classifiers are trained on the unlabeled instances under the rule of "majority help minority"; and (3) the base classifiers are combined by majority vote.
Second, experiments were carried out on classic sentiment analysis datasets. The established standard measure in sentiment analysis was adopted to evaluate the performance of the proposed method. IDSSL was compared with several disagreement-based semi-supervised learning methods: Self-training, Co-training, Tri-training, and Co-forest. Self-training, Co-training, Tri-training, and IDSSL used SVM as the base learner. To minimize the influence of variability in the training set, 10-fold cross-validation was performed five times on the sentiment analysis datasets. The experimental results demonstrate the effectiveness of the proposed method, which obtained better results than the other semi-supervised learning methods. In addition, we discuss the results of the different semi-supervised learning methods, the influence of the label rate on these methods, and the influence of the add-number on IDSSL.
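
The three IDSSL steps named in the abstract (Random Subspace initialization, "majority help minority" training, and majority-vote combination) might be sketched as below. This is only an illustrative reading of the abstract, not the authors' implementation. Several things are assumptions: the nearest-centroid base learner (the abstract reports SVM), the rule that only instances on which the classifiers disagree are used, the interpretation of `add_number` as the number of instances added per round, and all function names, parameters, and toy data.

```python
import random
from collections import Counter


def train_centroid(X, y, feats):
    """Fit a nearest-centroid classifier on the feature subset `feats`.

    Stand-in base learner (assumption); the abstract reports SVM instead.
    """
    counts = Counter(y)
    sums = {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(feats))
        for j, f in enumerate(feats):
            acc[j] += xi[f]
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}

    def predict(x):
        # Return the class whose centroid is closest within the subspace.
        return min(
            centroids,
            key=lambda c: sum((x[f] - m) ** 2 for f, m in zip(feats, centroids[c])),
        )

    return predict


def majority(votes):
    """Most common label among the votes."""
    return Counter(votes).most_common(1)[0][0]


def idssl_sketch(X_lab, y_lab, X_unlab, n_clf=5, subspace_rate=0.5,
                 add_number=2, rounds=5, seed=0):
    """Hypothetical sketch of the three steps; every parameter is an assumption."""
    rng = random.Random(seed)
    d = len(X_lab[0])
    k = max(1, int(d * subspace_rate))

    # Step 1: each classifier gets its own random feature subspace.
    subspaces = [rng.sample(range(d), k) for _ in range(n_clf)]
    train_sets = [(list(X_lab), list(y_lab)) for _ in range(n_clf)]
    clfs = [train_centroid(X_lab, y_lab, s) for s in subspaces]

    pool = list(X_unlab)
    for _ in range(rounds):
        # Step 2: find unlabeled instances on which the classifiers disagree.
        candidates = []
        for idx, x in enumerate(pool):
            votes = [clf(x) for clf in clfs]
            label = majority(votes)
            agree = votes.count(label)
            if agree < n_clf:
                candidates.append((agree, idx, label, votes))
        if not candidates:
            break
        # Keep the `add_number` instances with the strongest majority.
        candidates.sort(key=lambda t: -t[0])
        chosen = candidates[:add_number]
        for _, idx, label, votes in chosen:
            for i, vote in enumerate(votes):
                if vote != label:  # the majority teaches each dissenting classifier
                    train_sets[i][0].append(pool[idx])
                    train_sets[i][1].append(label)
        used = {idx for _, idx, _, _ in chosen}
        pool = [x for i, x in enumerate(pool) if i not in used]
        clfs = [train_centroid(Xi, yi, s)
                for (Xi, yi), s in zip(train_sets, subspaces)]

    # Step 3: combine the base classifiers by majority vote.
    return lambda x: majority([clf(x) for clf in clfs])


# Toy numeric vectors standing in for text features (hypothetical data).
X_lab = [[1.0, 1.0, 1.0, 1.0], [0.9, 1.1, 1.0, 0.8],
         [0.0, 0.0, 0.0, 0.0], [0.1, -0.1, 0.2, 0.0]]
y_lab = ["pos", "pos", "neg", "neg"]
X_unlab = [[0.9, 0.1, 0.9, 0.1], [0.1, 0.9, 0.1, 0.9],
           [1.0, 0.9, 1.1, 1.0], [0.0, 0.1, -0.1, 0.1]]
predict = idssl_sketch(X_lab, y_lab, X_unlab)
print(predict([0.95, 1.0, 0.9, 1.05]), predict([0.05, 0.0, 0.1, -0.05]))
```

In this sketch, diversity comes only from the differing feature subspaces, which matches the role the abstract assigns to the Random Subspace method. A real text pipeline would replace the toy vectors with features extracted from documents and the centroid learner with an SVM.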

Key words

Text sentiment analysis/Semi-supervised learning/Multiple classifiers/Random Subspace

Classification

Information technology and security science

Cite this article

王刚, 李宁宁, 杨善林. Study of text sentiment analysis based on IDSSL [J]. 管理工程学报, 2018, 32(3): 126-133.

Funding

National Natural Science Foundation of China (Grant Nos. 71471054, 91646111)

Natural Science Foundation of Anhui Province (Grant No. 1608085MG150)

管理工程学报 (Journal of Industrial Engineering and Engineering Management)

OA/北大核心 (PKU Core)/CHSSCD/CSCD/CSSCI/CSTPCD

ISSN 1004-6062
