Journal of Industrial Engineering and Engineering Management (管理工程学报), 2018, Vol. 32, Issue 3: 126-133 (8 pages). DOI: 10.13587/j.cnki.jieem.2018.03.015
Study of text sentiment analysis based on IDSSL
Abstract
With the growing popularity of social media, a large amount of user-generated content is posted on the Internet. These texts contain users' viewpoints, opinions and attitudes, which are valuable to other Internet users, and researchers are paying increasing attention to such content. Many supervised text sentiment analysis methods have subsequently been proposed to exploit this kind of data. However, sentiment analysis involves a large amount of unlabeled data, so how to use a large amount of unlabeled data together with a small amount of labeled data has become an urgent research problem in sentiment analysis. This paper therefore proposes an Improved Disagreement-based Semi-Supervised Learning (IDSSL) method for text sentiment analysis, built on the framework of disagreement-based semi-supervised learning. First, a sentiment analysis model based on disagreement-based semi-supervised learning was constructed. A theoretical analysis of disagreement-based semi-supervised learning found that the multi-classifier variant performs better than the original disagreement-based method, and that diversity among the classifiers is the key factor in the multi-classifier variant. Moreover, the Random Subspace method can produce diverse classifiers in sentiment analysis. We therefore constructed a sentiment analysis model that combines multiple classifiers generated by the Random Subspace method, namely IDSSL. IDSSL consists of three steps: (1) multiple initial classifiers are built with the Random Subspace method; (2) the classifiers are trained on the unlabeled instances according to the rule of "majority help minority"; and (3) the base classifiers are combined by majority voting. Second, experiments were carried out on classic sentiment analysis datasets.
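The three IDSSL steps above can be sketched in Python as follows. This is a minimal, stdlib-only sketch under stated assumptions, not the authors' implementation: a nearest-centroid classifier stands in for the SVM base learner; "majority help minority" is assumed to mean that classifier i receives the unlabeled instances on which a strict majority of the other classifiers agree but classifier i disagrees, labeled with the majority's label; and `add_number` is assumed to cap how many such instances each classifier receives per round. All function and parameter names (`idssl`, `train_centroids`, `subspace_rate`, `max_rounds`, and so on) are hypothetical.

```python
import random
from collections import Counter


def train_centroids(X, y, features):
    """Hypothetical stand-in for the SVM base learner: a nearest-centroid
    classifier that only looks at the feature indices in `features`."""
    sums, counts = {}, Counter(y)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(features))
        for k, f in enumerate(features):
            acc[k] += xi[f]
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
    return centroids, features


def predict(model, x):
    """Label of the closest centroid within the model's feature subspace."""
    centroids, features = model

    def dist(c):
        return sum((x[f] - m) ** 2 for f, m in zip(features, centroids[c]))

    return min(sorted(centroids), key=dist)


def ensemble_predict(models, x):
    """Step (3): combine the base classifiers by majority vote."""
    return Counter(predict(m, x) for m in models).most_common(1)[0][0]


def idssl(X_lab, y_lab, X_unlab, n_clf=5, subspace_rate=0.5,
          add_number=2, max_rounds=10, seed=0):
    if n_clf < 3:
        raise ValueError("need at least 3 base classifiers for a majority")
    rng = random.Random(seed)
    n_feat = len(X_lab[0])
    k = max(1, int(subspace_rate * n_feat))
    # Step (1): one random feature subspace per base classifier.
    subspaces = [sorted(rng.sample(range(n_feat), k)) for _ in range(n_clf)]
    data = [(list(X_lab), list(y_lab)) for _ in range(n_clf)]
    models = [train_centroids(X, y, fs) for (X, y), fs in zip(data, subspaces)]
    used = [set() for _ in range(n_clf)]  # unlabeled indices already given to each classifier

    for _ in range(max_rounds):
        added = False
        # Step (2), assumed form of "majority help minority": all classifiers
        # are labeled by the models from the start of this round.
        for i in range(n_clf):
            others = [m for j, m in enumerate(models) if j != i]
            candidates = []
            for u, x in enumerate(X_unlab):
                if u in used[i]:
                    continue
                label, votes = Counter(predict(m, x) for m in others).most_common(1)[0]
                if votes > len(others) / 2 and predict(models[i], x) != label:
                    candidates.append((votes, u, label))
            # Keep at most `add_number` of the most agreed-upon instances.
            candidates.sort(key=lambda t: (-t[0], t[1]))
            for _, u, label in candidates[:add_number]:
                data[i][0].append(X_unlab[u])
                data[i][1].append(label)
                used[i].add(u)
                added = True
        if not added:
            break
        # Retrain every classifier on its enlarged training set.
        models = [train_centroids(X, y, fs) for (X, y), fs in zip(data, subspaces)]
    return models
```

For example, `models = idssl(X_lab, y_lab, X_unlab)` followed by `ensemble_predict(models, x)` labels a new feature vector `x`. Ranking candidates by the size of the agreeing majority is a further assumption; it simply gives each classifier the instances the others agree on most strongly first.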
The established standard measures of sentiment analysis were used to evaluate the performance of the proposed method. IDSSL was compared with several disagreement-based semi-supervised learning methods: Self-training, Co-training, Tri-training, and Co-forest. Self-training, Co-training, Tri-training, and IDSSL used SVM as the base learner. To reduce the influence of variability in the training set, 10-fold cross-validation was repeated five times on the sentiment analysis datasets. Finally, the experimental results demonstrate the effectiveness of the proposed method, which outperformed the other semi-supervised learning methods (Self-training, Co-training, Tri-training, and Co-forest). We also discuss the results of the different semi-supervised learning methods, the influence of the label rate on these methods, and the influence of the add-number on IDSSL.
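The evaluation protocol (10-fold cross-validation repeated five times) can be sketched as below. Accuracy is used only as an illustrative measure, since the abstract does not name the specific measures; `repeated_kfold`, `cv_accuracy` and the `fit_predict` callable are hypothetical names. In a semi-supervised setting, part of each training fold would additionally have its labels hidden according to the label rate before `fit_predict` is called; that step is omitted here.

```python
import random
from statistics import mean


def repeated_kfold(n, k=10, repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` rounds of k-fold CV,
    reshuffling the sample order before each round."""
    if n < k:
        raise ValueError("need at least k samples")
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        # Round-robin split: fold sizes differ by at most one sample.
        folds = [order[f::k] for f in range(k)]
        for f in range(k):
            test = folds[f]
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield train, test


def cv_accuracy(fit_predict, X, y, k=10, repeats=5, seed=0):
    """Mean test accuracy over all k * repeats folds. `fit_predict` is a
    hypothetical callable (X_train, y_train, X_test) -> predicted labels."""
    scores = []
    for train, test in repeated_kfold(len(X), k, repeats, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        hits = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(hits / len(test))
    return mean(scores)
```

Averaging over all 50 folds, rather than over a single 10-fold split, is what reduces the dependence on one particular partition of the training set.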
Keywords
Text sentiment analysis/Semi-supervised learning/Multi-classifiers/Random Subspace
Classification
Information Technology and Security Science
Cite this article
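Wang Gang, Li Ningning, Yang Shanlin. Study of text sentiment analysis based on IDSSL[J]. Journal of Industrial Engineering and Engineering Management, 2018, 32(3): 126-133.
Funding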
National Natural Science Foundation of China (71471054, 91646111)
Natural Science Foundation of Anhui Province (1608085MG150)