Journal of Industrial Engineering and Engineering Management (管理工程学报), 2017, Vol. 31, Issue 4: 52-62, 11. DOI: 10.13587/j.cnki.jieem.2017.04.007
Extracting product features and opinions from Chinese online reviews: A comparative study on multi-domains
Abstract
Feature and opinion extraction is the basis of fine-grained sentiment analysis. Previous researchers have proposed a number of feature-mining algorithms, which generally fall into two families: statistics-based methods and machine-learning-based methods. However, no definitive conclusion has yet been reached on feature extraction. Existing algorithms often fail to carry over to other domains, so robustness and cross-domain portability remain their main issues. One reason is the lack of a systematic comparison on a unified corpus. In addition, past extraction algorithms were mostly developed for English, and Chinese online reviews have received comparatively little attention. Because of syntactic differences between the languages, English-based algorithms cannot be applied directly to Chinese. We therefore select six widely used extraction algorithms and compare the performance of statistical and machine-learning methods for feature-opinion mining in the Chinese context. The selected algorithms are Frequency-based opinion mining, Rule-based opinion mining, Association rule-based opinion mining, Association rule-based opinion mining plus linguistic, CRFs-based opinion mining, and SVM-based opinion mining. We collect 3,146 reviews from seven domains as the experimental corpus: digital camera reviews, cosmetics reviews, book reviews, hotel reviews, film reviews, mobile phone reviews, and restaurant reviews. Each of the six algorithms is then run on every corpus to extract features and measure extraction performance. The experiments lead to the following conclusions:
(1) Frequency-based opinion mining performs best when the threshold is set to 0.5%, quite different from the 1% used in the English context;
(2) no algorithm dominates across all corpora; each achieves good performance only in a limited set of domains;
(3) machine-learning algorithms generally outperform statistical approaches, but on some corpora (e.g. mobile phone reviews) statistical methods perform better, so follow-up research and applications should select an algorithm suited to the corpus at hand;
(4) review length affects the performance of mining algorithms: longer texts lead to lower accuracy, and shorter texts to higher accuracy;
(5) because of syntactic differences between languages, both Association rule-based opinion mining and Association rule-based opinion mining plus linguistic perform poorly in the Chinese context despite their strong results in English, which reflects the complexity of Chinese natural language processing;
(6) the same algorithm yields better results in service domains (e.g. restaurants, hotels) but much worse results in arts and entertainment domains (e.g. film, books), indicating that feature extraction poses different problems in different domains.
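Conclusion (1) turns on the frequency threshold. The following is a simplified, hypothetical sketch of that filtering step, not the authors' implementation. It assumes the threshold is a review-level support (the share of reviews that mention a term) and that each review has already been segmented and POS-tagged into a list of candidate feature terms (nouns or noun phrases). It also counts single terms only, whereas frequency-based methods in the literature typically mine frequent itemsets. The function name `frequent_features` and the toy data are invented for illustration.

```python
from collections import Counter


def frequent_features(reviews, min_support=0.005):
    """Keep candidate feature terms mentioned in at least `min_support` of reviews.

    Hypothetical helper. Assumes each review is a list of candidate feature
    terms (nouns/noun phrases) already produced by a Chinese word segmenter
    and POS tagger; segmentation itself is out of scope here.
    min_support is a fraction: 0.005 means 0.5% of reviews, 0.01 means 1%.
    """
    n = len(reviews)
    if n == 0:
        return set()
    # Document frequency: each term counts at most once per review.
    doc_freq = Counter(term for review in reviews for term in set(review))
    return {term for term, count in doc_freq.items() if count / n >= min_support}


# Toy corpus of 1,000 pre-tokenized reviews (invented for illustration).
reviews = (
    [["电池"]] * 20      # "battery": mentioned in 2.0% of reviews
    + [["屏幕"]] * 7     # "screen": 0.7%
    + [["包装"]] * 3     # "packaging": 0.3%
    + [[]] * 970         # reviews with no candidate feature terms
)

print(sorted(frequent_features(reviews, min_support=0.01)))   # 1% threshold (English context)
print(sorted(frequent_features(reviews, min_support=0.005)))  # 0.5% threshold (Chinese context)
```

At 1% only 电池 (battery) survives. At 0.5% the filter also keeps 屏幕 (screen), which appears in 0.7% of reviews. Lowering the threshold can only add candidates, so recall cannot drop, but precision may.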
Keywords
Online review/Chinese context/Product feature/Opinion extraction/Sentiment analysis
Classification
Information Technology and Security Science
Cite this article
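Wang Wei, Wang Hongwei, Sheng Xiaobao. Extracting product features and opinions from Chinese online reviews: A comparative study on multi-domains[J]. Journal of Industrial Engineering and Engineering Management, 2017, 31(4): 52-62, 11.
Funding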
National Natural Science Foundation of China (70971099, 71371144, 71402121)
Shanghai Philosophy and Social Science Planning Project, General Program (2013BGL004)