Journal of Southeast University (Natural Science Edition), 2016, Vol. 18, Issue (3): 513-517, 5. DOI: 10.3969/j.issn.1001-0505.2016.03.010
Application of secondary pruning algorithm in commentary feature extraction
Abstract
To address the low accuracy of the generalized sequence pattern (GSP) algorithm in extracting product features from Chinese online reviews, a secondary pruning algorithm is proposed. Starting from the candidate set output by the GSP algorithm, the algorithm uses the term pair co-occurrence weight (TPCW) as a threshold for further filtering to improve accuracy. Customized tools are used to crawl Chinese product reviews of cameras from the Jingdong website. A total of 1,000 reviews are selected as the experimental data, and the segmentation tool ICTCLAS is used for word segmentation and data preprocessing. The proposed algorithm is compared with the GSP algorithm, the cross language model (CLM), and the likelihood ratio test (LRT). The results show that the accuracy rate of the proposed algorithm on product feature extraction from Chinese online reviews is 76.37%, which is higher than those of the GSP algorithm, CLM, and LRT by 2.94%, 5.77%, and 7.57%, respectively.
Keywords
feature extraction / secondary pruning / term pair co-occurrence weight / likelihood ratio test / cross language model
Classification
Information technology and security science
Cite this article
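Wu Hanqian (吴含前), Zhou Lifeng (周立凤), Xie Jue (谢珏). Application of secondary pruning algorithm in commentary feature extraction [J]. Journal of Southeast University (Natural Science Edition), 2016, 18(3): 513-517, 5.
Funding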
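One of the outcomes of the Shanghai First-Class Discipline (Law) Construction Project and of the Shanghai Philosophy and Social Science Planning Project "Research on the Relationship between Criminal Law Revision and Criminal Law Interpretation" (2014JG009-BFX378).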
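
Illustrative sketch

The abstract describes the secondary pruning step only at a high level: candidates produced by GSP are filtered again using a term pair co-occurrence threshold. The Python sketch below shows one way such a filter might look. It is not the authors' implementation. The abstract does not give the TPCW formula, so a simple Jaccard-style ratio (reviews containing both terms divided by reviews containing either term) stands in for it. The function names `cooccurrence_weights` and `secondary_prune`, the restriction to two-term candidates, and the threshold value are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(reviews):
    """Return a stand-in co-occurrence weight for every term pair.

    `reviews` is a list of token lists (already word-segmented).
    ASSUMPTION: the weight is a Jaccard-style ratio, used here only
    because the abstract does not state the actual TPCW formula.
    """
    term_df = Counter()  # number of reviews containing each term
    pair_df = Counter()  # number of reviews containing both terms
    for tokens in reviews:
        terms = sorted(set(tokens))
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))

    weights = {}
    for (a, b), both in pair_df.items():
        either = term_df[a] + term_df[b] - both
        weights[(a, b)] = both / either
    return weights


def secondary_prune(candidates, reviews, threshold):
    """Keep only candidate term pairs whose weight reaches `threshold`.

    `candidates` is a hypothetical list of (term, term) pairs standing
    in for the candidate set output by GSP.
    """
    weights = cooccurrence_weights(reviews)
    kept = []
    for a, b in candidates:
        key = (a, b) if a <= b else (b, a)  # weights use sorted pairs
        if weights.get(key, 0.0) >= threshold:
            kept.append((a, b))
    return kept


if __name__ == "__main__":
    # Toy data, not taken from the paper's experiment.
    reviews = [
        ["battery", "life", "good"],
        ["battery", "life", "short"],
        ["screen", "good"],
        ["battery", "price"],
    ]
    candidates = [("battery", "life"), ("good", "battery"), ("screen", "good")]
    print(secondary_prune(candidates, reviews, threshold=0.5))
```

In this toy run the pair ("good", "battery") is pruned because the two terms appear together in only one of the four reviews that contain either of them. A real pipeline would segment the Chinese reviews first (the paper uses ICTCLAS) and take its candidates from an actual GSP run.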