
Application of secondary pruning algorithm in commentary feature extraction

Journal of Southeast University (Natural Science Edition), 2016, Vol. 18, Issue 3: pp. 513-517 (5 pages). DOI: 10.3969/j.issn.1001-0505.2016.03.010


Wu Hanqian (吴含前)¹, Zhou Lifeng (周立凤)¹, Xie Jue (谢珏)²

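Author information

  • 1. School of Computer Science and Engineering, Southeast University, Nanjing 211189
  • 2. Southeast University-Monash University Joint Graduate School, Suzhou 215123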

Abstract

To address the low accuracy of the generalized sequential pattern (GSP) algorithm in extracting product features from Chinese online reviews, a secondary pruning algorithm is proposed. Starting from the candidate set output by the GSP algorithm, the algorithm applies a threshold on the term pair co-occurrence weight (TPCW) to filter the candidates further and thereby improve accuracy. Customized tools were used to crawl Chinese product reviews of cameras from the Jingdong website. A total of 1,000 reviews were selected as experimental data, and the segmentation tool ICTCLAS was used for word segmentation and data preprocessing. The proposed algorithm was compared with the GSP algorithm, the cross language model (CLM), and the likelihood ratio test (LRT). The results show that the accuracy of the proposed algorithm in extracting product features from Chinese online reviews is 76.37%, which is higher than that of the GSP algorithm, CLM, and LRT by 2.94%, 5.77%, and 7.57%, respectively.
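The second filtering pass described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the TPCW formula, so the function `pair_cooccurrence_weight` below is a hypothetical stand-in (a Jaccard-style share of reviews in which both terms appear), and the names `secondary_prune`, `reviews`, `candidates`, and the threshold value are assumptions for illustration only. Candidates are assumed to be term pairs already produced by a GSP pass, and each review is assumed to be a set of segmented tokens.

```python
def pair_cooccurrence_weight(pair, reviews):
    """Hypothetical stand-in for TPCW (the paper's formula is not given here).

    Returns the share of reviews containing both terms among the reviews
    containing at least one of them.
    """
    a, b = pair
    both = sum(1 for tokens in reviews if a in tokens and b in tokens)
    either = sum(1 for tokens in reviews if a in tokens or b in tokens)
    return both / either if either else 0.0


def secondary_prune(candidates, reviews, threshold):
    """Keep only candidate pairs whose weight reaches the threshold."""
    return [
        pair for pair in candidates
        if pair_cooccurrence_weight(pair, reviews) >= threshold
    ]


# Toy data: each review is a set of tokens (assumed already segmented).
reviews = [
    {"picture", "quality", "good"},
    {"picture", "quality", "sharp"},
    {"battery", "life", "short"},
    {"picture", "price"},
]

# Assumed output of the first (GSP) pass: candidate term pairs.
candidates = [("picture", "quality"), ("battery", "life"), ("picture", "price")]

# The threshold value is an illustrative assumption, not taken from the paper.
kept = secondary_prune(candidates, reviews, threshold=0.5)
print(kept)
```

With this toy data, the pair ("picture", "price") co-occurs in only one of the three reviews that mention either term, so it falls below the assumed threshold of 0.5 and is pruned, while the other two pairs are kept.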

Key words

feature extraction/secondary pruning/term pair co-occurrence weight/likelihood ratio test/cross language model

Classification

Information technology and security science

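Cite this article

Wu Hanqian, Zhou Lifeng, Xie Jue. Application of secondary pruning algorithm in commentary feature extraction [J]. Journal of Southeast University (Natural Science Edition), 2016, 18(3): 513-517.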

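Funding

One of the outcomes of the Shanghai First-Class Discipline (Law) Construction Project and of the Shanghai Philosophy and Social Sciences Planning Project "Research on the Relationship between Criminal Law Revision and Criminal Law Interpretation" (2014JG009-BFX378).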

Journal indexing: open access (OA); Peking University Core Journals; CSCD; CSTPCD

ISSN: 1001-0505
