Research on Short Text Feature Extension and Classification on the Spark Platform

王雯 赵衎衎 李翠平 陈红 孙辉

Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2017, Vol. 11, Issue 5: 732-741, 10. DOI: 10.3778/j.issn.1673-9418.1608041

Feature Extension and Category Research for Short Text Based on Spark Platform

王雯 1, 赵衎衎 2, 李翠平 1, 陈红 2, 孙辉 1

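Author information

  • 1. Key Laboratory of Data Engineering and Knowledge Engineering of the Ministry of Education, Renmin University of China, Beijing 100872
  • 2. School of Information, Renmin University of China, Beijing 100872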
Abstract

Short text classification often suffers from high feature dimensionality, sparse features, and poor classification accuracy. Feature extension can address these problems effectively, but it greatly reduces execution efficiency. To improve both the accuracy and the efficiency of short text classification, this paper proposes an association-rule-based feature extension method designed for the Spark platform. Given a background data set for a short text corpus, the method first extends the original corpus and complements its features by mining association rules and their corresponding confidences. It then applies a new cascade SVM (support vector machine) algorithm that uses distance-based selection during classification. Finally, the feature extension and classification algorithms are implemented on the Spark platform, where distributed processing improves the efficiency of short text handling. Experiments show that, compared with the traditional method, the new method achieves a fourfold efficiency improvement and a 15% increase in classification accuracy, of which feature extension and classification optimization contribute 10% and 5%, respectively.
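The feature extension step described in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' Spark implementation: it assumes one-to-one rules `a -> b` mined from term co-occurrence in a tokenized background corpus, and the function names (`mine_rules`, `extend_features`), the thresholds, and the choice to weight an added term by its rule confidence are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_rules(background, min_support=2, min_confidence=0.5):
    """Mine one-to-one rules a -> b from tokenized background documents.

    confidence(a -> b) = (#docs containing a and b) / (#docs containing a)
    Thresholds are illustrative assumptions, not values from the paper.
    """
    term_count = Counter()
    pair_count = Counter()
    for doc in background:
        terms = sorted(set(doc))  # count each term once per document
        term_count.update(terms)
        pair_count.update(combinations(terms, 2))

    rules = {}
    for (a, b), n_ab in pair_count.items():
        if n_ab < min_support:
            continue
        # A frequent pair can yield a rule in either direction.
        for x, y in ((a, b), (b, a)):
            conf = n_ab / term_count[x]
            if conf >= min_confidence:
                rules.setdefault(x, {})[y] = conf
    return rules


def extend_features(tokens, rules):
    """Return term -> weight for a short text.

    Original terms get weight 1.0; terms added through a rule get the
    rule's confidence (assumed weighting scheme).
    """
    features = {t: 1.0 for t in tokens}
    for t in tokens:
        for y, conf in rules.get(t, {}).items():
            if y not in tokens:
                features[y] = max(features.get(y, 0.0), conf)
    return features


# Toy background corpus (made up for this example).
background = [
    ["spark", "cluster", "rdd"],
    ["spark", "rdd", "shuffle"],
    ["spark", "cluster"],
    ["svm", "kernel", "margin"],
    ["svm", "kernel"],
]
rules = mine_rules(background)
extended = extend_features(["spark", "job"], rules)
```

In this toy run the short text `["spark", "job"]` gains the terms `cluster` and `rdd`, each weighted by the confidence of the rule that introduced it, which is how extension reduces sparsity before a classifier such as an SVM is trained.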

Keywords

short text classification; feature extension; association rule; Spark platform

Classification

Information Technology and Security Science

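Cite this article

王雯, 赵衎衎, 李翠平, 陈红, 孙辉. Feature Extension and Category Research for Short Text Based on Spark Platform (Spark平台下的短文本特征扩展与分类研究) [J]. Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2017, 11(5): 732-741, 10.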

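Journal of Frontiers of Computer Science and Technology (计算机科学与探索), ISSN 1673-9418. Open access; indexed in PKU Core (北大核心), CSCD, CSTPCD.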
