Statistics & Decision (统计与决策), 2018, Vol. 34, Issue 16: 74-76, 3. DOI: 10.13546/j.cnki.tjyjc.2018.16.018
A Split-and-Conquer Method for Massive Data (一种面向海量数据的spilt-and-conquer方法)
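Wen Kun (温焜) 1, Lan Xiaoran (兰晓然) 2
Author information
- 1. School of Management, Nanchang University, Nanchang 330029
- 2. Jiangxi Administration Institute, Nanchang 330003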
Abstract
Lasso is widely used for variable selection, but on high-dimensional massive data sets its computational cost becomes prohibitive. To address this, the paper proposes a split-and-conquer method: the high-dimensional data set is divided into K parts, variables are selected within each part, the selected feature sets are merged, and variable selection is then performed again on the merged set. To verify the proposed method, the paper runs experiments on six data sets. Finally, SVM, random forest, and neural network models are used for prediction; the results show that the split-and-conquer method performs well on high-dimensional massive data and substantially reduces running time.
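The procedure outlined in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the K parts are blocks of feature columns, uses a plain coordinate-descent Lasso as the selector at both stages, and runs on synthetic data. The names `lasso_cd`, `select`, `split_and_conquer`, and `make_demo_data`, along with the values of `k` and `lam`, are hypothetical choices made for this example.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1.

    X is a list of rows (lists of floats), y a list of floats.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def select(X, y, cols, lam):
    """Run Lasso on the given columns; return those with nonzero weight."""
    sub = [[row[j] for j in cols] for row in X]
    beta = lasso_cd(sub, y, lam)
    return [c for c, b in zip(cols, beta) if abs(b) > 1e-8]


def split_and_conquer(X, y, k, lam):
    """Assumed scheme: split columns into k blocks, select, merge, reselect."""
    p = len(X[0])
    blocks = [list(range(start, p, k)) for start in range(k)]
    merged = sorted(c for blk in blocks for c in select(X, y, blk, lam))
    if not merged:
        return []
    return select(X, y, merged, lam)


def make_demo_data(seed=0, n=200, p=40):
    """Synthetic data (an assumption): only columns 3, 17, 30 drive y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [3.0 * r[3] - 2.0 * r[17] + 2.5 * r[30] + rng.gauss(0.0, 0.1) for r in X]
    return X, y


if __name__ == "__main__":
    X, y = make_demo_data()
    print(split_and_conquer(X, y, k=4, lam=0.5))
```

In this sketch, each first-stage fit sees only about p/K columns, so the per-block problems are small and independent of one another. The second-stage fit runs only on the merged candidates, which is where a reduction in running time would come from under these assumptions.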
Keywords
split-and-conquer method / variable selection / high-dimensional data
Classification
Mathematical sciences
Cite this article
Wen Kun, Lan Xiaoran. A Split-and-Conquer Method for Massive Data (一种面向海量数据的spilt-and-conquer方法) [J]. Statistics & Decision (统计与决策), 2018, 34(16): 74-76, 3.