An Accelerator for SVR Algorithms Based on Spatial Projection and Clustering Partitioning
Data not only generates value but also drives the development of statistics as a science. With the rapid advance of science and technology, massive datasets have emerged, and at this scale many traditional processing methods can no longer meet the data-analysis needs of various fields. Given the inefficiency of learning algorithms on massive data, divide and conquer is generally regarded as the most direct and most widely used strategy for addressing the problem. Support vector regression (SVR) is a powerful regression algorithm that is widely applied in fields such as pattern recognition and data mining, but its training is inefficient on large-scale data. To address this, the paper applies the divide-and-conquer idea and proposes an SVR acceleration algorithm based on spatial projection and clustering partitioning (PKM-SVR). The algorithm proceeds in four steps: projection vectors are used to project the data into a two-dimensional space; a clustering method divides the data space into k disjoint regions; an SVR model is trained on each region; and each region's SVR model is used to predict the test samples that fall into that region. In comparative experiments against traditional data-partitioning methods on standard datasets, the proposed algorithm trains faster and shows better prediction performance.
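The four steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes that the projection is PCA and the clustering is k-means (suggested by the keywords and the name PKM-SVR, but not spelled out in the abstract), that features are standardized first, and that each local SVR is trained on the standardized original features instead of the 2-D projection. The class name `PartitionedSVR` and its parameters are hypothetical, and scikit-learn is used only for convenience.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


class PartitionedSVR:
    """Hypothetical sketch: project to 2-D, cluster into k regions, fit one SVR per region."""

    def __init__(self, k=4, random_state=0, **svr_params):
        self.k = k
        self.random_state = random_state
        self.svr_params = svr_params  # forwarded to sklearn.svm.SVR

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Assumption: standardize features before projecting.
        self.scaler_ = StandardScaler().fit(X)
        Xs = self.scaler_.transform(X)
        # Step 1: project the data into a two-dimensional space (PCA assumed).
        self.pca_ = PCA(n_components=2, random_state=self.random_state).fit(Xs)
        Z = self.pca_.transform(Xs)
        # Step 2: split the projected space into k disjoint regions (k-means assumed).
        self.kmeans_ = KMeans(
            n_clusters=self.k, n_init=10, random_state=self.random_state
        ).fit(Z)
        labels = self.kmeans_.labels_
        # Step 3: train one SVR per region on that region's samples only.
        self.models_ = {}
        for c in range(self.k):
            mask = labels == c
            self.models_[c] = SVR(**self.svr_params).fit(Xs[mask], y[mask])
        return self

    def predict(self, X):
        Xs = self.scaler_.transform(np.asarray(X, dtype=float))
        # Step 4: route each sample to its region and use that region's model.
        regions = self.kmeans_.predict(self.pca_.transform(Xs))
        y_pred = np.empty(len(Xs))
        for c in np.unique(regions):
            mask = regions == c
            y_pred[mask] = self.models_[c].predict(Xs[mask])
        return y_pred


# Usage on synthetic data (illustrative only, not the paper's datasets).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(600, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=600)
model = PartitionedSVR(k=4, C=10.0).fit(X[:500], y[:500])
pred = model.predict(X[500:])
```

The speed-up in such a scheme would come from each SVR seeing only a fraction of the training set, since kernel SVR training cost grows faster than linearly with the number of samples. How the paper chooses k and the local kernels is not stated in the abstract.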
Wang Mei (王梅); Zhang Tianshi (张天时); Wang Zhibao (王志宝); Ren Yiguo (任怡果)
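School of Computer Science and Information Technology, Northeast Petroleum University, Daqing, Heilongjiang 163318; Heilongjiang Provincial Key Laboratory of Petroleum Big Data and Intelligent Analysis (Northeast Petroleum University), Daqing, Heilongjiang 163318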
Computer Science and Automation
large-scale data; divide-and-conquer method; support vector regression; principal component analysis; clustering
Computer Technology and Development, 2024 (004)
pp. 24-29 (6 pages)
National Natural Science Foundation of China (51774090); Heilongjiang Province Postdoctoral Scientific Research Startup Fund (LBH-Q20080)