Journal of Data Acquisition and Processing, 2018, Vol. 33, Issue 1: 65-74 (10 pages). DOI: 10.16337/j.1004-9037.2018.01.008
Efficient Parallel Auto-encoder Based on Spark (基于Spark的高效并行自动编码机)
Abstract
Finding a good representation of raw data is a key issue in machine learning. Most traditional approaches rely on relationships among the data or on simple linear combinations, whereas deep learning algorithms perform very well on a variety of machine learning tasks and learn very good representations. However, most existing algorithms are implemented serially and therefore cannot handle large-scale data. This paper proposes an effective parallel auto-encoder (PAE) based on Spark. The proposed PAE not only learns a satisfactory representation but also reduces execution time by running on Spark. The paper then adapts PAE to handle sparse data. Experiments on two tasks, classification and collaborative filtering, demonstrate the effectiveness and efficiency of the proposed PAE.
Keywords
auto-encoder/Spark/machine learning/deep learning/feature learning
Classification
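Information Technology and Security Science
Cite this article
Zhuang Fuzhen (庄福振), Qian Mingda (钱明达), Shen Enzhao (申恩兆), Zhang Dapeng (张大鹏), He Qing (何清). Efficient Parallel Auto-encoder Based on Spark [J]. Journal of Data Acquisition and Processing, 2018, 33(1): 65-74.
Funding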
Supported by the National Key Research and Development Program of China (2017YFB1002104)
Supported by the National Natural Science Foundation of China (61773361, 61473273, 91546122, 61573335, 61602438)
Supported by the Guangdong Provincial Science and Technology Plan (2015B010109005)