Efficient Parallel Auto-encoder Based on Spark

Zhuang Fuzhen (庄福振), Qian Mingda (钱明达), Shen Enzhao (申恩兆), Zhang Dapeng (张大鹏), He Qing (何清)

Journal of Data Acquisition and Processing (数据采集与处理), 2018, Vol. 33, Issue 1: 65-74, 10. DOI: 10.16337/j.1004-9037.2018.01.008

Zhuang Fuzhen 1, Qian Mingda 1, Shen Enzhao 1, Zhang Dapeng 2, He Qing 2

Author information

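  • 1. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
  • 2. School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004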

Abstract

Finding a good representation of raw data is a key issue in machine learning. Most traditional approaches are based on the relationships among data or use simple linear combinations, whereas deep learning algorithms can learn very good representations and perform well in various machine learning tasks. However, most existing algorithms are implemented serially and cannot handle large-scale data. This paper proposes an effective parallel auto-encoder (PAE) based on Spark. The proposed PAE not only learns satisfactory representations but also reduces execution time by running on Spark. The paper then adapts PAE to handle sparse data. Experiments on two tasks, classification and collaborative filtering, demonstrate the effectiveness and efficiency of the proposed PAE.

Key words

auto-encoder/Spark/machine learning/deep learning/feature learning

Classification

Information Technology and Security Science

Cite this article

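Zhuang Fuzhen, Qian Mingda, Shen Enzhao, Zhang Dapeng, He Qing. Efficient Parallel Auto-encoder Based on Spark [J]. Journal of Data Acquisition and Processing, 2018, 33(1): 65-74, 10.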

Funding

National Key Research and Development Program of China (2017YFB1002104)

National Natural Science Foundation of China (61773361, 61473273, 91546122, 61573335, 61602438)

Guangdong Provincial Science and Technology Plan (2015B010109005)

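Journal of Data Acquisition and Processing (数据采集与处理)

Open access; indexed in the Peking University Core Journals list, CSCD, and CSTPCD

ISSN 1004-9037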
