
Distributed Soft K-Segments Algorithm for Principal Curves Based on MapReduce

Hu Zuoliang, Zhang Hongyun

Journal of Data Acquisition and Processing (数据采集与处理), 2017, Vol. 32, Issue (3): 507-515, 9. DOI: 10.16337/j.1004-9037.2017.03.009


Distributed Soft K-Segments Algorithm for Principal Curves Based on MapReduce

Hu Zuoliang (1), Zhang Hongyun (2)

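Author information

  • 1. Department of Computer Science and Technology, Tongji University, Shanghai 201804
  • 2. Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 201804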

Abstract

Traditional principal curve algorithms obtain good results on small datasets, but the computing and storage resources of a single node cannot meet the requirements of extracting principal curves from massive datasets. Distributed parallel computing is one of the most effective ways to solve this problem. Therefore, we propose a distributed soft K-segments algorithm for principal curves based on MapReduce, named DisSKPC. First, using the distributed K-Means algorithm, we recursively granulate all the numerical data into information granules, which limits the size of each granule and ensures the relevance of the data within it. Then we calculate the local principal component segment of each granule and use the noise variance to eliminate over-fitting segments that may arise in areas of high density and high curvature. Finally, we connect these local principal component segments using the Hamiltonian path and a greedy algorithm, forming a best curve through the middle of the data cloud. Experimental results demonstrate the feasibility and scalability of the proposed DisSKPC algorithm.
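The three stages named in the abstract can be illustrated with a minimal single-machine sketch in Python. It is not the authors' DisSKPC implementation: the function names (`kmeans_map`, `kmeans_reduce`, `granulate`, `principal_segment`, `connect_greedy`) are hypothetical, the K-Means pass is flat rather than recursive, the map and reduce steps run in one process instead of on a MapReduce cluster, the noise-variance filtering of over-fitting segments is omitted, and each segment is assumed to span the full extent of its granule along the first principal direction. The data are assumed to be 2-D.

```python
import math
from collections import defaultdict


def kmeans_map(point, centers):
    """Map step: emit (index of the nearest center, (point, count of 1))."""
    idx = min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))
    return idx, (point, 1)


def kmeans_reduce(pairs, centers):
    """Reduce step: average the points emitted for each center index."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for idx, ((x, y), n) in pairs:
        acc = sums[idx]
        acc[0] += x
        acc[1] += y
        acc[2] += n
    # A center that received no points keeps its previous position.
    return [
        (sums[i][0] / sums[i][2], sums[i][1] / sums[i][2]) if i in sums else c
        for i, c in enumerate(centers)
    ]


def granulate(points, centers, iterations=10):
    """Flat K-Means granulation (the paper granulates recursively)."""
    for _ in range(iterations):
        centers = kmeans_reduce((kmeans_map(p, centers) for p in points), centers)
    granules = defaultdict(list)
    for p in points:
        granules[kmeans_map(p, centers)[0]].append(p)
    return [g for _, g in sorted(granules.items())]


def principal_segment(granule):
    """Endpoints of the first-principal-component segment of a 2-D granule."""
    n = len(granule)
    mx = sum(x for x, _ in granule) / n
    my = sum(y for _, y in granule) / n
    sxx = sum((x - mx) ** 2 for x, _ in granule)
    syy = sum((y - my) ** 2 for _, y in granule)
    sxy = sum((x - mx) * (y - my) for x, y in granule)
    # Closed-form orientation of the major axis of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    dx, dy = math.cos(theta), math.sin(theta)
    proj = [(x - mx) * dx + (y - my) * dy for x, y in granule]
    lo, hi = min(proj), max(proj)
    return (mx + lo * dx, my + lo * dy), (mx + hi * dx, my + hi * dy)


def connect_greedy(segments):
    """Greedily chain segments: always attach the one nearest the path's tail."""
    remaining = list(segments)
    path = list(remaining.pop(0))
    while remaining:
        tail = path[-1]
        best = min(
            range(len(remaining)),
            key=lambda i: min(math.dist(tail, e) for e in remaining[i]),
        )
        a, b = remaining.pop(best)
        if math.dist(tail, b) < math.dist(tail, a):
            a, b = b, a  # flip the segment so its nearer endpoint comes first
        path.extend([a, b])
    return path


# Toy data: 30 collinear points, three initial centers (illustrative values).
points = [(float(x), 0.0) for x in range(30)]
granules = granulate(points, [(5.0, 0.0), (15.0, 0.0), (25.0, 0.0)])
segments = [principal_segment(g) for g in granules]
path = connect_greedy(segments)
print(path)
```

The greedy chaining here only extends the path from its tail, starting at the first segment, so it approximates a short Hamiltonian path through the segments rather than finding an optimal one; in a distributed setting the map and reduce functions would be executed by the framework over partitions of the data.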

Key words

distributed parallelization / principal curves / data granulation / MapReduce

Classification

Information Technology and Security Science

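Cite this article

Hu Zuoliang, Zhang Hongyun. Distributed Soft K-Segments Algorithm for Principal Curves Based on MapReduce[J]. Journal of Data Acquisition and Processing, 2017, 32(3): 507-515, 9.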

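Funding

National Natural Science Foundation of China (61573255)

Key Project of the Shanghai Three-Year Action Plan for Further Accelerating the Development of Traditional Chinese Medicine (ZY3CCCX36002)

Fundamental Research Funds for the Central Universities (0800219302)

Natural Science Foundation of Shanghai (14ZR1442600)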

Journal of Data Acquisition and Processing

OA / Peking University Core Journals / CSCD / CSTPCD

ISSN 1004-9037
