
Spark I/O Performance Optimization Based on Memory and File Sharing Mechanism

Huang Tinghui, Wang Yuliang, Wang Zhen, Cui Gengshen

Computer Engineering, 2017, Vol. 43, Issue 3: 1-6. DOI: 10.3969/j.issn.1000-3428.2017.03.001

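Huang Tinghui 1, Wang Yuliang 1, Wang Zhen 1, Cui Gengshen 1

Author information

  • 1. School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China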

Abstract

Based on an analysis of key Spark technologies, such as the resilient distributed dataset (RDD) and Spark task scheduling, this paper concludes that I/O time during data processing has a large effect on Spark's computing performance. To address this problem, the paper first studies Spark's file consolidation mode, which reduces the number of cache files and improves Spark's I/O efficiency to some extent but still incurs a high memory cost. The paper then proposes an improved Spark Shuffle process in which every Mapper generates only one cache file and all of a Mapper's buckets share the same memory buffer, which improves I/O efficiency and reduces memory overhead. Simulation results show that, compared with Spark's default mode, the I/O time of a wide-dependency process is shortened by 42.9%, improving both memory utilization and the efficiency of the Spark platform.
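The trade-off the abstract describes can be illustrated with a rough cost model. The following is a minimal sketch, not the paper's implementation: the function name `shuffle_cost`, the mode labels, the 32 KiB buffer size, and the counting rules are assumptions for illustration. They follow the commonly cited analysis of hash-based shuffle (one file per Mapper and bucket by default, one file per core and bucket when files are consolidated) and the abstract's statement that the proposed mode uses one file and one shared buffer per Mapper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShuffleCost:
    files: int         # shuffle files written on the map side
    buffer_bytes: int  # write-buffer memory held by concurrently running Mappers


def shuffle_cost(mode, mappers, reducers, cores, buffer_size=32 * 1024):
    """Estimate map-side file count and buffer memory (illustrative model only).

    Assumptions: at most `cores` Mappers run at once, each bucket corresponds
    to one reducer, and each open buffer occupies `buffer_size` bytes.
    """
    if mode == "default":
        # One file and one buffer per (Mapper, bucket) pair.
        return ShuffleCost(mappers * reducers, cores * reducers * buffer_size)
    if mode == "consolidated":
        # Files are reused across Mappers on the same core,
        # but every bucket still keeps its own buffer.
        return ShuffleCost(cores * reducers, cores * reducers * buffer_size)
    if mode == "shared":
        # Proposed mode per the abstract: one file per Mapper,
        # one buffer shared by all of that Mapper's buckets.
        return ShuffleCost(mappers, cores * buffer_size)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("default", "consolidated", "shared"):
        print(mode, shuffle_cost(mode, mappers=1000, reducers=200, cores=16))
```

Under these assumptions, consolidation lowers the file count but leaves buffer memory unchanged, which matches the "high memory cost" the abstract attributes to it, while the shared mode lowers both. The model says nothing about the measured 42.9% reduction in I/O time, which comes from the paper's own experiments.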

Key words

distributed computing; Spark platform; Shuffle process; disk I/O; task scheduling

Classification

Information Technology and Security Science

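Cite this article

Huang Tinghui, Wang Yuliang, Wang Zhen, Cui Gengshen. Spark I/O Performance Optimization Based on Memory and File Sharing Mechanism[J]. Computer Engineering, 2017, 43(3): 1-6.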

Funding

National Natural Science Foundation of China (61363029)

CERNET Next Generation Internet Technology Innovation Project (NGII20160306)

Computer Engineering

Open access; indexed in the Peking University Core Journals list, CSCD, and CSTPCD

ISSN 1000-3428
