Spark Task Performance Optimization Based on Data Characteristics

Chai Ning (柴宁), Wu Yijian (吴毅坚), Zhao Wenyun (赵文耘)

Computer Applications and Software, 2018, Vol. 35, Issue (1): 52-58,84,8.

DOI: 10.3969/j.issn.1000-386x.2018.01.009

OPTIMIZATION FOR SPARK MISSION PERFORMANCE BASED ON DATA CHARACTERISTICS

Chai Ning¹, Wu Yijian², Zhao Wenyun¹

Author information

1. School of Software, Fudan University, Shanghai 201203
2. Shanghai Key Laboratory of Data Science, Shanghai 200433

Abstract

A new generation of distributed data processing frameworks has greatly improved the efficiency of data processing tasks. However, because different data sets have different characteristics, it is difficult to find a unified way to optimize the performance of these tasks. To make full use of memory and computing resources and to improve task execution efficiency, the characteristics of the data being processed need to be analyzed. This paper studies one such characteristic, data skew, and proposes a method for quantifying the degree of skew. Built on the distributed processing framework Spark, the approach combines data sampling analysis with semantic analysis of the source code to automatically judge the skew of the data set currently being processed, and then automatically optimizes the corresponding program code based on the result, improving the running efficiency of the task. A number of data processing experiments verify the efficiency of the method.
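The abstract does not state the paper's skew metric, so the following is only an illustrative sketch of what "quantifying skew from a sample" could look like. It assumes a simple measure (count of the most frequent key divided by the mean count per distinct key) and a hypothetical threshold of 3.0; the function names `sample_keys`, `skew_ratio`, and `is_skewed` are invented for this example and are not taken from the paper or from Spark.

```python
import random
from collections import Counter


def sample_keys(keys, fraction, seed=0):
    """Bernoulli-sample the keys, keeping each one with probability `fraction`."""
    rng = random.Random(seed)
    return [k for k in keys if rng.random() < fraction]


def skew_ratio(keys):
    """Assumed skew measure: max key count / mean count per distinct key.

    1.0 means every key is equally frequent; larger values mean more skew.
    An empty input is treated as not skewed.
    """
    counts = Counter(keys)
    if not counts:
        return 1.0
    mean_count = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean_count


def is_skewed(keys, fraction=0.1, threshold=3.0, seed=0):
    """Judge skew from a sample; `threshold` is a hypothetical cut-off."""
    return skew_ratio(sample_keys(keys, fraction, seed)) > threshold


if __name__ == "__main__":
    hot = ["hot"] * 90 + [f"k{i}" for i in range(10)]
    print(skew_ratio(hot), is_skewed(hot, fraction=1.0))
```

In a Spark job the same idea would typically be applied to a sample of the keys of a pair RDD before a shuffle-heavy operation; a job judged to be skewed could then be rewritten, for instance by salting hot keys, though which rewrite the paper applies is not stated in the abstract.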

Key words

Distributed system/Big data/Spark/Data characteristics

Classification

Information Technology and Security Science

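Cite this article

Chai Ning, Wu Yijian, Zhao Wenyun. Spark task performance optimization based on data characteristics [J]. Computer Applications and Software, 2018, 35(1): 52-58,84,8.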

Computer Applications and Software

OA, Peking University Core Journals (北大核心), CSTPCD

ISSN: 1000-386X
