Relational Query Optimization Method Based on Data Skew

Guo Kaiwei¹, Wang Yingzhuo¹, Wang Yaxiong¹

Digital Design (数码设计), Issue (3): 72-74, 3.

Author Information

1. China UnionPay, Shanghai 201201

Abstract

Data skew is a problem that big data developers frequently encounter in distributed batch data processing. It is a "long tail" phenomenon caused by an uneven distribution of data, which leaves the data nodes with very different amounts of data to process. Skew is common in distributed data processing systems, and its main cause is an uneven distribution of key values in the data. In parallel computing, a large number of records sharing the same key are allocated to a single host for processing, producing a "busy single machine, idle cluster" situation. This runs counter to the original intention and design principles of parallel computing, reduces its overall efficiency, and can even cause memory overflow.
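The mechanism described above, where every record with the same key is routed to one host, can be illustrated with a minimal sketch. It assumes simple hash partitioning (CRC32 of the key text modulo the number of nodes) as a stand-in for the shuffle step of a distributed join; the function names, the `hot_key` workload, and the max-over-mean "skew ratio" metric are hypothetical illustrations, not the authors' optimization method.

```python
import zlib
from typing import Iterable, List


def node_for_key(key: str, num_nodes: int) -> int:
    # Hypothetical partitioner: deterministic hash partitioning (CRC32 of the
    # key text), so equal keys always go to the same node, as in a shuffle.
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def partition_loads(keys: Iterable[str], num_nodes: int) -> List[int]:
    # Count how many records each node receives after partitioning.
    loads = [0] * num_nodes
    for key in keys:
        loads[node_for_key(key, num_nodes)] += 1
    return loads


def skew_ratio(loads: List[int]) -> float:
    # Busiest node's load divided by the mean load; 1.0 means perfectly even.
    mean = sum(loads) / len(loads)
    return max(loads) / mean


if __name__ == "__main__":
    # Hypothetical skewed workload: 900 records share one "hot" key,
    # while 100 records each have a distinct key.
    skewed_keys = ["hot_key"] * 900 + [f"key_{i}" for i in range(100)]
    loads = partition_loads(skewed_keys, num_nodes=4)
    print("records per node:", loads)
    print("skew ratio:", round(skew_ratio(loads), 2))
```

Because every copy of `hot_key` hashes to the same node, that node receives at least 900 of the 1,000 records while the other three share at most 100, so the skew ratio is at least 3.6 (a maximum load of 900 or more over a mean of 250). Adding nodes does not help here, since the hot key still lands on exactly one of them.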

Key words

big data/distributed/batch data/data skew

Classification

Information Technology and Security Science

Cite this article

Guo Kaiwei, Wang Yingzhuo, Wang Yaxiong. Relational Query Optimization Method Based on Data Skew [J]. Digital Design (数码设计), 2024, (3): 72-74, 3.

Digital Design (数码设计)

ISSN: 1672-9129
