An Access Optimization Scheme for Massive Small Statistical Files Based on Hadoop (基于Hadoop的海量统计小文件存取优化方案)

Journal of Liaocheng University (Natural Science Edition), 2016, Vol. 29, Issue 1: 102-106, 5.

Accessing Optimization of Massive Small Statistical Files Based on Hadoop

Fu Hongge (付红阁)¹, Jiang Hua (姜华)¹, Zhang Huaifeng (张怀锋)²

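Author information

1. School of Computer Science, Liaocheng University, Liaocheng, Shandong 252059
2. Data Management Center, Shandong Provincial Bureau of Statistics, Jinan, Shandong 250014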

Abstract

As an open-source parallel computing framework, Hadoop provides the distributed file system HDFS. However, when HDFS handles large numbers of small files, the NameNode consumes too much memory and access performance is poor, so the NameNode becomes a bottleneck that restricts the scalability of the file system. Drawing on statistical work, we propose an optimization strategy for small files: a small-file preprocessing module added on top of HDFS classifies the files and merges them into MapFiles, and a global index is established over the merged files. In addition, an index prefetching mechanism and a caching mechanism are introduced. Experiments show that this method effectively improves the performance of accessing massive numbers of small files.
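The strategy summarized in the abstract can be illustrated with a small in-memory sketch. This is not the authors' implementation and it does not use the Hadoop API: `MergedFile` is a hypothetical stand-in for a MapFile (sorted keys plus one concatenated blob), `SmallFileStore` is a hypothetical name for the global index plus cache, and prefetching is simplified to loading the next few neighbouring entries of the same merged file into an LRU cache. The abstract does not give the classification rule, cache policy, or prefetch depth, so `cache_size` and `prefetch` are assumed parameters.

```python
from bisect import bisect_left
from collections import OrderedDict


class MergedFile:
    """Toy stand-in for a MapFile: sorted keys plus one concatenated blob."""

    def __init__(self, files):
        # files: dict mapping small-file name -> bytes (one pre-classified group)
        self.keys = sorted(files)
        self.offsets = []  # (start, length) per key, parallel to self.keys
        chunks, pos = [], 0
        for key in self.keys:
            data = files[key]
            self.offsets.append((pos, len(data)))
            chunks.append(data)
            pos += len(data)
        self.blob = b"".join(chunks)

    def position(self, key):
        # Binary search over the sorted keys, as a sorted index allows.
        i = bisect_left(self.keys, key)
        if i == len(self.keys) or self.keys[i] != key:
            raise KeyError(key)
        return i

    def read_at(self, i):
        start, length = self.offsets[i]
        return self.blob[start:start + length]


class SmallFileStore:
    """Global index over merged files, with an LRU cache and simple prefetching."""

    def __init__(self, cache_size=4, prefetch=1):
        self.global_index = {}      # small-file name -> MergedFile holding it
        self.cache = OrderedDict()  # LRU cache: small-file name -> bytes
        self.cache_size = cache_size
        self.prefetch = prefetch    # assumed depth: neighbours loaded on a miss
        self.hits = 0
        self.misses = 0

    def add_group(self, files):
        merged = MergedFile(files)
        for key in merged.keys:
            self.global_index[key] = merged
        return merged

    def _remember(self, key, data):
        self.cache[key] = data
        self.cache.move_to_end(key)
        while len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used entry

    def read(self, key):
        if key in self.cache:
            self.hits += 1
            self.cache.move_to_end(key)
            return self.cache[key]
        self.misses += 1
        merged = self.global_index[key]  # raises KeyError for unknown names
        i = merged.position(key)
        data = merged.read_at(i)
        # Prefetch the following neighbours in the same merged file.
        last = min(i + self.prefetch, len(merged.keys) - 1)
        for j in range(i + 1, last + 1):
            self._remember(merged.keys[j], merged.read_at(j))
        self._remember(key, data)  # stored last so it is the most recent entry
        return data


store = SmallFileStore(cache_size=4, prefetch=1)
store.add_group({"2015_q1.csv": b"a", "2015_q2.csv": b"bb", "2015_q3.csv": b"ccc"})
store.read("2015_q1.csv")  # cache miss; also prefetches 2015_q2.csv
store.read("2015_q2.csv")  # served from the cache
```

The file names in the usage lines are made up for illustration. The point of the sketch is only the division of labour: many small files collapse into one container, a single lookup table locates any of them, and sequential reads of related files are served from memory after the first miss.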

Keywords

HDFS / small files / preprocessing module / index prefetching / caching mechanism

Classification

Information Technology and Security Science

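Cite this article

Fu Hongge, Jiang Hua, Zhang Huaifeng. An Access Optimization Scheme for Massive Small Statistical Files Based on Hadoop [J]. Journal of Liaocheng University (Natural Science Edition), 2016, 29(1): 102-106, 5.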

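Funding

Key Research Project in Statistics of Shandong Province (KT15076); Shandong Provincial Key Laboratory of Intelligent Information Processing and Network Security in Universities (Liaocheng University); Research Fund of Liaocheng University ()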

Journal of Liaocheng University (Natural Science Edition)

OA / CHSSCD

ISSN: 1672-6634
