Journal of Liaocheng University (Natural Science Edition), 2016, Vol. 29, Issue (1): 102-106, 5.
An Optimization Scheme for Storing and Accessing Massive Statistical Small Files Based on Hadoop
Accessing Optimization of Massive Small Statistical Files Based on Hadoop
Abstract
As an open-source parallel computing framework, Hadoop provides a distributed file storage system, HDFS. However, when dealing with small files, the NameNode consumes too much memory and access performance is poor, so the NameNode becomes a bottleneck that restricts the file system's scalability. Based on statistical work, we put forward an optimization strategy for small files: a small-file preprocessing module added on top of HDFS classifies the files and merges them into MapFile, and a global index is established; in addition, an index prefetching mechanism and a caching mechanism are introduced. Experiments show that this method can effectively improve the performance of accessing massive small files.
Keywords
HDFS / small file / preprocessing module / index prefetching / caching mechanism
Classification
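The abstract describes merging small files into a MapFile-like container with a global index plus a read cache. A minimal, self-contained sketch of that idea (in Python, deliberately not using Hadoop's actual `MapFile` API; the class and method names here are illustrative assumptions, not the paper's implementation) might look like:

```python
import io


class SmallFileMerger:
    """Merge many small files into one container blob with a global index.

    Mimics the paper's scheme at a high level: one large object replaces
    many per-file NameNode entries, a global index maps each original file
    name to its (offset, length) inside the container, and a simple cache
    serves repeated reads without touching the container again.
    """

    def __init__(self):
        self.container = io.BytesIO()  # stands in for one large HDFS file
        self.index = {}                # global index: name -> (offset, length)
        self.cache = {}                # caching mechanism for hot files

    def merge(self, name, data):
        """Append a small file's bytes and record its location in the index."""
        offset = self.container.tell()
        self.container.write(data)
        self.index[name] = (offset, len(data))

    def read(self, name):
        """Fetch a file via the cache, falling back to an index lookup."""
        if name in self.cache:
            return self.cache[name]
        offset, length = self.index[name]
        self.container.seek(offset, io.SEEK_SET)
        data = self.container.read(length)
        self.container.seek(0, io.SEEK_END)  # restore append position
        self.cache[name] = data
        return data


merger = SmallFileMerger()
merger.merge("a.txt", b"alpha")
merger.merge("b.txt", b"bravo")
print(merger.read("b.txt"))  # b'bravo'
```

In the real system the container would be an HDFS `MapFile` and the index would be prefetched by the client; the sketch only shows why a single merged object with an index relieves NameNode memory pressure.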
Computer Science and Automation
Fu Hongge, Jiang Hua, Zhang Huaifeng. An Optimization Scheme for Storing and Accessing Massive Statistical Small Files Based on Hadoop [J]. Journal of Liaocheng University (Natural Science Edition), 2016, 29(1): 102-106, 5.
Funding: Shandong Provincial Key Project of Statistical Research (KT15076); Shandong Provincial Key Laboratory of Intelligent Information Processing and Network Security (Liaocheng University); Liaocheng University Research Fund ()