Computer Technology and Development, 2016, Vol. 26, Issue 11: 41-44, 4. DOI: 10.3969/j.issn.1673-629X.2016.11.009
A Hadoop Small-File Storage Optimization Scheme
Abstract
The Hadoop Distributed File System (HDFS) performs well when storing and processing large files, but its performance and efficiency drop significantly when it must handle a huge number of small files, since too many small files place a heavy load on the entire cluster. To improve the performance of HDFS on small files, a double-merging algorithm based on the relationships between files, combined with a merging algorithm based on data-block balance, is proposed to distribute small-file sizes evenly across merged blocks. The scheme further improves the effect of small-file merging, reduces the memory overhead of the HDFS cluster master node (NameNode), lowers its workload, and effectively decreases the number of data blocks produced by merging. As a result, the performance of HDFS in handling large numbers of small files is improved.
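The abstract describes the merging strategy only at a high level. As a rough illustration of what block-balance-aware merging could look like, the Java sketch below packs small files into merge bins bounded by the HDFS block size using a generic first-fit-decreasing heuristic; it is a minimal sketch with hypothetical file names and sizes, not the paper's double-merging algorithm (which additionally exploits relationships between files).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Minimal sketch of block-balanced small-file merging: small files are grouped
 * into merge bins so that each bin stays close to, but never exceeds, the HDFS
 * block size. First-fit-decreasing heuristic; not the paper's exact algorithm.
 */
public class SmallFileMerger {

    static final long BLOCK_SIZE = 128L * 1024 * 1024; // default HDFS block size (128 MB)

    record SmallFile(String name, long sizeBytes) {}

    /** Groups files into bins whose total size does not exceed BLOCK_SIZE. */
    static List<List<SmallFile>> planMergeBins(List<SmallFile> files) {
        List<SmallFile> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong(SmallFile::sizeBytes).reversed());

        List<List<SmallFile>> bins = new ArrayList<>();
        List<Long> remaining = new ArrayList<>(); // free space left in each bin

        for (SmallFile f : sorted) {
            boolean placed = false;
            for (int i = 0; i < bins.size(); i++) {
                if (remaining.get(i) >= f.sizeBytes()) { // first bin with enough room
                    bins.get(i).add(f);
                    remaining.set(i, remaining.get(i) - f.sizeBytes());
                    placed = true;
                    break;
                }
            }
            if (!placed) { // open a new bin, i.e. a new merged file / data block
                List<SmallFile> bin = new ArrayList<>();
                bin.add(f);
                bins.add(bin);
                remaining.add(BLOCK_SIZE - f.sizeBytes());
            }
        }
        return bins;
    }

    public static void main(String[] args) {
        // Hypothetical small files, chosen only to demonstrate the packing.
        List<SmallFile> files = List.of(
                new SmallFile("log_a.txt", 30L * 1024 * 1024),
                new SmallFile("log_b.txt", 70L * 1024 * 1024),
                new SmallFile("log_c.txt", 50L * 1024 * 1024),
                new SmallFile("log_d.txt", 90L * 1024 * 1024));

        List<List<SmallFile>> bins = planMergeBins(files);
        for (int i = 0; i < bins.size(); i++) {
            long total = bins.get(i).stream().mapToLong(SmallFile::sizeBytes).sum();
            System.out.printf("merged file %d: %d files, %d MB%n",
                    i, bins.get(i).size(), total / (1024 * 1024));
        }
    }
}
```

Packing the four example files this way yields two merged files of roughly 120 MB each, so the merged blocks are close in size to one another and to the block size, which is the balance property the abstract refers to.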
Keywords
Hadoop Distributed File System (HDFS) / small files / merge algorithm / file association

Classification
Information Technology and Security Science

Citation
Wang Quanmin, Zhang Cheng, Zhao Xiaotong, Lei Jiawei. A Hadoop Small-File Storage Optimization Scheme [J]. Computer Technology and Development, 2016, 26(11): 41-44, 4.

Funding
Supported by the National Natural Science Foundation of China (61272500)