Journal of University of Electronic Science and Technology of China, Issue (1): 141-145, 5. DOI: 10.3969/j.issn.1001-0548.2016.01.024
Storage Optimization Method of Small Files Based on Hadoop
Abstract
The Hadoop Distributed File System (HDFS) can process large amounts of data effectively on large clusters. However, HDFS is designed to handle large files and suffers a performance penalty when dealing with a large number of small files. This paper proposes an HDFS-based approach to improve the storage efficiency of small files. The main idea is to classify the small files, merge them by class, and index the merged files, which reduces the number of index entries held in the namenode and improves storage efficiency. Experimental results show that the approach stores small files more efficiently than Hadoop Archives (HAR files).
Key words
Hadoop/index mechanism/relationship/storage of small files
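Classification
Information Technology and Security Science
Cite this article
Li Meng (李孟), Cao Sheng (曹晟), Qin Zhiguang (秦志光). Storage Optimization Method of Small Files Based on Hadoop [J]. Journal of University of Electronic Science and Technology of China, 2016, (1): 141-145, 5.
Funding
Ministry of Education-China Mobile Research Fund (MCM20121041); National Natural Science Foundation of China (61133016, 61103206); National 863 Program (2011AA010706)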
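The classify, merge, and index idea stated in the abstract can be illustrated with a minimal in-memory sketch. It is not the authors' implementation: the abstract does not say how files are classified, so grouping by file extension is a stand-in assumption, and the names `classify`, `merge_by_class`, and `read_file` are hypothetical. No HDFS API is used; plain byte strings stand in for the stored files.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def classify(name):
    """Hypothetical classifier (assumption): group files by extension."""
    return PurePosixPath(name).suffix.lstrip(".") or "misc"


def merge_by_class(files):
    """Merge small files per class and build an index over the merged data.

    files:  dict mapping file name -> bytes
    Returns (merged, index), where merged maps class -> concatenated bytes
    and index maps file name -> (class, offset, length).
    """
    buffers = defaultdict(bytearray)
    index = {}
    for name in sorted(files):
        data = files[name]
        cls = classify(name)
        buf = buffers[cls]
        # Record where this file starts inside its class's merged file.
        index[name] = (cls, len(buf), len(data))
        buf.extend(data)
    merged = {cls: bytes(buf) for cls, buf in buffers.items()}
    return merged, index


def read_file(merged, index, name):
    """Recover one small file from its merged file using the index."""
    cls, offset, length = index[name]
    return merged[cls][offset:offset + length]


small_files = {
    "a.txt": b"alpha",
    "b.txt": b"beta",
    "c.log": b"gamma",
    "d.log": b"delta",
    "e.log": b"epsilon",
}
merged, index = merge_by_class(small_files)
```

In this sketch five small files collapse into two merged files, so a namenode-like catalogue would track two entries instead of five, while `read_file(merged, index, "d.log")` still recovers the original bytes through the offset index.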