计算机工程 (Computer Engineering), 2017, Vol. 43, Issue 2: 85-91, 7. DOI: 10.3969/j.issn.1000-3428.2017.02.015
基于数据路由的分布式备份数据去重系统
Distributed Backup Data Deduplication System Based on Data Routing
Abstract
In big data scenarios, traditional deduplicating backup systems suffer from defects such as large backup storage requirements and insufficient data throughput. To address these defects, this paper designs a distributed backup data deduplication system based on data routing. The system uses the data chunk as its deduplication granularity, and its functions comprise data routing and data prefetching. Data routing uses a Bloom filter to query the data chunks to be processed, and average sampling and neighbor sampling based on Jaccard distance are applied to prefetch data chunks. The system uses data routing to assign data chunks to the corresponding processing nodes. The hash codes of data chunks obtained through average sampling provide the routing information for data routing, and the hash codes of data chunks obtained through neighbor sampling are used for the system's first round of data deduplication. Experimental results show that, compared with querying all processing nodes and with fixed data routing, the system significantly increases data throughput while maintaining the deduplication ratio.
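The routing idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the system's actual data structures or parameters, so everything below is an assumption made for illustration: the `BloomFilter` class, the `average_sample` and `route` helpers, the use of SHA-1 as the chunk hash, and the rule "send the data to the node whose filter matches the most sampled hashes" are hypothetical, and the paper's neighbor sampling is not reproduced (only the Jaccard distance it relies on is shown).

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter over byte strings (illustrative sizes, not tuned)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-1 with a seed byte.
        for seed in range(self.num_hashes):
            digest = hashlib.sha1(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def chunk_hash(chunk):
    """Hash code of a data chunk (SHA-1 is an assumption)."""
    return hashlib.sha1(chunk).digest()


def average_sample(hashes, k):
    """Pick up to k hashes at evenly spaced positions in the sequence."""
    if k <= 0 or not hashes:
        return []
    step = max(1, len(hashes) // k)
    return hashes[::step][:k]


def jaccard_distance(a, b):
    """Jaccard distance 1 - |A & B| / |A | B| between two collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def route(sampled_hashes, node_filters):
    """Index of the node whose Bloom filter matches the most sampled hashes.

    Ties go to the lowest index (hypothetical policy).
    """
    hits = [sum(h in f for h in sampled_hashes) for f in node_filters]
    return max(range(len(node_filters)), key=lambda i: hits[i])


# Usage: node 1 already stores 100 chunks; a new backup shares 90 of them.
nodes = [BloomFilter(), BloomFilter()]
old = [chunk_hash(b"chunk-%d" % i) for i in range(100)]
for h in old:
    nodes[1].add(h)
new = old[:90] + [chunk_hash(b"new-%d" % i) for i in range(10)]
target = route(average_sample(new, 8), nodes)
```

In this sketch only the sampled hashes are checked against each node's filter, so the routing decision needs a handful of Bloom filter lookups rather than a full index query on every processing node, which is the kind of saving the abstract attributes to data routing.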
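Keywords
data deduplication / data routing / data prefetching / Bloom filter / Jaccard distance
Classification
Information Technology and Security Science
Cite this article
Yao Min (姚敏), Yin Jianwei (尹建伟), Tang Yan (唐彦), Luo Zhiling (罗智凌). 基于数据路由的分布式备份数据去重系统 (Distributed Backup Data Deduplication System Based on Data Routing) [J]. 计算机工程 (Computer Engineering), 2017, 43(2): 85-91, 7.
Funding
National Key Technology Research and Development Program project "Research, Development and Demonstration Application of a Common Technology System for Cross-Boundary Services in the Modern Service Industry" (2013AA01A213).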