Computer Technology and Development, 2024, Vol. 34, Issue (4): 16-23, 8. DOI: 10.20165/j.cnki.ISSN1673-629X.2024.0003
Variable Granularity-based Chunk-context Aware Similar Data Deduplication Technique for Cloud Storage
Abstract
To address the poor effectiveness and high metadata overhead of existing similar data deduplication techniques in cloud storage environments, a variable granularity-based, chunk-context aware similar data deduplication technique for cloud storage is proposed. The technique adopts a feature extraction algorithm based on sub-block reorganization to extract initial features from the internal structure of the data block content, and uses a BP (Back Propagation) neural network context-aware model to embed the contextual feature information of the data block into the initial features, yielding variable granularity data blocks with contextual semantic embedding. A better representation of similar data blocks is obtained by controlling the data block size, dynamically merging neighboring similar data blocks or non-redundant data blocks to reduce metadata overhead, and segmenting the transition region located between similar and non-redundant data blocks. Finally, to evaluate its performance, a prototype variable granularity similar data detection algorithm, rCARD, is implemented and tested extensively on real-world datasets. The experimental results show that, compared with the latest similarity detection deduplication technique Finesse, rCARD achieves a higher deduplication rate while significantly reducing metadata size, and speeds up similarity detection by up to 11.07 times.
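The merging step mentioned in the abstract can be illustrated with a minimal sketch. This is not the rCARD algorithm, whose details are not given here; it only shows how collapsing runs of adjacent blocks that share the same verdict (similar or non-redundant) reduces the number of metadata entries. The `Chunk` type, its fields, and the `merge_adjacent` function are hypothetical names introduced for illustration, and the input is assumed to be a contiguous, offset-ordered list of blocks that a similarity detector has already labeled.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass
class Chunk:
    """Hypothetical metadata record for one data block (illustration only)."""
    offset: int    # byte offset of the block in the stream
    length: int    # block length in bytes
    similar: bool  # assumed detector verdict: True = similar, False = non-redundant


def merge_adjacent(chunks: List[Chunk]) -> List[Chunk]:
    """Collapse each run of neighboring blocks with the same verdict into one block.

    Assumes `chunks` is sorted by offset and contiguous, so a run can be
    described by its first offset and the sum of its lengths.
    """
    merged: List[Chunk] = []
    for verdict, run in groupby(chunks, key=lambda c: c.similar):
        blocks = list(run)
        merged.append(
            Chunk(
                offset=blocks[0].offset,
                length=sum(b.length for b in blocks),
                similar=verdict,
            )
        )
    return merged


# Five fixed-size blocks: two similar, two non-redundant, one similar.
blocks = [
    Chunk(0, 4, True),
    Chunk(4, 4, True),
    Chunk(8, 4, False),
    Chunk(12, 4, False),
    Chunk(16, 4, True),
]
merged_blocks = merge_adjacent(blocks)
```

In this toy input, five metadata records shrink to three while covering the same 20 bytes. A real system would also need to cap the merged block size and handle the transition regions between similar and non-redundant blocks, which this sketch leaves out.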
Keywords
similar data deduplication/data block semantics/variable granularity/cloud storage/metadata
Classification
Information Technology and Security Science
Cite this article
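Yang Zhihuan, Tian Wenlong, He Tingting, Ye Xuming, Tang Jia. Variable Granularity-based Chunk-context Aware Similar Data Deduplication Technique for Cloud Storage [J]. Computer Technology and Development, 2024, 34(4): 16-23, 8.
Funding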
Natural Science Foundation of Hunan Province (2021JJ40468)
Hunan Provincial Department of Education Outstanding Youth Project (22B0437)
Hunan Provincial Teacher Education Research Base (XJK23AJD014)