
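Variable Granularity-based Chunk-context Aware Similar Data Deduplication Technique for Cloud Storage (上下文语义嵌入的变粒度云存储相似数据去重技术)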

Yang Zhihuan (阳智欢), Tian Wenlong (田纹龙), He Tingting (何婷婷), Ye Xuming (叶旭明), Tang Jia (唐佳)

Computer Technology and Development (计算机技术与发展), 2024, Vol. 34, Issue 4: 16-23, 8. DOI: 10.20165/j.cnki.ISSN1673-629X.2024.0003

Variable Granularity-based Chunk-context Aware Similar Data Deduplication Technique for Cloud Storage

Yang Zhihuan 1, Tian Wenlong 2, He Tingting 3, Ye Xuming 1, Tang Jia 1

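Author information

  • 1. School of Computer Science, University of South China, Hengyang, Hunan 421001, China
  • 2. School of Computer Science, University of South China, Hengyang, Hunan 421001, China; School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
  • 3. School of Education Science, Hengyang Normal University, Hengyang, Hunan 421010, China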

Abstract

To address the poor deduplication effectiveness and high metadata overhead of existing similar data deduplication techniques in cloud storage environments, a variable granularity-based, chunk-context aware similar data deduplication technique for cloud storage is proposed. The technique uses a feature extraction algorithm based on sub-block reorganization to extract initial features from the internal structure of each data block's content, and a BP (Back Propagation) neural network context-aware model to embed the data block's contextual feature information into those initial features, yielding variable-granularity data blocks with contextual semantic embedding. Data block size is controlled to obtain a better representation of similar data blocks: neighboring similar data blocks or non-redundant data blocks are dynamically merged to reduce metadata overhead, and the transition region between similar and non-redundant data blocks is segmented. To evaluate performance, a prototype variable-granularity similar data detection algorithm, rCARD, is implemented and tested extensively on real-world datasets. The experimental results show that, compared with Finesse, the latest similarity-detection deduplication technique, rCARD achieves a higher deduplication rate while significantly reducing metadata size, and speeds up similarity detection by up to 11.07 times.
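
The merging step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' rCARD implementation: the `Block` record, the `merge_neighbors` function, and the `max_len` cap are hypothetical names introduced here only to show how merging contiguous blocks that share a label (similar or non-redundant) reduces the number of metadata entries. Feature extraction, the BP neural network model, and transition-region segmentation are not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical metadata record for one data block (not from the paper)."""
    offset: int    # byte offset of the block in the stream
    length: int    # block length in bytes
    similar: bool  # True if flagged as similar, False if non-redundant


def merge_neighbors(blocks: List[Block], max_len: int) -> List[Block]:
    """Merge contiguous blocks with the same label, capping merged size at max_len."""
    merged: List[Block] = []
    for b in blocks:
        last = merged[-1] if merged else None
        if (
            last is not None
            and last.similar == b.similar                # same label
            and last.offset + last.length == b.offset    # physically adjacent
            and last.length + b.length <= max_len        # respect the size cap
        ):
            last.length += b.length                      # grow the previous block
        else:
            merged.append(Block(b.offset, b.length, b.similar))
    return merged


# Six fixed-size 4 KiB blocks with assumed similarity labels.
blocks = [
    Block(0, 4096, True),
    Block(4096, 4096, True),
    Block(8192, 4096, False),
    Block(12288, 4096, False),
    Block(16384, 4096, False),
    Block(20480, 4096, True),
]
merged = merge_neighbors(blocks, max_len=8192)
print(len(blocks), "->", len(merged), "metadata entries")
```

With an 8 KiB cap, the six input records collapse to four while covering the same byte range. A real system would pick the cap and the labels from its own similarity detector.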

Key words

similar data deduplication/data block semantics/variable granularity/cloud storage/metadata

Classification

Information technology and security science

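Cite this article

Yang Zhihuan, Tian Wenlong, He Tingting, Ye Xuming, Tang Jia. Variable Granularity-based Chunk-context Aware Similar Data Deduplication Technique for Cloud Storage [J]. Computer Technology and Development, 2024, 34(4): 16-23, 8.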

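Funding

Natural Science Foundation of Hunan Province (2021JJ40468)

Outstanding Youth Project of the Hunan Provincial Department of Education (22B0437)

Hunan Provincial Teacher Education Research Base (XJK23AJD014)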

Computer Technology and Development (ISSN 1673-629X), OA, CSTPCD
