Computer Engineering, 2025, Vol. 51, Issue (12): 43-55, 13. DOI: 10.19678/j.issn.1000-3428.0252059
A Hybrid Storage Scheme for Large-scale Knowledge Graphs
Abstract
Knowledge graphs, a crucial form of data organization in artificial intelligence, are widely applied across numerous domains as big data and large-scale models develop. As knowledge graphs continue to grow, existing storage structures face challenges such as slow data ingestion and excessive storage consumption. To address these issues, this paper proposes a hybrid storage scheme that combines relational and key-value storage, together with an entity clustering algorithm based on attribute frequency. The algorithm groups entities into clusters, and within each cluster high-frequency attributes are stored relationally while rare attributes are stored as key-value pairs. This design mitigates the drawbacks of relational storage (such as the excessive NULL values generated by sparse data) while reducing the key duplication inherent in key-value storage, and it significantly improves storage efficiency without compromising data flexibility. Experiments on synthetic and real-world datasets show that, compared with existing schemes, the proposed scheme saves over 50% of storage space on real-world datasets and increases data ingestion speed by an order of magnitude, with no significant impact on query performance. The scheme thus addresses the storage challenges of large-scale knowledge graphs, provides storage support for their application across various fields, and is of both theoretical and practical value.
Keywords
knowledge graph / Resource Description Framework (RDF) graph / property graph / relational database / data storage
Classification
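Information Technology and Security Science
Cite this article
YOU Yiheng, WANG Xin, MA Menglu, WANG Hui. A Hybrid Storage Scheme for Large-scale Knowledge Graphs [J]. Computer Engineering, 2025, 51(12): 43-55, 13.
Funding
General Program of the National Natural Science Foundation of China (62472311).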
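
The hybrid layout described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the entity clustering step is omitted, and a single frequency threshold stands in for it. The function names `split_by_frequency` and `to_hybrid_rows`, the `min_share` parameter, and the sample entities are all hypothetical, introduced only for illustration.

```python
from collections import Counter


def split_by_frequency(entities, min_share=0.5):
    """Split attribute names into (frequent, rare) sets.

    An attribute counts as frequent when at least `min_share` of the
    entities carry it. The threshold is an assumption of this sketch.
    """
    counts = Counter(attr for attrs in entities.values() for attr in attrs)
    total = len(entities)
    frequent = {a for a, c in counts.items() if c / total >= min_share}
    return frequent, set(counts) - frequent


def to_hybrid_rows(entities, frequent):
    """Build one row per entity: fixed columns plus a key-value overflow."""
    columns = sorted(frequent)
    rows = []
    for entity_id, attrs in entities.items():
        # Relational part: one slot per frequent attribute (None plays NULL).
        fixed = tuple(attrs.get(c) for c in columns)
        # Key-value part: only the rare attributes this entity actually has.
        extra = {k: v for k, v in attrs.items() if k not in frequent}
        rows.append((entity_id, fixed, extra))
    return columns, rows


# Hypothetical sample data, not taken from the paper.
entities = {
    "e1": {"name": "Alice", "born": 1990, "nickname": "Al"},
    "e2": {"name": "Bob", "born": 1985},
    "e3": {"name": "Carol", "born": 1979, "award": "Turing"},
    "e4": {"name": "Dave"},
}

frequent, rare = split_by_frequency(entities, min_share=0.5)
columns, rows = to_hybrid_rows(entities, frequent)
for row in rows:
    print(row)
```

In this toy data, `name` and `born` become relational columns, so only one NULL-like slot appears (the missing `born` for `e4`), while `nickname` and `award` are stored only for the entities that have them. A purely relational table would need two extra, mostly empty columns, and a purely key-value layout would repeat the `name` and `born` keys for almost every entity.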