Computer Engineering and Applications, 2017, Vol. 53, Issue (12): 76-84, 9. DOI: 10.3778/j.issn.1002-8331.1605-0320
Research on ETL method of transforming relational data to graph data based on sub-schema
Abstract
Graph databases outperform relational databases on problems such as multi-layer relational queries and community detection. However, most data in existing applications is stored in relational form. How to extract, transform, and load (ETL) relational data into graph data efficiently and completely therefore remains an important problem when deploying graph database applications. Existing approaches suffer from three major limitations: (1) the quality of the converted graph data is poor; (2) the transformation efficiency is low; (3) the transformed results are not suitable for distributed storage. To overcome these limitations, this paper proposes a sub-schema-based ETL method for transforming relational data to graph data. By splitting the schema of a relational database into several sub-schemas, the method improves the algorithm and procedure of previous ETL approaches and provides an efficient route to parallel ETL. The transformed results satisfy the requirements of distributed storage and can serve as base data for the Spark GraphX computing framework. Finally, a prototype system is implemented with Java EE and Neo4j for experimental verification. The comparative results show that the improved ETL method yields better performance than previous methods.
Keywords
graph database / distributed storage / extract-transform-load (ETL) / sub-schema
Classification
Information technology and security science
Cite this article
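DING Qianglong (丁强龙), WANG Jin (王津), ZHANG Xuejie (张学杰). Research on ETL method of transforming relational data to graph data based on sub-schema [J]. Computer Engineering and Applications, 2017, 53(12): 76-84, 9.
Funding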
National Natural Science Foundation of China (No. 61170222).