Data lineage construction method for complex data audit requirements
For complex data audit requirements, existing methods query and analyze the information of every executed statement in the database, which makes data auditing inefficient. Some approaches instead use data lineage tools for fast lookup, but these require intruding into the system to obtain source code, which can easily lead to data leakage or malicious tampering. To address these problems, this paper proposes a data lineage construction method for complex data audit requirements that integrates key techniques such as log preprocessing, data relationship parsing, and data alignment. By parsing the system's runtime logs, the method constructs a data lineage graph in a non-invasive manner, and a data audit tool is built on it for the inbound and outbound stages of tobacco logistics. Using 155 728 transaction logs corresponding to 13 796 batches of goods circulating in tobacco logistics as the test dataset, comparative experiments were conducted on three aspects: completeness, construction cost, and data audit efficiency. The results show that the proposed method completes the query task within 10 s and occupies 1.23 MB of memory per hundred entries, which is noticeably less than existing methods. Compared with existing methods, the proposed method constructs complete and accurate data lineage at data-level granularity, and auditing with the lineage it produces greatly improves the efficiency of data auditing in cigarette logistics.
Pan Xiaohua; Jin Yong; Gao Yanghua; Zhu Xinzhou; Shen Shijing
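School of Software Technology, Zhejiang University, Hangzhou 310058; Information Center, China Tobacco Zhejiang Industrial Co., Ltd., Hangzhou 310007; Research Center for Domestic Information Technology Application Innovation, Binjiang Institute of Zhejiang University, Hangzhou 310053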
Computer and Automation
data lineage; non-invasive; data audit; cigarette logistics; automated job
Application Research of Computers, 2024 (001)
Pages 76-82 (7 pages)
Supported by the Zhejiang Provincial Science and Technology Program (2023C01213) and the "Pioneer" and "Leading Goose" R&D Program