Cleaning inconsistent data based on statistical inference (基于统计推理的不一致数据清洗方法)
OA; Peking University Core Journals (北大核心); CSTPCD
Repairing inconsistent data is an important research direction in data cleaning. Most existing methods rely on integrity constraints and repair the data under the minimum-cost principle. However, the minimum-cost repair is usually incorrect, so these methods have low accuracy. To address this problem, this paper proposes BayesOUR, an inconsistent data cleaning method based on statistical inference that balances repair cost against repair quality to improve repair accuracy. BayesOUR works in three phases: it first detects errors using the integrity constraints, then uses a Bayesian network to infer the probability of every possible consistent repair, and finally cleans the data with the most probable repair. Experimental results on real data show that the method significantly improves the accuracy of inconsistent data repair compared with current leading methods.
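The three phases described in the abstract can be illustrated with a minimal sketch. This is not the BayesOUR implementation: the toy table, the functional dependency `zip -> city`, the single-edge network `state -> city`, and all function names are assumptions made for illustration. The example is built so that the minimum-cost repair (change one cell to "Newark") differs from the most probable repair.

```python
import math
from collections import Counter, defaultdict

# Toy relation (hypothetical). Assumed integrity constraint: zip -> city.
rows = [
    {"zip": "10001", "city": "New York", "state": "NY"},
    {"zip": "10001", "city": "Newark",   "state": "NY"},  # erroneous
    {"zip": "10001", "city": "Newark",   "state": "NY"},  # erroneous
    {"zip": "10002", "city": "New York", "state": "NY"},
    {"zip": "10002", "city": "New York", "state": "NY"},
    {"zip": "10002", "city": "New York", "state": "NY"},
    {"zip": "07102", "city": "Newark",   "state": "NJ"},
    {"zip": "07102", "city": "Newark",   "state": "NJ"},
]

def detect_violations(rows, lhs, rhs):
    """Phase 1: return {lhs value: row indices} for groups that violate lhs -> rhs."""
    groups = defaultdict(list)
    for i, r in enumerate(rows):
        groups[r[lhs]].append(i)
    return {k: idx for k, idx in groups.items()
            if len({rows[i][rhs] for i in idx}) > 1}

def fit_cpt(rows, child, parent, alpha=1.0):
    """Estimate P(child | parent) with Laplace smoothing (a one-edge Bayesian network)."""
    counts = defaultdict(Counter)
    for r in rows:
        counts[r[parent]][r[child]] += 1
    domain = {r[child] for r in rows}

    def prob(c, p):
        total = sum(counts[p].values()) + alpha * len(domain)
        return (counts[p][c] + alpha) / total

    return prob

def repair(rows, lhs="zip", rhs="city", parent="state"):
    """Phases 2 and 3: score each consistent candidate repair, apply the most probable."""
    prob = fit_cpt(rows, rhs, parent)
    repaired = [dict(r) for r in rows]  # leave the input untouched
    for idx in detect_violations(rows, lhs, rhs).values():
        # Candidate consistent repairs: give the whole group one observed value.
        candidates = sorted({rows[i][rhs] for i in idx})
        best = max(candidates,
                   key=lambda v: math.prod(prob(v, rows[i][parent]) for i in idx))
        for i in idx:
            repaired[i][rhs] = best
    return repaired

repaired = repair(rows)
```

In this toy data the group for zip `10001` is repaired to "New York" even though that changes two cells instead of one, because the learned conditional distribution makes it the more probable consistent assignment. A real system would learn a richer network over many attributes and consider a larger space of candidate repairs.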
Zhang Anzhen (张安珍); Hu Shengji (胡生吉); Xia Xiufeng (夏秀峰)
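Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168 || School of Computer Science, Shenyang Aerospace University, Shenyang 110136; School of Computer Science, Shenyang Aerospace University, Shenyang 110136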
Computer Science and Automation
inconsistent data; Bayesian network; statistical (probabilistic) inference
Application Research of Computers (《计算机应用研究》), 2024, no. 10
pp. 2987-2992 (6 pages)
Supported by the Youth Fund of the National Natural Science Foundation of China (6210071734)