
Research on Robust and Adaptive Learned Approximate Query-Processing Method

Open access (OA); CSTPCD


Exact queries over large-scale datasets take a long time to execute, so Approximate Query Processing (AQP) techniques are commonly used in online analytical processing to return results at interactive latency while keeping query error as low as possible. Existing learned AQP methods are decoupled from the underlying data and turn I/O-intensive computation into CPU-intensive computation. Because computing resources are limited, however, these methods usually train their models on random data samples. Such training data can miss rare groups, which lowers the prediction accuracy of the model. To address this problem, this paper proposes a hybrid Sum-Product Network model learned from stratified samples, the Stratified Sampling-based Sum-Product Network (SSSPN), and designs an AQP framework based on it. Stratified samples effectively avoid the loss of rare groups, and models trained on them are substantially more accurate. For dynamically updated data, the paper also proposes an adaptive model-update strategy that lets the model detect data shift promptly and update itself. Experimental results show that, compared with sampling-based and machine-learning-based AQP methods, the model lowers the average relative error by approximately 18.3% on real datasets and 2.2% on synthetic datasets; under dynamic data updates, both its accuracy and its query latency remain stable.
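The rare-group problem described in the abstract can be illustrated with a small sketch. It is not the paper's SSSPN model or its AQP framework; it only contrasts uniform random sampling with per-group (stratified) sampling on a toy table. The table contents, the group names `common` and `rare`, the sample sizes, and the helper names `uniform_sample` and `stratified_sample` are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def uniform_sample(rows, k, rng):
    """Draw k rows uniformly at random, without replacement."""
    return rng.sample(rows, k)


def stratified_sample(rows, key, per_group, rng):
    """Draw up to per_group rows from every group, so no group is dropped."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    sample = []
    for members in groups.values():
        take = min(per_group, len(members))  # small groups are kept whole
        sample.extend(rng.sample(members, take))
    return sample


# Hypothetical toy table: 99,995 rows in a common group, 5 rows in a rare one.
rng = random.Random(0)
table = [("common", rng.random()) for _ in range(99_995)]
table += [("rare", rng.random()) for _ in range(5)]

uni = uniform_sample(table, 200, rng)
strat = stratified_sample(table, key=lambda r: r[0], per_group=100, rng=rng)

uniform_groups = {group for group, _ in uni}
stratified_groups = {group for group, _ in strat}

print("groups in uniform sample:   ", sorted(uniform_groups))
print("groups in stratified sample:", sorted(stratified_groups))
```

A uniform sample of 200 rows will usually contain no row from the 5-row group, so a model trained on it never sees that group. The stratified sample always contains it, at the cost of needing per-group weights when aggregates are later scaled back to the full table.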

QIAO Yimeng (乔艺萌); JING Yinan (荆一楠); ZHANG Hanbing (张寒冰)

School of Software, Fudan University, Shanghai 200441
School of Computer Science, Fudan University, Shanghai 200433

Computer Science and Automation

Approximate Query-Processing (AQP); Sum-Product Networks (SPN); stratified sampling; data shift; adaptive update

Computer Engineering (《计算机工程》), 2024, Issue 1

Research on Approximate Processing Methods for Complex Queries over Spatio-Temporal Data

Pages 30-38 (9 pages)

National Natural Science Foundation of China (62072113).

10.19678/j.issn.1000-3428.0066743
