Application Research of Computers (计算机应用研究), 2018, Vol. 35, Issue 2: 375-380, 6. DOI: 10.3969/j.issn.1001-3695.2018.02.013
MSOLA: big data online aggregation based on multi-dimension stratified sampling
Abstract
Online aggregation estimates query results through statistical computation and provides feedback to users before the query finishes, which is of paramount importance in big data analysis. Existing studies generally adopt uniform sampling, which leads to inaccurate estimates and slow convergence. This paper proposes a multi-dimension stratified sampling technique based on workload characteristics and data distribution, and designs algorithms on Storm for result estimation and confidence-interval computation. Experiments demonstrate that the proposed technique improves the accuracy of the estimated results in online aggregation and scales efficiently.
Keywords
online aggregation (OLA) / big data / multi-dimension stratified sampling / workload analysis
Classification
Information Technology and Security Science
Cite this article
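Shi Yingjie (史英杰), Du Fang (杜方), You Yadong (尤亚东). MSOLA: big data online aggregation based on multi-dimension stratified sampling [J]. Application Research of Computers (计算机应用研究), 2018, 35(2): 375-380, 6.
Funding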
National Natural Science Foundation of China (61502279, 61363018)
Natural Science Foundation of Shandong Province (ZR2015FM013)
Science and Technology Program of Beijing Municipal Education Commission (KM201710012008)
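Illustrative note on the abstract: the abstract contrasts uniform sampling with stratified sampling but does not spell out the estimator. The following is therefore only a minimal sketch of the textbook stratified-sampling mean estimate with a normal-approximation confidence interval. It is not the MSOLA algorithm itself and does not use Storm. Strata keyed by tuples of attribute values stand in for multi-dimension strata; the function names, the `z = 1.96` default, and the toy data are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from statistics import mean, variance


def group_by_strata(rows, dims):
    """Group rows (dicts) into strata keyed by their values on `dims`.

    Hypothetical helper: one stratum per distinct combination of values.
    """
    strata = defaultdict(list)
    for row in rows:
        strata[tuple(row[d] for d in dims)].append(row)
    return dict(strata)


def stratified_mean(samples, sizes, z=1.96):
    """Return (estimate, half_width) for the population mean.

    samples: stratum key -> list of sampled numeric values
    sizes:   stratum key -> number of rows in that stratum (N_h)
    z:       normal quantile (1.96 is roughly a 95% interval; an assumption)
    """
    total = sum(sizes.values())
    estimate = 0.0
    var_of_estimate = 0.0
    for key, values in samples.items():
        n_h, big_n_h = len(values), sizes[key]
        weight = big_n_h / total
        estimate += weight * mean(values)
        if n_h > 1:
            # Finite population correction shrinks the variance as the
            # sample covers more of the stratum.
            fpc = 1.0 - n_h / big_n_h
            var_of_estimate += weight ** 2 * fpc * variance(values) / n_h
    return estimate, z * math.sqrt(var_of_estimate)


# Toy usage: two strata defined on two dimensions (region, category).
sizes = {("east", "a"): 100, ("west", "b"): 300}
samples = {("east", "a"): [10, 12, 14], ("west", "b"): [1, 2, 3]}
est, half = stratified_mean(samples, sizes)
sum_est = est * sum(sizes.values())  # a SUM aggregate scales the mean by N
```

In an online setting the same function could be called again as more samples arrive per stratum, so the interval `est ± half` narrows over time; how MSOLA chooses the strata and allocates samples is described in the paper, not here.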