Chinese High Technology Letters (高技术通讯), 2016, Vol. 26, Issue 6: 534-541, 8. DOI: 10.3772/j.issn.1002-0470.2016.06.003
Research on storage and query processing for massive NetFlow data
Abstract
Because NetFlow data from China's backbone network arrives at a high rate, accumulates in large volumes, and is subject to frequent multidimensional queries, this study proposes a multidimensional attributes clustering storage (MACS) model. Based on the characteristics of queries in real applications, the MACS model partitions the NetFlow data space and stores the data using parallel pipelining. In addition, a hyper-polyhedron query mode for NetFlow data is presented. Experiments in real application environments show that a single system built on the model achieves a real-time storage rate of up to 2.7 million records per second, faster than all the other systems compared. In particular, the proposed multidimensional query is faster than Hive and Impala.
Keywords
NetFlow / multidimensional attributes clustering storage (MACS) model / real-time data storage / hyper-polyhedron
Cite this article
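陈重韬, 王伟平, 孟丹, 崔甲, 胡斌. Research on storage and query processing for massive NetFlow data [J]. Chinese High Technology Letters, 2016, 26(6): 534-541, 8.
Funding
Supported by the National Key Technology R&D Program of China (2012BAH46B03), the National Natural Science Foundation of China (61402473), the National Science and Technology Major Project on Core Electronic Devices, High-end Generic Chips and Basic Software (2013ZX01039-002-001-001), and the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030200).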