Statistical Model of Distributed Data Stream with High Reliability Based on MR

朱蔚林 木伟民 金宗泽 王伟平

Computer Technology and Development (计算机技术与发展), 2018, Vol. 28, Issue (1): 6-10,16,6. DOI: 10.3969/j.issn.1673-629X.2018.01.002

Statistical Model of Distributed Data Stream with High Reliability Based on MR

朱蔚林 1, 木伟民 2, 金宗泽 1, 王伟平 1

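Author information

  • 1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093
  • 2. University of Chinese Academy of Sciences, Beijing 100049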

Abstract

Data streams are easily lost and highly time-sensitive, which creates a need for both high throughput and low latency. Taking continuous grouping statistics over a window model as the application scenario, and drawing on the strengths of mainstream stream processing platforms such as Storm and Spark Streaming, we propose Mars, a distributed statistical model for data streams with high throughput, scalability, and low latency. For fault tolerance, Mars provides at-least-once semantics against major errors. Mars was tested in a real experimental environment and compared with the popular distributed stream processing platforms Spark Streaming and Storm. The results show that its real-time processing latency falls between the two. However, relative to cluster size, Mars's throughput is significantly higher than that of both, and in terms of semantic accuracy it reaches the same level of guarantee as Storm.
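To make the application scenario concrete, the following is a minimal sketch of grouping statistics over a window model. It is not the Mars implementation, which the abstract does not describe. It assumes fixed-size, non-overlapping (tumbling) windows and a simple per-key count, and the function name `windowed_group_counts` and the `(timestamp, key)` event layout are hypothetical.

```python
from collections import defaultdict


def windowed_group_counts(events, window_size):
    """Count events per key within fixed-size (tumbling) windows.

    Assumption for illustration: each event is a (timestamp, key) pair
    with an integer timestamp. Returns {window_start: {key: count}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for timestamp, key in events:
        # Align the timestamp to the start of the window that contains it.
        window_start = (timestamp // window_size) * window_size
        counts[window_start][key] += 1
    # Convert to plain dicts, ordered by window start.
    return {w: dict(groups) for w, groups in sorted(counts.items())}


# Hypothetical sample stream, not data from the paper.
events = [(0, "a"), (1, "b"), (4, "a"), (5, "a"), (9, "b"), (10, "a")]
result = windowed_group_counts(events, window_size=5)
print(result)
```

In this sketch an event that is delivered twice is counted twice, so under at-least-once delivery the counts can overstate the true values unless duplicates are filtered elsewhere.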

Key words

data stream/grouping statistic/continuous query/distributed system/real-time processing

Classification

Information Technology and Security Science

Cite this article

朱蔚林, 木伟民, 金宗泽, 王伟平. Statistical Model of Distributed Data Stream with High Reliability Based on MR [J]. Computer Technology and Development, 2018, 28(1): 6-10,16,6.

Funding

National Natural Science Foundation of China (61402473)

Computer Technology and Development (计算机技术与发展)

OA / CSTPCD

ISSN 1673-629X
