Design and Research of a Flow Batch Integration Data Processing Platform Based on Stateful Real-time Flow
As the scale and complexity of data keep growing, the demands placed on data processing platforms grow with them. Traditional batch processing and real-time processing each have their own strengths and weaknesses, which makes it difficult for either to meet the needs of large-scale data processing. Data processing platforms that integrate stream processing and batch processing ("flow batch integration") have emerged in response. Building on a discussion of the core architecture design for stream-batch integration, this paper proposes a stream-batch integrated data processing method based on stateful real-time streams, and implements the processing and computation of stream-batch data as a platform. The platform has been deployed in demonstration applications at Sichuan Expressway Group (四川高速集团) and at government units in Guiyang. The application results indicate that the platform not only unifies the batch processing and stream processing frameworks, but is also efficient, reliable, and scalable, and can meet the needs of large-scale data processing. The implementation of the platform is of great significance for improving the efficiency and accuracy of data processing.
Zhou Wei (周维); Cao Yang (曹扬); Xie Hongtao (谢红韬); Hu Jian (胡建)
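CETC Big Data Research Institute Co., Ltd. (中电科大数据研究院有限公司), Guiyang, Guizhou 550081; National Engineering Research Center of Big Data Application Technology for Improving Government Governance Capability (提升政府治理能力大数据应用技术国家工程研究中心), Guiyang, Guizhou 550081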
Computer Science and Automation
batch processing; stateful real-time flow; platformization; flow batch integration
Modern Information Technology (《现代信息科技》), 2024 (006)
pp. 29-34 (6 pages)
National Natural Science Foundation of China (U19B2027)