Massive Parallel Data Cleaning Algorithm Based on Multi-conditional Time Series

GAO Zuyan (高祖彦) ¹, DUAN Changsheng (段昌盛) ²

微型电脑应用 (Microcomputer Applications), 2025, Vol. 41, Issue 4: 21-24, 4.

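Author information

1. Academic Affairs Office, Enshi Polytechnic (恩施职业技术学院), Enshi, Hubei 445000
2. School of Information Engineering, Enshi Polytechnic (恩施职业技术学院), Enshi, Hubei 445000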
Abstract

To address the duplicate, missing, and invalid records that pervade massive data sets in many fields, a massive parallel data cleaning algorithm based on multi-conditional time series is studied. The symbolic aggregate approximation (SAX) algorithm is used to discretize and symbolize the multi-conditional time series, and a similarity measure is then computed over the symbolized series. A data cleaning algorithm based on this time series similarity measure is implemented on the MapReduce parallel computing platform, so that massive data can be cleaned in parallel. The experimental results show that, after cleaning, the distances between time series are more consistent with the true values, which indicates that cleaning yields high-quality data. The introduction of parallel processing also greatly reduces the data cleaning time.
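The pipeline summarized in the abstract (symbolize each series, measure similarity on the symbols, then clean in a map/reduce pattern) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses the textbook SAX construction and the MINDIST lower bound, simulates the map and reduce phases in a single process rather than on a real MapReduce cluster, and assumes for illustration that a record containing a missing value is invalid and that records with identical SAX words are duplicates. The names `sax_word`, `mindist`, and `clean`, and the default parameters, are hypothetical.

```python
import math
from collections import defaultdict
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale a series to zero mean and unit variance."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise aggregate approximation; assumes len(series) % segments == 0."""
    size = len(series) // segments
    return [mean(series[i * size:(i + 1) * size]) for i in range(segments)]


def breakpoints(alphabet_size):
    """Cut points that split the standard normal into equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax_word(series, segments=4, alphabet_size=4):
    """Discretize a numeric series into a SAX word such as 'abcd'."""
    cuts = breakpoints(alphabet_size)
    symbols = []
    for value in paa(znormalize(series), segments):
        index = sum(value > c for c in cuts)  # region the value falls in
        symbols.append(chr(ord("a") + index))
    return "".join(symbols)


def mindist(word_a, word_b, n, alphabet_size=4):
    """MINDIST lower bound between two SAX words of series of length n."""
    cuts = breakpoints(alphabet_size)
    total = 0.0
    for ca, cb in zip(word_a, word_b):
        i, j = sorted((ord(ca) - ord("a"), ord(cb) - ord("a")))
        if j - i > 1:  # equal or adjacent symbols contribute zero
            total += (cuts[j - 1] - cuts[i]) ** 2
    return math.sqrt(n / len(word_a)) * math.sqrt(total)


def clean(records, segments=4, alphabet_size=4):
    """Single-process stand-in for a map/reduce cleaning job.

    Assumptions (not from the paper): records with a missing value are
    dropped as invalid, and records sharing a SAX word are duplicates.
    """
    groups = defaultdict(list)
    # "Map" phase: emit (sax_word, record) for every valid record.
    for rec_id, series in records:
        if any(x is None for x in series):
            continue
        key = sax_word(series, segments, alphabet_size)
        groups[key].append((rec_id, series))
    # "Reduce" phase: keep one representative per key.
    return [members[0] for members in groups.values()]
```

Grouping by the SAX word keeps the reduce step trivial; a less aggressive variant could instead compare `mindist` against a threshold inside each reducer, at the cost of pairwise comparisons.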

Keywords

multi-conditional time series / massive parallel data / data cleaning / MapReduce

Classification

Information Technology and Security Science

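Cite this article

GAO Zuyan, DUAN Changsheng. Massive Parallel Data Cleaning Algorithm Based on Multi-conditional Time Series [J]. 微型电脑应用 (Microcomputer Applications), 2025, 41(4): 21-24, 4.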

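Funding

University Industry-University-Research Innovation Fund, Science and Technology Development Center of the Ministry of Education (2018A03016)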

微型电脑应用 (Microcomputer Applications)

ISSN: 1007-757X
