Computer Technology and Development, 2018, Vol. 28, Issue (5): 1-4, 4. DOI: 10.3969/j.issn.1673-629X.2018.05.001
Research on Data Preprocessing Methods for Big Data (大数据下数据预处理方法研究)
Abstract
In the era of big data, the inherent complexity of data types, organization patterns, relations, and data quality makes data perception, representation, understanding, and computing an enormous challenge. Data preprocessing is an essential preparation step before data analysis and mining. On the one hand, it ensures the correctness and effectiveness of data mining. On the other hand, adjusting the format and content of the data makes it meet the requirements of mining. We analyze the main tasks of data preprocessing and summarize several popular methods for handling various kinds of "dirty data". The algorithms for data cleaning, integration, transformation, and reduction are discussed in detail. Using these preprocessing methods, we can remove redundant and erroneous data, complete incomplete data, integrate the required data, and support data refinement and the consistency of centrally stored data. We can also obtain the smallest and most reliable data set necessary for the mining system. This reduces the cost of data mining and improves the accuracy, validity, and practicability of knowledge discovery.
Keywords
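big data / preprocessing / dirty data / research
Classification
Information Technology and Security Science
Cite this article
Kong Qin (孔钦), Ye Changqing (叶长青), Sun Yun (孙赟). Research on Data Preprocessing Methods for Big Data [J]. Computer Technology and Development, 2018, 28(5): 1-4, 4.
Funding
National Natural Science Foundation of China (90412014)
Computer Basic Teaching Research and Reform Project of the Association of Fundamental Computing Education in Chinese Universities (AFCEC-2016-18)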