
Computer Technology and Development, 2018, Vol. 28, Issue 5: 1-4, 4. DOI: 10.3969/j.issn.1673-629X.2018.05.001

Research on Data Preprocessing Methods for Big Data

Kong Qin¹, Ye Changqing¹, Sun Yun¹

Author information

  • 1. Nanjing University, Nanjing, Jiangsu 210089

Abstract

In the era of big data, perceiving, representing, understanding and computing on data is an enormous challenge because of the inherent complexity of data types, organizational patterns, relationships and data quality. Data preprocessing is an essential preparatory step before data analysis and mining. On the one hand, it ensures the correctness and effectiveness of data mining. On the other hand, adjusting the format and content of the data makes it meet the requirements of mining. We analyze the main tasks of data preprocessing and summarize several popular methods for handling various kinds of "dirty data". Algorithms for data cleaning, integration, transformation and reduction are discussed in detail. With these preprocessing methods, we can remove redundant and erroneous data, complete incomplete data, integrate the required data, and support data refinement and the consistency of centrally stored data. We can also obtain the smallest and most reliable data set needed by the mining system, which reduces the cost of data mining and improves the accuracy, validity and practicability of knowledge discovery.
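
The abstract names four preprocessing tasks (cleaning, integration, transformation, reduction) without specifying the algorithms used. As a rough illustration only, the sketch below shows one common choice for each task in plain Python: duplicate removal plus mean imputation, an inner join on a shared key, min-max normalization, and attribute selection. The function names, the sample records and the choice of techniques are assumptions made for this example and are not taken from the paper.

```python
from statistics import mean


def clean(records, field):
    """Cleaning: drop exact duplicate records, then fill missing values with the mean."""
    seen, unique = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(r))
    fill = mean(r[field] for r in unique if r[field] is not None)
    for r in unique:
        if r[field] is None:
            r[field] = fill
    return unique


def integrate(left, right, key):
    """Integration: combine two sources with an inner join on a shared key."""
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


def min_max(records, field):
    """Transformation: rescale a numeric field to the range [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # guard against division by zero for constant fields
    return [{**r, field: (r[field] - lo) / span} for r in records]


def reduce_fields(records, keep):
    """Reduction: keep only the attributes the mining step needs."""
    return [{k: r[k] for k in keep} for r in records]


# Hypothetical sample data: one duplicate row and one missing age.
users = [
    {"id": 1, "age": 20},
    {"id": 1, "age": 20},
    {"id": 2, "age": None},
    {"id": 3, "age": 40},
]
cities = [
    {"id": 1, "city": "Nanjing"},
    {"id": 2, "city": "Suzhou"},
    {"id": 3, "city": "Wuxi"},
]

cleaned = clean(users, "age")
merged = integrate(cleaned, cities, "id")
scaled = min_max(merged, "age")
result = reduce_fields(scaled, ["id", "age"])
print(result)
```

Mean imputation and min-max scaling are used here only because they are short to write; other strategies (median imputation, z-score normalization, sampling-based reduction) would fit the same four-stage structure.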

Key words

big data/preprocessing/dirty data/research

Classification

Information Technology and Security Science

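Cite this article

Kong Qin, Ye Changqing, Sun Yun. Research on Data Preprocessing Methods for Big Data [J]. Computer Technology and Development, 2018, 28(5): 1-4, 4.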

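Funding

National Natural Science Foundation of China (90412014)

Computer Fundamentals Teaching Research and Reform Project of the Association of Fundamental Computing Education in Chinese Universities (AFCEC-2016-18)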

Journal: Computer Technology and Development (ISSN 1673-629X; OA, CSTPCD)
