
Evaluation of Sequential Data Quality Using Spark

Han Chao, Duan Lei, Deng Song, Wang Huifeng, Tang Changjie

Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2017, Vol. 11, Issue 6: 897-907 (11 pages). DOI: 10.3778/j.issn.1673-9418.1609008


Han Chao (1), Duan Lei (1), Deng Song (2), Wang Huifeng (3), Tang Changjie (1)

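Author information

  • 1. College of Computer Science, Sichuan University, Chengdu 610065
  • 2. West China School of Public Health, Sichuan University, Chengdu 610041
  • 3. Institute of Advanced Technology, Nanjing University of Posts and Telecommunications, Nanjing 210003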

Abstract

Sequential data are prevalent in many real-world applications. Evaluating the quality of sequential data, which has attracted attention from both academia and industry, is an important prerequisite for extracting knowledge from such data. Recently, a method based on the probabilistic suffix tree was proposed for evaluating sequential data quality. However, this method cannot handle large-scale data sets. To overcome this limitation, this paper proposes a Spark-based algorithm, called STALK (sequential data quality evaluation with Spark), for evaluating the quality of large-scale sequential data. The paper also introduces novel pruning strategies to improve the efficiency of STALK. Specifically, on the Spark platform, large-scale sequential data are used to build a model efficiently, and the quality of a query sequence can then be evaluated rapidly against that model. Experiments on real-world sequential data sets demonstrate that STALK is effective, efficient, and scalable.
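The abstract does not give the internals of STALK, so the following is only a rough single-machine sketch of the general idea behind suffix-context models such as the probabilistic suffix tree: learn next-symbol frequencies for short contexts from reference sequences, then score a query sequence by how likely each symbol is given its longest known preceding context. The names `build_model` and `score`, the parameters `max_depth`, `min_count`, and `smoothing`, and the count-based pruning rule are all assumptions made for illustration; they are not the paper's pruning strategies, and Spark is not used here.

```python
import math
from collections import Counter, defaultdict


def build_model(sequences, max_depth=2, min_count=1):
    """Count next-symbol frequencies for every context of length 0..max_depth.

    Hypothetical helper: a flat dictionary stands in for the tree structure.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for i, symbol in enumerate(seq):
            for d in range(0, min(max_depth, i) + 1):
                context = tuple(seq[i - d:i])  # the d symbols preceding position i
                counts[context][symbol] += 1
    # Placeholder pruning (an assumption): drop rarely seen contexts,
    # but always keep the empty context so scoring can back off to it.
    return {c: nxt for c, nxt in counts.items()
            if c == () or sum(nxt.values()) >= min_count}


def score(model, query, alphabet_size, smoothing=1.0):
    """Average log-probability of a non-empty query under the model."""
    max_depth = max(len(c) for c in model)
    total = 0.0
    for i, symbol in enumerate(query):
        # Back off from the longest available suffix to the empty context.
        for d in range(min(max_depth, i), -1, -1):
            context = tuple(query[i - d:i])
            if context in model:
                break
        nxt = model[context]
        # Additive smoothing so unseen symbols get a small non-zero probability.
        p = (nxt[symbol] + smoothing) / (sum(nxt.values()) + smoothing * alphabet_size)
        total += math.log(p)
    return total / len(query)


reference = ["abababab", "abababab"]
model = build_model(reference, max_depth=2)
regular = score(model, "abab", alphabet_size=2)
irregular = score(model, "aabb", alphabet_size=2)
```

In this toy setup, a query that follows the alternating pattern of the reference data receives a higher average log-probability than one that breaks it. The counting step is a sum of independent per-sequence contributions, which is the kind of work a platform like Spark can distribute, but how STALK actually partitions the work is not stated in the abstract.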

Key words

data quality/probabilistic suffix tree/Spark/parallel computing

Classification

Information Technology and Security Science

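Cite this article

Han Chao, Duan Lei, Deng Song, Wang Huifeng, Tang Changjie. Evaluation of Sequential Data Quality Using Spark [J]. Journal of Frontiers of Computer Science and Technology, 2017, 11(6): 897-907.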

Funding

The National Natural Science Foundation of China under Grant Nos. 61572332, 51507084

The Postdoctoral Science Foundation of China under Grant Nos. 2016T90850, 2016M591890

The Fundamental Research Funds for the Central Universities of China under Grant No. 2016SCU04A22

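Journal of Frontiers of Computer Science and Technology (计算机科学与探索), ISSN 1673-9418. Open access (OA); indexed in 北大核心 (Peking University Core Journals), CSCD, and CSTPCD.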
