Journal of Frontiers of Computer Science and Technology, Issue (10): 1180-1194, 15 pages. DOI: 10.3778/j.issn.1673-9418.1505080
Design of an Algorithm for Evaluating Sequential Data Quality with Gap Constraints
Abstract
Sequential data is ubiquitous in real-world applications and is an important research topic in data mining. The reliability of mining results depends on the quality of the underlying sequences. Traditional data quality evaluation methods rely on statistical indicators, but such indicators cannot capture the relationships among the elements of an unstructured sequence. To assess the quality of a sequence, this paper proposes a quality evaluation algorithm for sequential data based on the probabilistic suffix tree. Specifically, under a specified gap constraint, a probabilistic suffix tree is built from sequences of known reliable quality; the tree is then used to evaluate the quality of a query sequence. Experiments on real-world sequence sets confirm the effectiveness, efficiency, and scalability of the proposed algorithm.
Keywords
data quality / probabilistic suffix tree / gap constraint
Classification
Information Technology and Security Science
Cite this article
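Wang Huifeng, Duan Lei, Hu Bin, Deng Song, Wang Wentao, Qin Pan. Design of an Algorithm for Evaluating Sequential Data Quality with Gap Constraints [J]. Journal of Frontiers of Computer Science and Technology, 2015, (10): 1180-1194, 15 pages.
Funding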
National Natural Science Foundation of China, Grant No. 61103042
Postdoctoral Science Foundation of China, Grant No. 2014M552371
Open Research Fund Program of the State Key Laboratory of Software Engineering of Wuhan University, Grant No. SKLSE2012-09-32
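
The abstract describes the approach only in outline: build a probabilistic suffix tree from reliable sequences under a gap constraint, then score a query sequence against it. The Python sketch below is one possible reading of that outline, not the paper's algorithm. Everything in it is an assumption made for illustration: the class name `GapPST`, its parameters, the interpretation of a gap as the number of elements skipped between consecutive context symbols, the Laplace smoothing, and the use of a geometric mean as the quality score. The tree is stored flat as a dictionary from context to next-symbol counts, which is equivalent to a suffix tree of bounded depth.

```python
import math
from collections import defaultdict


class GapPST:
    """Toy probabilistic suffix tree, stored flat as {context: {symbol: count}}.

    Hypothetical illustration only; names and scoring rule are assumptions.
    """

    def __init__(self, max_depth=2, min_gap=0, max_gap=1, smoothing=1.0):
        self.max_depth = max_depth
        self.gaps = range(min_gap, max_gap + 1)  # allowed number of skipped elements
        self.smoothing = smoothing
        self.counts = defaultdict(lambda: defaultdict(int))
        self.alphabet = set()

    def _contexts(self, seq, i):
        """Yield every context key that precedes position i."""
        yield ()  # root: empty context
        for gap in self.gaps:
            step = gap + 1
            ctx = []
            for depth in range(1, self.max_depth + 1):
                j = i - depth * step
                if j < 0:
                    break
                ctx.append(seq[j])
                yield (gap, tuple(ctx))

    def fit(self, sequences):
        """Count next-symbol occurrences for every context in the reliable sequences."""
        for seq in sequences:
            for i, sym in enumerate(seq):
                self.alphabet.add(sym)
                for key in self._contexts(seq, i):
                    self.counts[key][sym] += 1
        return self

    def prob(self, sym, key):
        """Laplace-smoothed P(sym | context); one extra slot is reserved for unseen symbols."""
        nxt = self.counts.get(key, {})
        total = sum(nxt.values())
        vocab = len(self.alphabet) + 1
        return (nxt.get(sym, 0) + self.smoothing) / (total + self.smoothing * vocab)

    def score(self, seq):
        """Geometric mean of per-position probabilities, in (0, 1]; higher is better."""
        if not seq:
            raise ValueError("empty sequence")
        log_sum = 0.0
        for i, sym in enumerate(seq):
            # Use only contexts that were observed in the reliable sequences.
            keys = [k for k in self._contexts(seq, i) if k in self.counts] or [()]
            p = sum(self.prob(sym, k) for k in keys) / len(keys)
            log_sum += math.log(p)
        return math.exp(log_sum / len(seq))


model = GapPST().fit(["abababab", "babababa"])
print(model.score("abababab"), model.score("aabbaabb"))
```

In this toy setup, a query that follows the alternating pattern of the training sequences receives a higher score than one that breaks it. A real implementation would also need a principled threshold for deciding when a score indicates poor quality, which this sketch does not attempt.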