Journal of Southeast University (English Edition), 2008, Vol. 24, Issue 3: 264-266, 3.
Clustering algorithm for multiple data streams based on spectral component similarity
Abstract
A new algorithm for clustering multiple data streams is proposed. The algorithm can effectively cluster data streams that show similar behavior with unknown time delays. It uses the autoregressive (AR) modeling technique to measure correlations between data streams, and it exploits estimated frequency spectra to extract the essential features of the streams. Each stream is represented as a sum of spectral components, and the correlation is measured component-wise. Each spectral component is described by four parameters, namely amplitude, phase, damping rate and frequency. The ε-lag-correlation between two spectral components is calculated, and the algorithm uses this information as the similarity measure when clustering data streams. Based on a sliding window model, the algorithm can continuously report the most recent clustering results and adjust the number of clusters. Experiments on real and synthetic streams show that the proposed clustering method is faster and achieves higher clustering quality than other similar methods.
Key words
data streams/clustering/AR model/spectral component
Classification
Information Technology and Security Science
Cite this article
Zou Lingjun (邹凌君), Chen Jun (陈峻), Tu Li (屠莉). Clustering algorithm for multiple data streams based on spectral component similarity [J]. Journal of Southeast University (English Edition), 2008, 24(3): 264-266, 3.
Funding
The National Natural Science Foundation of China (No.60673060), the Natural Science Foundation of Jiangsu Province (No.BK2005047).
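Illustrative sketch
The abstract describes spectral components with four parameters and a lag-bounded correlation between them, but it gives no formulas. The Python sketch below is only an illustration under stated assumptions, not the authors' method. It assumes a component has the damped-cosine form A·exp(-d·t)·cos(2πf·t + φ), and it assumes the ε-lag-correlation can be read as the largest Pearson correlation over all lags whose absolute value is at most ε. The names `component`, `pearson` and `lag_correlation` are hypothetical.

```python
import math


def component(t, amplitude, phase, damping, frequency):
    """One spectral component at time t (assumed damped-cosine form)."""
    return amplitude * math.exp(-damping * t) * math.cos(
        2 * math.pi * frequency * t + phase
    )


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def lag_correlation(x, y, max_lag):
    """Return (lag, r) maximizing the correlation of x[t] with y[t + lag], |lag| <= max_lag."""
    n = min(len(x), len(y))
    best_lag, best_corr = 0, -2.0  # -2.0 is below any valid correlation
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[: n - lag], y[lag:n]
        else:
            a, b = x[-lag:n], y[: n + lag]
        if len(a) < 2:
            continue  # overlap too short to correlate
        r = pearson(a, b)
        if r > best_corr:
            best_lag, best_corr = lag, r
    return best_lag, best_corr


# Two synthetic streams: y is x delayed by 5 samples (assumed toy data).
x = [component(t, 1.0, 0.0, 0.01, 0.03) for t in range(200)]
y = [component(t - 5, 1.0, 0.0, 0.01, 0.03) for t in range(200)]
best_lag, best_corr = lag_correlation(x, y, max_lag=10)
```

On this toy data the search should recover a lag of 5 with a correlation close to 1, which is the kind of delayed similarity the abstract says the clustering is meant to detect. A clustering step could then use such values as pairwise similarities, although how the paper combines them is not stated in the abstract.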