南京大学学报(自然科学版)2017,Vol.53Issue(3):483-496,14.DOI:10.13232/j.cnki.jnju.2017.03.013
基于近期最远遍历的支撑点选择
Pivot selection on recent farthest traversal
摘要
Abstract
Metric-space data management and analysis method abstracts data to the points of metric space has a high generality,and is one of effective approaches for the variety challenge of big data.Since coordinates do not exist in metric space,many mathematical tools can't be used directly and it is common to consider the distances from a data object to pre-selected reference points(or called pivots) as its coordinates.The quality of pivots is critical to the performance of data management and analysis in metric space.Farthest First Traversal(FFT) algorithm is able to select corners of the data set with linear time and space complexity,and is one of the most widely used pivot selection methods.However,experiments show that the best pivots are usually not the most corners,so it is very hard to select best pivots by FFT.This work proposes the Recent Farthest Traversal(RFT) algorithm.It selects next pivot only based on several recent pivots and is able to select better pivots faster.At the same time,experiments also show that FFT can sample points evenly in the data set.This work also proposes the Pivot Set Selection(PSS) algorithm,which is able to get all pivots just with one selection.With exploiting RFT to select a candidate set and exploiting FFT to sample an evaluation set,PSS can select a better set of pivots to create a similarity index.As a result,the construction cost of similarity index is greatly reduced and the index performance has some improvements.Experiments show that RFT selects good pivots far more rapidly than FFT and the accuracy of RFT is higher than FFT,while FFT generates a good sample of data.关键词
度量空间/多样性/支撑点选择/大数据Key words
metric space/variety/pivot selection/big data分类
信息技术与安全科学引用本文复制引用
李兴亮,毛睿..基于近期最远遍历的支撑点选择[J].南京大学学报(自然科学版),2017,53(3):483-496,14.基金项目
国家863项目(2015AA015305),国家自然科学基金(U1301252,61471243),广东省重点实验室项目(2012A061400024),深圳市基础研究项目(JCYJ20140418095735561,JCYJ20150731160834611,JCYJ20150625101524056),深港创新圈项目(SGLH20131010163759789),广东省教育厅项目(2015KQNCX143) (2015AA015305)