Computer & Digital Engineering (计算机与数字工程), Issue (10): 1717-1722, 1728, 7. DOI: 10.3969/j.issn.1672-9722.2015.10.001
大数据环境下一种基于可变网格的高维数据索引
A High-dimension Index Based on Variable Grid in Big Data Environment
Abstract
With the rapid development of Internet and cloud computing technologies, the volume of data across all sectors of the national economy is growing sharply, especially high-dimensional big data such as online transaction data, user review data, and multimedia data. An index structure suited to high-dimensional big data can improve the performance of similarity queries over such data. This paper first proposes a distributed two-level index structure: the global index maintains information about every subspace of the whole data space, and the local index builds an M-tree on each subspace to organize the local high-dimensional data. Second, a similarity search algorithm based on the two-level index is proposed, covering point queries and range queries. When processing a user query, the global index quickly determines which subspaces are relevant and forwards the query to them, and the relevant local nodes then process the query concurrently. This approach avoids many unnecessary retrievals on query-irrelevant subspaces. Finally, extensive experiments show that the proposed index performs much better than existing high-dimensional index structures and offers good query performance and scalability.
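The two-level lookup summarized in the abstract can be illustrated with a minimal single-process sketch. It rests on several assumptions that are not taken from the paper: the class name `TwoLevelIndex` and its methods are hypothetical, the global level is a grid defined by per-dimension cut points (unequal spacing gives variable-width cells), each subspace stores its points in a plain list scanned linearly instead of an M-tree, and the relevant subspaces are visited in a loop rather than on distributed nodes.

```python
import bisect
import math
from collections import defaultdict


class TwoLevelIndex:
    """Toy two-level index: a grid of subspaces (global) plus per-cell point lists (local)."""

    def __init__(self, cuts):
        # cuts[d] is a sorted list of split positions along dimension d
        # (an assumption of this sketch, not the paper's grid construction).
        self.cuts = cuts
        # cell id (tuple of per-dimension slots) -> points; stands in for an M-tree.
        self.cells = defaultdict(list)

    def _cell_of(self, point):
        # Global step: map a point to the grid cell that contains it.
        return tuple(bisect.bisect_right(c, x) for c, x in zip(self.cuts, point))

    def insert(self, point):
        self.cells[self._cell_of(point)].append(tuple(point))

    def _min_dist(self, cell, q):
        # Euclidean distance from q to the nearest point of the cell's box.
        total = 0.0
        for d, (idx, x) in enumerate(zip(cell, q)):
            lo = self.cuts[d][idx - 1] if idx > 0 else -math.inf
            hi = self.cuts[d][idx] if idx < len(self.cuts[d]) else math.inf
            gap = max(lo - x, 0.0, x - hi)
            total += gap * gap
        return math.sqrt(total)

    def relevant_cells(self, q, r):
        # Global step: keep only non-empty cells that the query ball can reach.
        return [c for c in self.cells if self._min_dist(c, q) <= r]

    def range_query(self, q, r):
        # Local step: scan each relevant cell (a real system would query an M-tree).
        hits = []
        for cell in self.relevant_cells(q, r):
            hits.extend(p for p in self.cells[cell] if math.dist(p, q) <= r)
        return sorted(hits)

    def point_query(self, q):
        # A point query touches exactly one cell.
        return tuple(q) in self.cells.get(self._cell_of(q), [])


index = TwoLevelIndex(cuts=[[1.0, 4.0], [2.0]])
for p in [(0.5, 0.5), (1.5, 1.0), (5.0, 3.0), (3.0, 2.5)]:
    index.insert(p)
nearby = index.range_query((1.2, 1.0), 0.9)
```

In this toy setup the pruning step is what saves work: cells whose bounding box lies farther than the radius from the query point are never scanned, which mirrors the abstract's claim about skipping query-irrelevant subspaces.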
Keywords
high-dimensional data / big data / variable grid / M-tree
Classification
Information Technology and Security Science
Cite this article
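Song Baoyan (宋宝燕), Liu Yu (刘宇), Ding Linlin (丁琳琳). A High-dimension Index Based on Variable Grid in Big Data Environment [J]. Computer & Digital Engineering, 2015, (10): 1717-1722, 1728, 7.
Funding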
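Supported by the National Natural Science Foundation of China (Nos. 61472169, 61472069, 61502215); the Outstanding Talents Program of the Education Department of Liaoning Province (No. LR201017); the General Scientific Research Project of the Education Department of Liaoning Province (No. L2015193); the Science and Technology Plan Project of Liaoning Province (No. 2012216007); and the Young Researchers Fund of Liaoning University (No. LDQN201438).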