A uniformized spatial learned index based on improved K-means clustering partitioning (OA; Peking University Core Journal; CSTPCD)
Develop spatial learning indexing using improved K-means clustering partition
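The size of a traditional spatial index grows with the data volume, and its query efficiency is low. A learned index does not grow with the data volume and avoids hierarchical comparison queries, so it performs well. Applying learned indexes to spatial indexing poses two difficulties: first, choosing a suitable dimension reduction method to order the spatial data; second, effectively simplifying the distribution of the dimension-reduced data sequence so that it is easy to fit. To address these, a grid mixed cluster partition learned index (grid-ml) is proposed. It uses the z curve for dimension reduction, a double-layer grid structure to optimize the query strategy, and an improved K-means clustering algorithm to partition the data so that the data distribution becomes uniform. Comparative experiments show that grid-ml is fast to build, needs little storage, and queries efficiently, with significant advantages over traditional spatial indexes.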
As data sizes grow rapidly, the defects of traditional spatial indexes become increasingly apparent. A learned index, by contrast, is built on the data distribution: its size does not expand as the amount of data increases, and it achieves better performance without hierarchical comparison. Nevertheless, two difficulties remain in applying the learned index idea to spatial data: (1) how to choose an appropriate dimension reduction method to sort the spatial data; (2) how to simplify the distribution of the dimension-reduced data so that it is easy to fit. This paper proposes a new grid mixed cluster partition learned index (grid-ml). To address the two difficulties above, grid-ml uses the z curve to reduce the dimension and handles the jumping problem with a double-layer grid structure. An improved K-means clustering method is then used to simplify the data distribution. The results show that grid-ml builds quickly, occupies little storage, and answers queries quickly, demonstrating significant advantages over traditional spatial indexes.
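The pipeline described in the abstract (z-curve dimension reduction, clustering partition, one fitted model per partition) can be illustrated with a minimal sketch. This is not the authors' grid-ml implementation: it uses plain Lloyd's K-means on the 1-D keys rather than the improved K-means of the paper, omits the double-layer grid, and fits one least-squares line per partition. All names (`interleave_bits`, `kmeans_1d`, `fit_linear`, `LearnedZIndex`) are hypothetical, and the sketch assumes a non-empty set of points with non-negative integer grid coordinates.

```python
import bisect


def interleave_bits(x, y, bits=16):
    """Z-order (Morton) code: x bits go to even positions, y bits to odd ones."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def kmeans_1d(sorted_keys, k, iters=20):
    """Plain Lloyd's K-means on sorted 1-D keys (assumption: not the paper's
    improved variant). Returns cut positions; partition j is keys[cuts[j]:cuts[j+1]]."""
    n = len(sorted_keys)
    # Initialise centroids at evenly spaced quantiles of the key sequence.
    centroids = [sorted_keys[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    cuts = [0, n]
    for _ in range(iters):
        # In 1-D, cluster boundaries are the midpoints between adjacent centroids.
        mids = [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]
        cuts = [0] + [bisect.bisect_right(sorted_keys, m) for m in mids] + [n]
        new = []
        for j in range(k):
            seg = sorted_keys[cuts[j]:cuts[j + 1]]
            new.append(sum(seg) / len(seg) if seg else centroids[j])
        if new == centroids:
            break
        centroids = new
    return cuts


def fit_linear(keys, offset):
    """Least-squares fit position ~ a * key + b; returns (a, b, max_error_bound)."""
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = offset + (n - 1) / 2
    var = sum((key - mean_k) ** 2 for key in keys)
    cov = sum((key - mean_k) * (offset + i - mean_p) for i, key in enumerate(keys))
    a = cov / var if var else 0.0
    b = mean_p - a * mean_k
    err = max(abs(a * key + b - (offset + i)) for i, key in enumerate(keys))
    return a, b, int(err) + 1


class LearnedZIndex:
    """Hypothetical sketch: sorted Z-order keys, one linear model per partition."""

    def __init__(self, points, k=4, bits=16):
        self.bits = bits
        self.keys = sorted(interleave_bits(x, y, bits) for x, y in points)
        cuts = kmeans_1d(self.keys, k)
        self.parts = []  # (first_key, start, end, a, b, err), empty partitions skipped
        for s, e in zip(cuts, cuts[1:]):
            if e > s:
                self.parts.append((self.keys[s], s, e) + fit_linear(self.keys[s:e], s))
        self.firsts = [p[0] for p in self.parts]

    def find(self, x, y):
        """Return the first position of the cell's key in the sorted keys, or -1."""
        key = interleave_bits(x, y, self.bits)
        j = max(bisect.bisect_right(self.firsts, key) - 1, 0)
        _, s, e, a, b, err = self.parts[j]
        pred = a * key + b
        # Search only inside the model's error window, clamped to the partition.
        lo = min(max(s, int(pred - err)), e)
        hi = max(min(e, int(pred + err) + 1), lo)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else -1
```

For example, `LearnedZIndex([(1, 2), (3, 5), (7, 7)], k=2).find(3, 5)` returns the position of that cell's key in the sorted key array. Partitioning before fitting is the point of the design: each partition covers a narrower, more regular stretch of the key distribution, so a simple model fits it with a smaller error window than one model over all keys would.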
FU Chenhua (傅晨华); ZHANG Feng (张丰); HU Linshu (胡林舒); WANG Lijun (王立君)
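School of Earth Sciences, Zhejiang University, Hangzhou 310058, Zhejiang || Zhejiang Provincial Key Laboratory of Resources and Environmental Information System, Zhejiang University, Hangzhou 310058, Zhejiang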
Surveying, Mapping and Instruments
learned index; K-means clustering; space filling curve; spatial index
Journal of Zhejiang University (Science Edition), 2024, Issue 2
pp. 153-161, 195 (10 pages)
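Supported by the National Natural Science Foundation of China (42271466) and the High-Resolution Integrated Transportation Remote Sensing Application Demonstration System (Phase II) (07-Y30B30-9001-19/21).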