Microcomputer Applications, 2018, Vol. 34, Issue 2: 68-71, back insert 1, 5.
The Query Optimization and Application of Bloom Filter for Large Dataset
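Rao Wen 1, Chen Xu 2
Author information
- 1. Nanjing FiberHome Software Technology Co., Ltd., Nanjing 210000
- 2. Nanjing FiberHome Starrysky Communication Development Co., Ltd., Nanjing 210000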
Abstract
The theory and application scenarios of the Bloom filter are illustrated through an analysis of customer behavior data, a project in which a Bloom filter makes it possible to search a large dataset quickly and efficiently. The paper first uses an in-memory database, such as MongoDB, to solve the problem: storing the premium accounts under the default index (_id) alone gives a lookup time complexity of O(1). The disadvantages are that the functionality actually needed is limited and that the pressure from concurrent (one-to-many) queries grows as the volume of data increases. Next, the accounts are read into memory through an appropriate data structure using a distributed cache. Data access then becomes one-to-one, at the cost of higher memory usage. For a small amount of data, the performance of HashSet is acceptable because of its convenience and speed, but as the volume of data increases, heap memory may overflow. A custom data structure is therefore adopted for the Bloom filter. The basic theory and the false positive rate are analyzed, and the erroneous data (false positives) produced by the Bloom filter can be eliminated. Theoretical analysis and experiments show that the low space usage and high search efficiency of the Bloom filter make it well suited to this problem.
Keywords
MapReduce/Bloom filter/Dataset/MongoDB/Hash table
Classification
Information Technology and Security Science
Cite this article
Rao Wen, Chen Xu. The Query Optimization and Application of Bloom Filter for Large Dataset [J]. Microcomputer Applications, 2018, 34(2): 68-71, back insert 1, 5.
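
Illustrative sketch
The abstract describes replacing a HashSet of premium accounts with a custom Bloom filter structure but does not show that structure. The Java sketch below is one minimal way such a filter might look, not the authors' implementation. The class name SimpleBloomFilter, the FNV-1a hash with a MurmurHash3-style finalizer, the double-hashing scheme, and the sizing formulas m = -n ln p / (ln 2)^2 and k = (m/n) ln 2 are all assumptions taken from standard Bloom filter practice rather than from the paper.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Minimal Bloom filter sketch; names and sizing are illustrative assumptions. */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;   // m: size of the bit array
    private final int numHashes; // k: number of probe positions per key

    SimpleBloomFilter(int expectedItems, double targetFalsePositiveRate) {
        if (expectedItems <= 0 || targetFalsePositiveRate <= 0.0
                || targetFalsePositiveRate >= 1.0) {
            throw new IllegalArgumentException("invalid sizing parameters");
        }
        double ln2 = Math.log(2);
        // Standard sizing: m = -n ln p / (ln 2)^2, k = (m / n) ln 2.
        this.numBits = Math.max(1, (int) Math.ceil(
                -expectedItems * Math.log(targetFalsePositiveRate) / (ln2 * ln2)));
        this.numHashes = Math.max(1,
                (int) Math.round((double) numBits / expectedItems * ln2));
        this.bits = new BitSet(numBits);
    }

    /** Sets the k bit positions derived from the key. */
    void add(String key) {
        long h = hash64(key);
        int h1 = (int) h;
        int h2 = ((int) (h >>> 32)) | 1; // forced odd so the probe step is never zero
        for (int i = 0; i < numHashes; i++) {
            bits.set(Math.floorMod(h1 + i * h2, numBits));
        }
    }

    /** Returns false only if the key was never added; true may be a false positive. */
    boolean mightContain(String key) {
        long h = hash64(key);
        int h1 = (int) h;
        int h2 = ((int) (h >>> 32)) | 1;
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(Math.floorMod(h1 + i * h2, numBits))) {
                return false;
            }
        }
        return true;
    }

    /** Theoretical false positive rate (1 - e^(-kn/m))^k after n insertions. */
    double expectedFalsePositiveRate(int insertedItems) {
        double exponent = -(double) numHashes * insertedItems / numBits;
        return Math.pow(1.0 - Math.exp(exponent), numHashes);
    }

    // 64-bit FNV-1a over the UTF-8 bytes, followed by a MurmurHash3-style mix step.
    private static long hash64(String key) {
        long h = 0xcbf29ce484222325L;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        return h;
    }
}
```

With this sizing, a filter built for 1,000 accounts at a 1% target rate uses roughly 9,600 bits, far less than a HashSet holding the same strings, which is the space advantage the abstract points to. The filter never reports an added account as absent. One common way to remove the remaining false positives, which may or may not be the approach taken in the paper, is to confirm each hit against the authoritative account store.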