Journal of Nanjing University of Science and Technology (Natural Science Edition), Issue (3): 260-265, 6. DOI: 10.14177/j.cnki.32-1397n.2015.39.03.002
Performance optimization method of Lucene in big data
Ma Yang (马旸)¹, Cai Bing (蔡冰)¹
Author information
- 1. Jiangsu Branch, National Computer Network Emergency Response Technical Team/Coordination Center of China, Nanjing, Jiangsu 210003
Abstract
To improve data query efficiency in big data, this paper proposes RAM FS directory (RFDirectory), an optimized inverted index method based on in-memory computing and batch processing. Built on Lucene, it implements posting-list management that combines random access memory (RAM) and disk: new data are first written into a cache and then periodically written into the disk index, which improves the write performance of the inverted index. Query results are returned efficiently to users by merging the multi-block inverted structures held on disk and in RAM. Experimental results show that the index construction time of RFDirectory is 50% of that of FSDirectory or RAMDirectory, and that in big data scenarios the time needed to return the index results for a single keyword is reduced by 15%.
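The abstract describes the write and read paths only at a high level. The following minimal Java sketch illustrates the general RAM-plus-disk idea under stated assumptions; it is not the authors' RFDirectory and does not use Lucene's API. The class `HybridInvertedIndex`, the `flushThreshold` batch size, the whitespace tokenizer, and the plain-text segment file format are all hypothetical. New documents are indexed into a RAM buffer; the buffer is flushed to an immutable segment file on disk once `flushThreshold` documents accumulate (a document-count trigger stands in for the paper's "periodic" write); and a query merges the posting lists found in every disk segment with those still in RAM.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;

/** Hypothetical RAM + disk inverted index sketch; not the paper's RFDirectory implementation. */
class HybridInvertedIndex {
    private final Path dir;            // directory that receives flushed segment files
    private final int flushThreshold;  // hypothetical batch size: documents buffered per flush
    private final Map<String, TreeSet<Integer>> ramBuffer = new TreeMap<>(); // in-memory postings
    private final List<Path> segments = new ArrayList<>();                   // immutable disk segments
    private int bufferedDocs = 0;

    HybridInvertedIndex(Path dir, int flushThreshold) {
        this.dir = dir;
        this.flushThreshold = flushThreshold;
    }

    /** Splits text on whitespace and records docId under each lower-cased term in RAM. */
    void addDocument(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!term.isEmpty()) {
                ramBuffer.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        if (++bufferedDocs >= flushThreshold) {
            flush(); // batch write: the whole buffer goes to disk at once
        }
    }

    /** Writes the RAM buffer as a new segment file ("term<TAB>id,id,...") and clears it. */
    void flush() {
        bufferedDocs = 0;
        if (ramBuffer.isEmpty()) {
            return;
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, TreeSet<Integer>> e : ramBuffer.entrySet()) {
            String ids = e.getValue().stream().map(String::valueOf).collect(Collectors.joining(","));
            lines.add(e.getKey() + "\t" + ids);
        }
        Path segment = dir.resolve("segment_" + segments.size() + ".txt");
        try {
            Files.write(segment, lines, StandardCharsets.UTF_8);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        segments.add(segment);
        ramBuffer.clear();
    }

    /** Returns sorted doc IDs for a term by merging every disk segment with the RAM buffer. */
    List<Integer> search(String term) {
        String key = term.toLowerCase(Locale.ROOT);
        TreeSet<Integer> result = new TreeSet<>();
        for (Path segment : segments) {
            try {
                for (String line : Files.readAllLines(segment, StandardCharsets.UTF_8)) {
                    int tab = line.indexOf('\t');
                    if (line.substring(0, tab).equals(key)) {
                        for (String id : line.substring(tab + 1).split(",")) {
                            result.add(Integer.parseInt(id));
                        }
                        break; // each term appears at most once per segment
                    }
                }
            } catch (IOException ex) {
                throw new UncheckedIOException(ex);
            }
        }
        result.addAll(ramBuffer.getOrDefault(key, new TreeSet<>())); // postings not yet flushed
        return new ArrayList<>(result);
    }

    int segmentCount() {
        return segments.size();
    }
}
```

With a threshold of 2, adding three documents leaves one segment on disk and one document in RAM, yet `search("index")` still returns all three IDs because both sources are merged at query time. A Lucene-based implementation would rely on Lucene's own segment and `Directory` machinery (for example, `FSDirectory` on the disk side); the text-file layout here only stands in for that.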
Keywords
big data/Lucene/memory computing/batch processing/inverted index/posting list/cache/RAM index/disk index/multiple block inverted structure
Classification
Information Technology and Security Science
Cite this article
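Ma Yang, Cai Bing. Performance optimization method of Lucene in big data [J]. Journal of Nanjing University of Science and Technology (Natural Science Edition), 2015, (3): 260-265, 6.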