Computer Technology and Development, Issue (12): 109-113, 5. DOI: 10.3969/j.issn.1673-629X.2014.12
Study on Large-scale Unstructured Data Indexing Technology
Abstract
The ASPSeek search engine retrieves inefficiently, occupies a large amount of disk space, and is hard to update when the data set is large. To address these problems, this paper proposes an inverted index organization technique based on block storage and compares the performance of an external-memory B+tree index with that of a linear hash index. The test results show that, per million records, the linear hash index takes 57.40% of the B+tree index's time for queries, 2.44 times the B+tree index's time for insertions, and 83.52% of the B+tree index's time for deletions, and that the linear hash index file is 109.56% of the size of the B+tree index file. According to these results, the B+tree index builds and updates faster, while the linear hash index occupies more disk space but offers better query performance.
Keywords
large-scale data/inverted index/block storage/linear hash/B+tree
Classification
Information Technology and Security Science
Cite this article
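Shi Yanan (时亚南), Zhang Taihong (张太红), Chen Yanhong (陈燕红), Guo Bin (郭斌). Study on Large-scale Unstructured Data Indexing Technology [J]. Computer Technology and Development, 2014, (12): 109-113, 5.
Funding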
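Scientific Research Program of Higher Education Institutions of Xinjiang Autonomous Region (XJEDU2013S13); Science and Technology Key Project of Xinjiang Uygur Autonomous Region (200931103); Xinjiang Agricultural University Pre-research Funding Project ()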
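
Illustration: the abstract compares a linear hash index with a B+tree index but does not show either structure. The following is a minimal in-memory sketch of linear hashing, offered only to make the idea concrete. It is not the authors' implementation: the paper's index is disk-based, and the class name `LinearHash`, its parameters, and the load-factor threshold are all assumptions chosen for this example.

```python
class LinearHash:
    """Illustrative in-memory linear hashing table (hypothetical names).

    Buckets are plain Python lists, so overflow is implicit; a disk-based
    index would use fixed-size pages with overflow chains instead.
    """

    def __init__(self, initial_buckets=4, bucket_capacity=4, max_load=0.75):
        self.n0 = initial_buckets        # bucket count at level 0
        self.level = 0                   # current doubling round
        self.split = 0                   # index of the next bucket to split
        self.capacity = bucket_capacity  # nominal capacity, used for load factor only
        self.max_load = max_load
        self.buckets = [[] for _ in range(initial_buckets)]
        self.count = 0

    def _address(self, key):
        h = hash(key)
        idx = h % (self.n0 * 2 ** self.level)
        if idx < self.split:
            # This bucket was already split in the current round,
            # so use the next level's hash function.
            idx = h % (self.n0 * 2 ** (self.level + 1))
        return idx

    def get(self, key, default=None):
        for k, v in self.buckets[self._address(key)]:
            if k == key:
                return v
        return default

    def put(self, key, value):
        bucket = self.buckets[self._address(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self.count += 1
        # Split one bucket whenever the average load exceeds the threshold.
        if self.count > self.max_load * self.capacity * len(self.buckets):
            self._split_one()

    def delete(self, key):
        bucket = self.buckets[self._address(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self.count -= 1
                return True
        return False

    def _split_one(self):
        old = self.buckets[self.split]
        self.buckets[self.split] = []
        self.buckets.append([])           # image bucket of the one being split
        self.split += 1
        if self.split == self.n0 * 2 ** self.level:
            self.level += 1               # round finished: table size has doubled
            self.split = 0
        for k, v in old:                  # redistribute between the two buckets
            self.buckets[self._address(k)].append((k, v))


# Usage: map hypothetical index terms to posting lists of document IDs.
table = LinearHash()
for i in range(1000):
    table.put(f"term{i}", [i])
```

Because the table grows one bucket at a time, each insertion may trigger a split and a redistribution of entries, while a lookup touches a single bucket. This is consistent in spirit with the abstract's finding that the linear hash index queries faster but inserts more slowly than the B+tree index, although the sketch itself measures nothing.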