现代情报 (Journal of Modern Information), 2024, Vol. 44, Issue 2: 107-114, 129, 9. DOI: 10.3969/j.issn.1008-0821.2024.02.009
面向科技文献多维语义组织的混合倒排索引构建方法
Hybrid Inverted Index Construction Method for Multidimensional Semantic Organization of Scientific and Technical Literature
Abstract
[Purpose/Significance] Researchers urgently need efficient queries over fine-grained semantic information in scientific and technological literature. Previous studies proposed a multidimensional semantic indexing system for such literature, but the common HashMap-based inverted indexes lead to inefficient querying. This paper aims to improve semantic query performance by building hybrid inverted indexes for semantic features of different dimensions. [Method/Process] The paper explores inverted index construction methods suited to different semantic dimensions using Treap, B+ tree, and other data structures, and combines them into several hybrid inverted index construction methods suited to the multidimensional semantic organization of scientific and technological literature. Comparative experiments then analyze and verify the query performance of these methods under Top-k and Boolean queries. [Result/Conclusion] The experimental results show that, among the eight hybrid inverted index construction methods formed by combination, C3 (HHHB), shown in Table 2, is the most efficient under Top-k queries, while C4 (TTTB) is the most efficient under Boolean queries. The proposed method can effectively address the query efficiency problem caused by relying on a single index structure.
Keywords
scientific and technical literature / semantic organization / hybrid inverted index / HashMap / Treap / B+ tree
Classification
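Social sciences
Cite this article
张敏, 李唯, 范青. Hybrid Inverted Index Construction Method for Multidimensional Semantic Organization of Scientific and Technical Literature [J]. 现代情报 (Journal of Modern Information), 2024, 44(2): 107-114, 129, 9.
Funding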
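National Social Science Fund of China, Art Studies Project "Research on the Internal Mechanism and Advancement Path of Intelligent Communication of Intangible Cultural Heritage" (Grant No. 22CH188)
Open Fund of the Hubei Key Laboratory of Big Data in Science and Technology, project "Construction of an Open Platform for Big Data Resources in the Field of Science Culture Communication" (Grant No. E3KF291001)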