Hybrid Inverted Index Construction Method for Multidimensional Semantic Organization of Scientific and Technical Literature

OA, CHSSCD, CSTPCD

[Purpose/Significance] Researchers urgently need to query fine-grained semantic information inside scientific and technical literature efficiently. Previous studies proposed a multidimensional semantic indexing system for such literature, but the common HashMap-based inverted index leads to low query efficiency. This paper aims to improve semantic query performance by building hybrid inverted indexes tailored to the semantic features of different dimensions. [Method/Process] The paper explores inverted index construction methods suited to different semantic dimensions using Treap, B+ tree, and other data structures, combines them into several hybrid inverted index construction methods for the multidimensional semantic organization of scientific and technical literature, and compares the query performance of the different construction methods under Top-k (ranked) queries and Boolean queries in comparative experiments. [Results/Conclusion] Among the eight hybrid construction methods formed by combination, C3 (HHHB), shown in Table 2, is the most efficient under Top-k queries, while C4 (TTTB) is the most efficient under Boolean queries. The proposed method effectively addresses the query-efficiency problem caused by relying on a single index structure.
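The idea of giving each semantic dimension its own index structure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dimension names (`method`, `topic`, `year`) and all class names are hypothetical, and a sorted key list maintained with `bisect` stands in for an ordered structure such as a Treap or B+ tree.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class HashPostings:
    """Unordered key -> set of doc ids (the HashMap-style baseline)."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, key, doc_id):
        self._postings[key].add(doc_id)

    def lookup(self, key):
        return set(self._postings.get(key, ()))


class SortedPostings:
    """Ordered keys; a sorted list is a stand-in for a Treap or B+ tree."""

    def __init__(self):
        self._keys = []      # distinct keys, kept sorted
        self._postings = {}  # key -> set of doc ids

    def add(self, key, doc_id):
        if key not in self._postings:
            insort(self._keys, key)
            self._postings[key] = set()
        self._postings[key].add(doc_id)

    def lookup(self, key):
        return set(self._postings.get(key, ()))

    def range(self, low, high):
        """Doc ids whose key k satisfies low <= k <= high."""
        result = set()
        for key in self._keys[bisect_left(self._keys, low):]:
            if key > high:
                break
            result |= self._postings[key]
        return result


class HybridIndex:
    """Routes each semantic dimension to its own index structure."""

    def __init__(self, backends):
        self._backends = backends  # dimension name -> index object

    def add(self, doc_id, dimension, key):
        self._backends[dimension].add(key, doc_id)

    def boolean_and(self, conditions):
        """Intersect the postings of several (dimension, key) pairs."""
        sets = [self._backends[d].lookup(k) for d, k in conditions]
        return set.intersection(*sets) if sets else set()

    def range(self, dimension, low, high):
        # Only works for dimensions backed by an ordered structure.
        return self._backends[dimension].range(low, high)


# Hypothetical dimensions and toy documents, for illustration only.
index = HybridIndex({
    "method": HashPostings(),
    "topic": HashPostings(),
    "year": SortedPostings(),
})
docs = {
    1: {"method": "treap", "topic": "indexing", "year": 2021},
    2: {"method": "b+tree", "topic": "indexing", "year": 2023},
    3: {"method": "treap", "topic": "retrieval", "year": 2024},
}
for doc_id, fields in docs.items():
    for dimension, key in fields.items():
        index.add(doc_id, dimension, key)

both = index.boolean_and([("method", "treap"), ("topic", "indexing")])
recent = index.range("year", 2022, 2024)
```

In this sketch, exact-match dimensions use a hash-based structure, while the dimension that needs ordered access uses a sorted one. Which structure suits which dimension in practice is exactly what the paper's comparative experiments evaluate.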

Zhang Min (张敏); Li Wei (李唯); Fan Qing (范青)

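Wuhan Library, Chinese Academy of Sciences, Wuhan, Hubei 430071 || Hubei Key Laboratory of Big Data in Science and Technology, Wuhan, Hubei 430071; Wuhan Vocational College of Software and Engineering (Wuhan Open University), Wuhan, Hubei 430205; National Research Center of Cultural Industries, Central China Normal University, Wuhan, Hubei 430079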

scientific and technical literature; semantic organization; hybrid inverted index; HashMap; Treap; B+ tree

Journal of Modern Information (《现代情报》), 2024 (002)

pp. 107-114, 129 (9 pages)

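National Social Science Fund of China, Art Studies Project "Research on the Internal Mechanism and Advancement Path of the Intelligent Communication of Intangible Cultural Heritage" (Grant No. 22CH188); Open Fund Project of the Hubei Key Laboratory of Big Data in Science and Technology, "Construction of an Open Big Data Resource Platform for the Field of Science Culture Communication" (Grant No. E3KF291001).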

10.3969/j.issn.1008-0821.2024.02.009
