现代情报 (Journal of Modern Information), 2025, Vol. 45, Issue 7: 14-25, 63, 13. DOI: 10.3969/j.issn.1008-0821.2025.07.002
融合知识图谱与大语言模型的科技文献复杂知识对象抽取研究
Research on Scientific and Technological Literature Complex Knowledge Object Extraction Fusing Knowledge Graph and Large Language Model
Abstract
[Purpose/Significance] Complex knowledge objects in scientific and technological literature provide a fine-grained, comprehensive representation of the deep knowledge content of that literature. They can effectively support data-driven scientific discovery and knowledge discovery, and are an important element of technological innovation. [Method/Process] First, a domain knowledge graph was constructed through lightweight ontology construction, BRAT knowledge annotation, and Neo4j knowledge storage. Next, the large language model ChatGLM2-6B was deployed locally and fine-tuned with LoRA. Finally, based on the MOT mechanism, knowledge from the knowledge graph was injected into the prompts, and complex knowledge objects were extracted from scientific literature through multiple rounds of Q&A with the large language model. [Result/Conclusion] The method was validated on organic solar cells (OSCs). The results show that the extraction method integrating the knowledge graph and the large language model outperforms extraction with the large language model alone, with improvements of 14.1%, 10.3%, and 12.3% in precision (P), recall (R), and F1 score, respectively. The knowledge graph can enhance the ability of large language models to extract complex knowledge objects from scientific literature and improve the efficiency and accuracy of scientific literature mining in the OSC field.
Keywords
knowledge graph / large language model / scientific and technological literature / solar cells / knowledge extraction / prompt building
Classification
Social Sciences
Cite this article
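Chen Wenjie (陈文杰), Hu Zhengyin (胡正银), Shi Qi (石栖), Lu Ying (卢颖). Research on Scientific and Technological Literature Complex Knowledge Object Extraction Fusing Knowledge Graph and Large Language Model [J]. 现代情报 (Journal of Modern Information), 2025, 45(7): 14-25, 63, 13.
Funding
National Social Science Fund of China, Youth Project, "Research on Key Core Technology Identification and Technology Opportunity Discovery Based on Hypernetworks" (Grant No. 24CTQ044)
National Key R&D Program of China, "Development of a Suite of Tools for Automatically Generating Causal Graphs from Biomedical and Epidemiological Research Data" (Grant No. 2022YFF0712000)
Chinese Academy of Sciences Literature and Information Capacity Building Special Project, "Research on Automatic Extraction and Modeling of Experimental Methods" (Grant No. E2C0003008)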
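
Illustrative sketch
The prompt-injection step described in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the Triple class, the select_triples, build_prompt, and extract functions, the substring-matching retrieval rule, and the prompt layout are all hypothetical, and the model call is replaced by a stub so the example runs without ChatGLM2-6B or Neo4j. The toy triples are illustrative and are not taken from the paper's knowledge graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One (head, relation, tail) fact, as it might be read from a knowledge graph."""
    head: str
    relation: str
    tail: str


def select_triples(triples, sentence, limit=5):
    """Keep triples whose head or tail entity is mentioned in the sentence.

    Assumption: plain case-insensitive substring matching stands in for
    whatever retrieval the real system uses.
    """
    text = sentence.lower()
    hits = [t for t in triples
            if t.head.lower() in text or t.tail.lower() in text]
    return hits[:limit]


def build_prompt(sentence, triples, question):
    """Inject the selected triples into the prompt as background knowledge."""
    lines = ["Background knowledge:"]
    lines += [f"- ({t.head}, {t.relation}, {t.tail})" for t in triples]
    lines += ["", f"Text: {sentence}", f"Question: {question}"]
    return "\n".join(lines)


def extract(sentence, triples, questions, ask):
    """Ask one question per round; `ask` is any callable mapping prompt -> answer."""
    relevant = select_triples(triples, sentence)
    return {q: ask(build_prompt(sentence, relevant, q)) for q in questions}


# Toy data for illustration only (hypothetical, not from the paper).
kg = [
    Triple("PM6", "is_a", "donor material"),
    Triple("Y6", "is_a", "acceptor material"),
    Triple("PTB7", "is_a", "donor material"),
]
sentence = "The active layer is a PM6:Y6 blend."
questions = [
    "Which donor material is used?",
    "Which acceptor material is used?",
]

# Stub model: returns a fixed string instead of calling a real LLM.
answers = extract(sentence, kg, questions, ask=lambda prompt: "stub answer")
print(build_prompt(sentence, select_triples(kg, sentence), questions[0]))
```

In a real setup, `ask` would wrap the fine-tuned model's chat call and the triples would come from a graph query. Both are left out here because the abstract does not specify them.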