Journal of Modern Information (现代情报), 2026, Vol. 46, Issue 4: 57-67, 11. DOI: 10.3969/j.issn.1008-0821.2026.04.005
Research on the Construction of Beiyang Government Document Resources Knowledge Graph Driven by Large Language Models (original title: 大语言模型驱动的北洋政府文书资源知识图谱构建研究)
Abstract
[Purpose/Significance] This paper uses a knowledge graph built with a large language model to support the intelligent development and application of Beiyang Government Document Resources, transforming fragmented and isolated historical documents into a deep semantic network, with the goal of advancing intelligent historical research and the public dissemination of history.

[Method/Process] This study designs a framework, driven by large language models, for constructing a knowledge graph of Beiyang Government Document Resources. The framework relies on the KGGen knowledge graph generation model and integrates the whole process of knowledge representation modeling, entity and relation extraction, and knowledge graph generation. First, a data collection and preprocessing workflow covering structured, semi-structured, and unstructured texts was designed. In line with the requirements of the large language model task, corpus cleaning, word segmentation analysis, and data annotation were completed, producing a standardized domain corpus of Beiyang Government Document Resources. Next, a knowledge representation model of Beiyang Government Document Resources was designed for the large language model extraction task. It defines entity category labels including institutions, individuals, positions, decrees, documents, locations, and events, together with sixty relation labels covering relations such as appointment, nomination, succession, removal, resignation, and leadership. Ablation experiments were conducted with accuracy, recall, and F1 as evaluation metrics. The results showed that the proposed framework performed best in the knowledge extraction task on Beiyang Government Document Resources, largely because entities and relations were accurately annotated in the preprocessing stage and constraints from the knowledge representation model were applied in the extraction stage. Finally, the KGGen model was used to construct the knowledge graph of Beiyang Government Document Resources, on which visual analysis was conducted and intelligent question-answering services were provided.

[Result/Conclusion] Experimental results show that, in both entity recognition and relation extraction, the KGGen model outperforms the comparison models on all evaluation metrics. The framework effectively reveals the inherent knowledge structure of Beiyang Government Document Resources, constructs a high-quality, systematic knowledge representation, and provides a reusable and transferable methodological reference for building knowledge graphs of low-resource modern historical documents.
Keywords
large language models / Beiyang Government Document Resources / knowledge graph / KGGen model / knowledge extraction
Classification
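Social Sciences
Cite this article
Deng Jun, Zhang Zishu, Pan Yubing, Ye Dongyu, Chang Yanyu. Research on the Construction of Beiyang Government Document Resources Knowledge Graph Driven by Large Language Models [J]. Journal of Modern Information, 2026, 46(4): 57-67, 11.
Funding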
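Key Project of the National Social Science Fund of China, "Research on Archival Data Resource Mining and Smart Services under the National Cultural Digitization Strategy" (Grant No. 23ATQ001).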
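
The abstract describes two mechanisms that a short example may make more concrete: constraining extracted relations with the knowledge representation model, and scoring extraction with recall and F1. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It does not call KGGen. The four relation labels and their type signatures are a hypothetical subset of the sixty labels, the entity names are placeholders, and the metric the abstract calls "accuracy" is assumed here to mean precision over extracted triples.

```python
from typing import NamedTuple

# Hypothetical subset of the sixty relation labels, each with an assumed
# (head entity type, tail entity type) signature. The entity types used
# here are among the categories named in the abstract.
RELATION_SCHEMA = {
    "appointed_to": ("individual", "position"),
    "removed_from": ("individual", "position"),
    "succeeded": ("individual", "individual"),
    "leads": ("individual", "institution"),
}


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


def satisfies_schema(triple, entity_types):
    """Return True if the relation is known and both entity types match its signature."""
    signature = RELATION_SCHEMA.get(triple.relation)
    if signature is None:
        return False
    actual = (entity_types.get(triple.head), entity_types.get(triple.tail))
    return actual == signature


def precision_recall_f1(predicted, gold):
    """Compute exact-match precision, recall, and F1 between two sets of triples."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Placeholder annotations and model output (not data from the paper).
entity_types = {
    "Person A": "individual",
    "Person B": "individual",
    "Post X": "position",
    "Ministry Y": "institution",
}
gold = {
    Triple("Person A", "appointed_to", "Post X"),
    Triple("Person B", "leads", "Ministry Y"),
}
raw_predictions = {
    Triple("Person A", "appointed_to", "Post X"),
    Triple("Person A", "appointed_to", "Ministry Y"),  # violates the type signature
    Triple("Person B", "succeeded", "Person A"),       # schema-valid but not in gold
}

# Keep only predictions that respect the schema: 2 of the 3 remain.
filtered = {t for t in raw_predictions if satisfies_schema(t, entity_types)}

raw_scores = precision_recall_f1(raw_predictions, gold)
filtered_scores = precision_recall_f1(filtered, gold)
```

In this toy case the schema filter removes one ill-typed triple, which raises precision without changing recall. That is one plausible reading of why schema constraints at extraction time could help, but the example says nothing about the size of the effect reported in the paper.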