基于生成式大语言模型的非遗文本嵌套命名实体识别研究

张逸勤 邓三鸿 王东波

现代情报 (Journal of Modern Information), 2025, Vol. 45, Issue 10: 26-38, 13. DOI: 10.3969/j.issn.1008-0821.2025.10.003

Research on Nested Named Entity Recognition of Intangible Cultural Heritage Texts Based on Generative Language Models

张逸勤¹, 邓三鸿¹, 王东波²

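Author information

  • 1. School of Information Management, Nanjing University, Nanjing, Jiangsu 210023; Jiangsu Provincial University Key Laboratory of Data Engineering and Knowledge Service (Nanjing University), Nanjing, Jiangsu 210023
  • 2. College of Information Management, Nanjing Agricultural University, Nanjing, Jiangsu 210095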

Abstract

[Purpose/Significance] This study explores the use of generative large language models (LLMs) for nested named entity recognition (NER) in Chinese intangible cultural heritage (ICH) texts, aiming to improve the accuracy of identifying hierarchical entities in complex, domain-specific materials. [Method/Process] The study evaluated generative LLMs, including GPT-4, Claude 3.5 Sonnet, and ChatGLM2-6b, against a BERT+GlobalPointer baseline. Two prompt engineering techniques, chain-of-thought reasoning and behavioral reasoning, were designed to enhance the models' ability to recognize entities in complex contexts. [Result/Conclusion] GPT-4 performed best under the behavioral reasoning mode, while Qwen2-72B achieved a peak F1 score of 91.16%, demonstrating strong adaptability to domain-specific tasks. The results confirm the effectiveness of generative LLMs for nested entity recognition in ICH documents, while also highlighting challenges, including significant computational demands and reduced inference speed when handling lengthy texts and complex nested structures. Future research will focus on hybrid models or multitask learning frameworks that combine the stability of BERT models with the flexibility of generative LLMs, aiming to further improve recognition performance.

Keywords

generative language model / nested entity recognition / digital humanities / intangible cultural heritage / text mining

Classification

Social sciences

Cite this article

张逸勤, 邓三鸿, 王东波. 基于生成式大语言模型的非遗文本嵌套命名实体识别研究[J]. 现代情报, 2025, 45(10): 26-38, 13.

Funding

Jiangsu Provincial Postgraduate Research Innovation Program (Project No. KYCX24_0110).

现代情报 (Journal of Modern Information)

Open access; Peking University Core Journal (北大核心)

ISSN 1008-0821
