现代情报 (Journal of Modern Information), 2025, Vol. 45, Issue 10: 26-38, 13. DOI: 10.3969/j.issn.1008-0821.2025.10.003
基于生成式大语言模型的非遗文本嵌套命名实体识别研究
Research on Nested Named Entity Recognition of Intangible Cultural Heritage Texts Based on Generative Language Models
Abstract
[Purpose/Significance] This study explores the use of generative large language models (LLMs) for nested named entity recognition (NER) in Chinese intangible cultural heritage (ICH) texts, aiming to improve the accuracy of identifying hierarchical entities within complex, domain-specific materials. [Method/Process] The study evaluated the performance of generative LLMs, including GPT-4, Claude 3.5 Sonnet, and ChatGLM2-6b, against a BERT+GlobalPointer baseline model. Two prompt engineering techniques, chain-of-thought reasoning and behavioral reasoning, were designed to enhance the models' ability to recognize entities in complex contexts. [Result/Conclusion] The GPT-4 model performed best under the behavioral reasoning mode, while the Qwen2-72B model achieved a peak F1 score of 91.16%, demonstrating exceptional adaptability to domain-specific tasks. The results confirm the effectiveness of generative LLMs for nested entity recognition in ICH documents, while also highlighting challenges, including substantial computational demands and slower inference when handling lengthy texts and complex nested structures. Future research will focus on hybrid models or multitask learning frameworks that combine the stability of BERT models with the flexibility of generative LLMs, aiming to further improve recognition performance.
Keywords
generative language model/nested entity recognition/digital humanities/intangible cultural heritage/text mining
Classification
Social Sciences
Cite this article
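Zhang Yiqin, Deng Sanhong, Wang Dongbo. Research on Nested Named Entity Recognition of Intangible Cultural Heritage Texts Based on Generative Language Models[J]. Journal of Modern Information, 2025, 45(10): 26-38, 13.
Funding
Jiangsu Provincial Graduate Research Innovation Program (Project No. KYCX24_0110).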