
Research on the Application of Part-of-speech Tagging of Ancient Books under the Domain Large Language Model

Zhu Danhao, Zhao Zhixiao, Hu Die, Zhao Wenhua, Sun Guangyao, Wang Dongbo

Scientific Information Research (科技情报研究), 2024, Vol. 6, Issue 2: 21-29, 9. DOI: 10.19809/j.cnki.kjqbyj.2024.02.003


Zhu Danhao (1), Zhao Zhixiao (2), Hu Die (2), Zhao Wenhua (2), Sun Guangyao (2), Wang Dongbo (2)

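Author information

  • 1. Department of Criminal Science and Technology, Jiangsu Police Institute, Nanjing 210031
  • 2. College of Information Management, Nanjing Agricultural University, Nanjing 210095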

Abstract

[Purpose/significance] The development of large language models has opened new avenues for mining ancient texts, and combining large language models with the digitisation and intelligent processing of ancient books is a necessary path for ancient-book work in the new era. [Methods/process] This paper uses the part-of-speech-annotated corpus of Zuozhuan to construct a batch of high-quality part-of-speech tagging instruction data through data cleaning and preprocessing. On this basis, 500, 1,000, 2,000, and 5,000 examples are used in turn to instruction-tune large language models, and performance is tested in each case on a separate set of 1,000 examples. [Results/conclusions] The experimental results show that the "Xunzi" series models outperform general-domain models on the part-of-speech tagging task for ancient texts, and that the Xunzi-Baichuan2-7B model performs best, with an F1 score of 81.67% when the amount of fine-tuning data reaches 5,000 examples.
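The pipeline summarised in the abstract (annotated corpus, then instruction data, then F1 evaluation) can be illustrated with a minimal Python sketch. Everything specific here is an assumption, not taken from the paper: the `word/tag` line format, the wording of `INSTRUCTION`, the `instruction`/`input`/`output` record layout, the tag labels in the example, and the span-based definition of F1 (a predicted word counts as correct only if both its boundaries and its tag match the gold annotation).

```python
# Hypothetical prompt wording; the abstract does not give the actual instruction text.
INSTRUCTION = ("Segment the following classical Chinese sentence "
               "and tag each word with its part of speech.")


def parse_tagged(line):
    """Split a 'word/tag word/tag ...' line into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("/")
        if sep and word and tag:  # skip malformed tokens
            pairs.append((word, tag))
    return pairs


def to_instruction(line):
    """Turn one annotated sentence into an instruction-tuning record."""
    pairs = parse_tagged(line)
    return {
        "instruction": INSTRUCTION,
        "input": "".join(word for word, _ in pairs),
        "output": " ".join(f"{word}/{tag}" for word, tag in pairs),
    }


def spans(pairs):
    """Map (word, tag) pairs to a set of (start, end, tag) character spans."""
    result, pos = set(), 0
    for word, tag in pairs:
        result.add((pos, pos + len(word), tag))
        pos += len(word)
    return result


def f1_score(gold_lines, pred_lines):
    """Micro-averaged precision, recall and F1 over matching spans."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_lines, pred_lines):
        gold_spans = spans(parse_tagged(gold))
        pred_spans = spans(parse_tagged(pred))
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative sentence with made-up tags; the model output gets one tag wrong.
gold = "鄭伯/n 克/v 段/nr 于/p 鄢/ns"
pred = "鄭伯/n 克/v 段/n 于/p 鄢/ns"
record = to_instruction(gold)
precision, recall, f1 = f1_score([gold], [pred])
```

In this setup, training subsets of 500, 1,000, 2,000, and 5,000 records would be drawn from the converted corpus, with a disjoint 1,000-record set held out and scored by `f1_score`.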

Key words

Large language model / "Xunzi" large language model / Zuozhuan / part-of-speech tagging / instruction tuning

Classification

Social sciences

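Cite this article

Zhu Danhao, Zhao Zhixiao, Hu Die, Zhao Wenhua, Sun Guangyao, Wang Dongbo. Research on the Application of Part-of-speech Tagging of Ancient Books under the Domain Large Language Model [J]. Scientific Information Research (科技情报研究), 2024, 6(2): 21-29, 9.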

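Funding

2021 Major Project of the National Social Science Fund of China, "Construction and Application of a Cross-Language Knowledge Base of Ancient Chinese Classics" (Grant No. 21&ZD331)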

Scientific Information Research (科技情报研究) · Open access · CSSCI · ISSN 2096-7144
