Research on the Application of Part-of-speech Tagging of Ancient Books under the Domain Large Language Model

Keji Qingbao Yanjiu (科技情报研究), 2024, Vol. 6, Issue (2): 21-29, 9. DOI: 10.19809/j.cnki.kjqbyj.2024.02.003

Zhu Danhao (朱丹浩) ¹, Zhao Zhixiao (赵志枭) ², Hu Die (胡蝶) ², Zhao Wenhua (赵文华) ², Sun Guangyao (孙光耀) ², Wang Dongbo (王东波) ²

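Author information

1. Department of Criminal Science and Technology, Jiangsu Police Institute, Nanjing 210031
2. College of Information Management, Nanjing Agricultural University, Nanjing 210095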
Abstract

[Purpose/significance] The development of large language models has brought new ideas to ancient text mining, and combining large language models with the digitisation and intelligent processing of ancient books is a necessary path for ancient-book work in the new era. [Methods/process] This paper uses the part-of-speech annotated corpus of Zuozhuan to construct a batch of high-quality part-of-speech tagging instruction data through data cleaning and preprocessing. On this basis, 500, 1,000, 2,000, and 5,000 instances are used in turn to instruction-tune the large language models, and each resulting model is tested on a separate set of 1,000 instances. [Results/conclusions] The experimental results show that the "Xunzi" series models outperform the general-domain models on the part-of-speech tagging task for ancient texts, and that the Xunzi-Baichuan2-7B model performs best, with an F1 value of 81.67% when the amount of fine-tuning data reaches 5,000 instances.
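
The data-construction and evaluation steps described in the abstract can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' pipeline: the `word/tag` corpus layout, the prompt wording in `INSTRUCTION`, the tag letters in the sample sentence, and the span-based definition of F1 are all assumptions made for the example, and every function name is hypothetical.

```python
import json

# Hypothetical prompt wording; the instruction template actually used is not given in the abstract.
INSTRUCTION = (
    "Segment the following Classical Chinese sentence and tag each word "
    "with its part of speech."
)


def parse_tagged(line):
    """Split an assumed 'word/tag word/tag' string into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("/")
        if not sep:  # token without a tag, e.g. malformed model output
            word, tag = token, ""
        pairs.append((word, tag))
    return pairs


def to_instruction_record(tagged_line):
    """Build one instruction-tuning record from a gold-annotated sentence."""
    pairs = parse_tagged(tagged_line)
    return {
        "instruction": INSTRUCTION,
        "input": "".join(word for word, _ in pairs),  # raw, unsegmented text
        "output": tagged_line,
    }


def to_spans(pairs):
    """Map (word, tag) pairs to a set of (start, end, tag) character spans."""
    spans, pos = set(), 0
    for word, tag in pairs:
        spans.add((pos, pos + len(word), tag))
        pos += len(word)
    return spans


def f1_score(gold_lines, pred_lines):
    """Micro-averaged F1; a span counts only if its boundaries and tag both match."""
    correct = n_gold = n_pred = 0
    for gold, pred in zip(gold_lines, pred_lines):
        g = to_spans(parse_tagged(gold))
        p = to_spans(parse_tagged(pred))
        correct += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if correct == 0:
        return 0.0
    precision, recall = correct / n_pred, correct / n_gold
    return 2 * precision * recall / (precision + recall)


# Illustrative sentence; the tag letters are placeholders, not the corpus tag set.
gold = "元年/t 春/n ，/w 王/n 正月/t 。/w"
record = to_instruction_record(gold)
print(json.dumps(record, ensure_ascii=False))
```

Under these assumptions, subsets of 500, 1,000, 2,000, and 5,000 such records would serve as the fine-tuning sets, and `f1_score` would be applied to model outputs on the 1,000 held-out sentences.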

Keywords

Large language model / "Xunzi" large language model / Zuozhuan / part-of-speech tagging / instruction tuning

Classification

Social sciences

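Cite this article

Zhu Danhao, Zhao Zhixiao, Hu Die, Zhao Wenhua, Sun Guangyao, Wang Dongbo. Research on the Application of Part-of-speech Tagging of Ancient Books under the Domain Large Language Model[J]. Keji Qingbao Yanjiu (科技情报研究), 2024, 6(2): 21-29, 9.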

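Funding

Major Project of the National Social Science Fund of China (2021), "Construction and Application of a Cross-lingual Knowledge Base of Ancient Chinese Classics" (No. 21&ZD331)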

Keji Qingbao Yanjiu (科技情报研究)

OA, CSSCI

ISSN: 2096-7144
