科技情报研究 (Scientific Information Research), 2024, Vol. 6, Issue 2: 21-29, 9. DOI: 10.19809/j.cnki.kjqbyj.2024.02.003
领域大语言模型下的古籍词性标注应用研究
Research on the Application of Part-of-speech Tagging of Ancient Books under the Domain Large Language Model
Abstract
[Purpose/significance] The development of large language models has brought new ideas to ancient text mining, and combining large language models with the digitisation and intelligent processing of ancient books is a necessary path for ancient-book work in the new era. [Methods/process] This paper uses the part-of-speech-annotated corpus of Zuozhuan to construct a batch of high-quality part-of-speech tagging instruction data through data cleaning and preprocessing. On this basis, 500, 1,000, 2,000, and 5,000 instances are used in turn to instruction-tune the large language models, and each resulting model is tested on a separate set of 1,000 instances. [Results/conclusions] The experimental results show that the "Xunzi" series models outperform general-domain models on the part-of-speech tagging task for ancient texts, and that the Xunzi-Baichuan2-7B model performs best, with an F1 value of 81.67% when the amount of fine-tuning data reaches 5,000 instances.
Keywords
Large language model / "Xunzi" large language model / Zuozhuan / part-of-speech tagging / instruction tuning
Classification
Social sciences
Cite this article
Zhu Danhao, Zhao Zhixiao, Hu Die, Zhao Wenhua, Sun Guangyao, Wang Dongbo. Research on the Application of Part-of-speech Tagging of Ancient Books under the Domain Large Language Model [J]. 科技情报研究 (Scientific Information Research), 2024, 6(2): 21-29, 9.
Funding
Major Project of the National Social Science Fund of China, 2021: "Research on the Construction and Application of a Cross-lingual Knowledge Base of Ancient Chinese Classics" (No. 21&ZD331)
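The workflow summarised in the abstract (turning part-of-speech-annotated sentences into instruction data, then scoring model output with F1) can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the `word/tag` line format, the prompt wording in `INSTRUCTION`, the record fields, the example tags, and the choice of a span-level F1 (a predicted word counts as correct only if both its boundaries and its tag match the gold annotation).

```python
# Hypothetical prompt text; the paper's actual instruction template is not given here.
INSTRUCTION = ("Segment the following Classical Chinese sentence and "
               "label each word with its part of speech.")

def parse_tagged(line):
    """Split a 'word/tag word/tag ...' line into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        word, _, tag = token.rpartition("/")
        if word and tag:  # skip malformed tokens such as 'abc' or '/n'
            pairs.append((word, tag))
    return pairs

def to_instruction(line):
    """Build one instruction-tuning record from an annotated line."""
    pairs = parse_tagged(line)
    return {
        "instruction": INSTRUCTION,
        "input": "".join(word for word, _ in pairs),          # raw sentence
        "output": " ".join(f"{w}/{t}" for w, t in pairs),      # tagged target
    }

def to_spans(pairs):
    """Convert (word, tag) pairs into a set of (start, end, tag) spans."""
    spans, pos = set(), 0
    for word, tag in pairs:
        spans.add((pos, pos + len(word), tag))
        pos += len(word)
    return spans

def span_f1(gold_lines, pred_lines):
    """Micro-averaged span-level F1 over paired gold and predicted lines."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_lines, pred_lines):
        gold_spans = to_spans(parse_tagged(gold))
        pred_spans = to_spans(parse_tagged(pred))
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)

# Illustrative sentence with made-up tags.
gold = "鄭伯/nr 克/v 段/nr 于/p 鄢/ns 。/w"
pred = "鄭伯/nr 克/v 段/n 于/p 鄢/ns 。/w"   # one tag differs from gold
record = to_instruction(gold)
score = span_f1([gold], [pred])
```

In a setup like this, the 500, 1,000, 2,000, and 5,000-instance training subsets would simply be slices of the list of records, with the 1,000 test instances held out separately.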