
Instance-Based Error Detection for Part-of-Speech Tagging Dataset

崔秀莲¹, 严福康¹, 李正华¹

Journal of Shanxi University (Natural Science Edition), 2024, Vol. 47, Issue 2: 251-259 (9 pages). DOI: 10.13451/j.sxu.ns.2023166

Author information

1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215000


Abstract

Deep learning frameworks lack interpretability. In this paper, we apply instance-based methods to error detection for part-of-speech tagging datasets for the first time, aiming to leverage the similarity information learned between instances. First, we implement an instance-based part-of-speech tagging model on top of a pre-trained language model; on the CTB7 dataset it reaches a prediction accuracy of 96.76%, comparable to models based on standard classifiers. Furthermore, we propose an instance-based annotation error detection method. To obtain a real error detection dataset, several methods are employed to automatically detect errors in the CTB7 test set, and the candidate errors are manually corrected, resulting in 2,016 annotation errors, which account for approximately 2.5% of the total of more than 80,000 words. Experimental results on the error detection dataset show that the error detection accuracy of the instance-based method reaches 41.48%.
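The abstract does not spell out the model, so the following is only a minimal sketch of the general instance-based idea, not the authors' implementation. It assumes each token already has a vector representation (here hand-written 2-D toy vectors standing in for pre-trained language model embeddings), tags a token by a similarity-weighted vote among its nearest stored instances, and flags an annotation as a candidate error when the neighbours confidently disagree with it. The names `knn_tag` and `detect_errors`, the parameters `k` and `threshold`, and the toy data are all illustrative assumptions.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_tag(query_vec, memory, k=3):
    """Predict a tag by similarity-weighted vote over the k most similar
    stored instances. `memory` is a list of (vector, tag) pairs.
    Returns (predicted_tag, share_of_vote)."""
    scored = [(cosine(query_vec, vec), tag) for vec, tag in memory]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for sim, tag in scored[:k]:
        votes[tag] += sim
    tag, score = votes.most_common(1)[0]
    total = sum(votes.values())
    return tag, (score / total if total > 0 else 0.0)


def detect_errors(annotated, memory, k=3, threshold=0.6):
    """Flag tokens whose annotated tag differs from the neighbours' vote
    when that vote is at least `threshold` of the total weight.
    Returns a list of (index, annotated_tag, suggested_tag)."""
    flagged = []
    for idx, (vec, gold) in enumerate(annotated):
        pred, share = knn_tag(vec, memory, k)
        if pred != gold and share >= threshold:
            flagged.append((idx, gold, pred))
    return flagged


# Toy instance memory: NN (noun) and VV (verb) clusters, assumed data.
memory = [
    ([1.0, 0.1], "NN"), ([0.9, 0.2], "NN"), ([0.8, 0.0], "NN"),
    ([0.1, 1.0], "VV"), ([0.2, 0.9], "VV"), ([0.0, 0.8], "VV"),
]

# Toy annotated tokens; the second one sits in the VV cluster but is tagged NN.
annotated = [
    ([0.95, 0.1], "NN"),
    ([0.15, 0.95], "NN"),
    ([0.1, 0.9], "VV"),
]

candidates = detect_errors(annotated, memory)
print(candidates)
```

One appeal of this style of detector is that every flagged token comes with the stored instances that voted against its label, which gives a human corrector something concrete to inspect. In a real setting, flagged tokens would still be treated as candidates for manual review, as in the correction step the abstract describes.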

Keywords

part-of-speech tagging / error detection dataset / semantic similarity / CTB7 dataset

Classification

Information Technology and Security Science

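Cite this article

崔秀莲, 严福康, 李正华. Instance-Based Error Detection for Part-of-Speech Tagging Dataset [J]. Journal of Shanxi University (Natural Science Edition), 2024, 47(2): 251-259.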

Funding

National Natural Science Foundation of China (62176173)

Priority Academic Program Development of Jiangsu Higher Education Institutions

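Journal of Shanxi University (Natural Science Edition)

Open access; indexed in the Peking University Core Journals list and CSTPCD

ISSN: 0253-2395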
