An intelligent judicial text error correction method based on a joint detection and editing mechanism

王嘉宝 翁洋 李鑫

Journal of Sichuan University (Natural Science Edition), 2026, Vol. 63, Issue 1: 208-217, 10. DOI: 10.19907/j.0490-6756.250293

王嘉宝 (1), 翁洋 (1), 李鑫 (2)

Author information

  • 1. School of Mathematics, Sichuan University, Chengdu 610065
  • 2. School of Law, Sichuan University, Chengdu 610065

Abstract

In judicial practice, court transcripts often contain ungrammatical or ambiguous expressions. If such text is fed directly into automatic judgment systems without processing, it may be misinterpreted by computers and affect the final verdict, which highlights the importance of Chinese grammatical error correction (CGEC). Grammatical error correction is often modeled as a sequence-to-sequence generation task, but generative language models tend to produce fluent yet semantically deviated outputs in fine-grained correction, which limits their reliability. To address this problem, a text error correction model based on a joint detection and editing mechanism is proposed. The model reformulates CGEC as a token-level sequence labeling task and predicts an edit operation for each token to achieve precise corrections. A rich label set is designed that integrates basic editing operations with five types of conversion labels targeting common Chinese grammatical errors; each label explicitly encodes the error type and the target token, thus providing interpretable correction results. To improve training efficiency and prediction accuracy, consecutive identical operations at the character level are merged into a single composite label. Moreover, a joint training strategy is proposed that uses a joint loss to simultaneously optimize error detection and error type classification, enhancing model robustness and semantic fidelity. Multiple pre-trained models are fine-tuned under this framework. Experimental results show that the model consistently outperforms generative large language models across all metrics while significantly improving inference speed. Analysis of the errors produced by large models further demonstrates that the sequence labeling approach maintains semantic fidelity while handling common Chinese grammatical errors more robustly.
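As a rough illustration of the token-level editing idea described in the abstract, the following Python sketch applies one edit label per character and then groups consecutive identical operations into spans. The label names (`KEEP`, `DELETE`, `REPLACE_x`, `APPEND_x`), the helper functions, and the sample sentence are all hypothetical; the abstract does not specify the paper's actual label set or merging rule, so this is only a sketch under those assumptions.

```python
from itertools import groupby

# Hypothetical label vocabulary (an assumption, not taken from the paper):
#   "KEEP"       leave the character unchanged
#   "DELETE"     drop the character
#   "REPLACE_x"  substitute the character with x
#   "APPEND_x"   keep the character and insert x after it


def apply_labels(chars, labels):
    """Apply one edit label per character and return the corrected string."""
    if len(chars) != len(labels):
        raise ValueError("need exactly one label per character")
    out = []
    for ch, label in zip(chars, labels):
        if label == "KEEP":
            out.append(ch)
        elif label == "DELETE":
            continue
        elif label.startswith("REPLACE_"):
            out.append(label[len("REPLACE_"):])
        elif label.startswith("APPEND_"):
            out.append(ch + label[len("APPEND_"):])
        else:
            raise ValueError(f"unknown label: {label}")
    return "".join(out)


def merge_runs(chars, labels):
    """Group consecutive characters that share the same operation into spans.

    Only KEEP and DELETE runs are merged in this sketch; labels that carry
    a target token stay as single-character spans so their meaning is kept.
    """
    spans = []
    for label, group in groupby(zip(chars, labels), key=lambda pair: pair[1]):
        group = list(group)
        if label in ("KEEP", "DELETE"):
            spans.append(("".join(ch for ch, _ in group), label))
        else:
            spans.extend((ch, label) for ch, _ in group)
    return spans


# Made-up sample: a wrong character ("以" for "已") and a duplicated "了".
chars = list("原告以经提交了了证据")
labels = ["KEEP", "KEEP", "REPLACE_已", "KEEP", "KEEP",
          "KEEP", "KEEP", "DELETE", "KEEP", "KEEP"]

print(apply_labels(chars, labels))  # 原告已经提交了证据
print(merge_runs(chars, labels))
```

Because every output character is tied to an explicit label on an input character, text that is not flagged stays unchanged, which is one way to read the abstract's point about semantic fidelity compared with free-form generation.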

Keywords

Chinese grammatical error correction / sequence labeling / joint training / generative large language models

Classification

Mathematical and physical sciences

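Cite this article

王嘉宝, 翁洋, 李鑫. An intelligent judicial text error correction method based on joint detection and editing mechanism [J]. Journal of Sichuan University (Natural Science Edition), 2026, 63(1): 208-217, 10.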

Funding

Sichuan Province Key Research and Development Program (2024YFFK0113)

Journal of Sichuan University (Natural Science Edition)

ISSN 0490-6756
