Journal of Sichuan University (Natural Science Edition), 2026, Vol. 63, Issue 1: 208-217, 10. DOI: 10.19907/j.0490-6756.250293
一种基于联合检测编辑机制的司法文本智能纠错方法
An intelligent judicial text error correction method based on a joint detection and editing mechanism
Abstract
In judicial practice, court transcripts often contain ungrammatical or ambiguous expressions. If such text is used directly for automatic judgment without processing, it may be misinterpreted by computers and affect the final verdict, which highlights the importance of Chinese grammatical error correction (CGEC). Grammatical error correction is often modeled as a sequence-to-sequence generation task, but generative language models tend to produce fluent yet semantically divergent outputs in fine-grained correction, which limits their reliability. To address this problem, a text error correction model based on a joint detection and editing mechanism is proposed. The model reformulates CGEC as a token-level sequence labeling task and predicts an edit operation for each token to achieve precise corrections. A rich label set is designed that integrates basic editing operations with five types of conversion labels targeting common Chinese grammatical errors; each label explicitly encodes the error type and the target token, which makes the correction results interpretable. To improve training efficiency and prediction accuracy, consecutive identical operations at the character level are merged into a single composite label. Moreover, a joint training strategy is proposed that uses a joint loss to optimize error detection and type classification simultaneously, enhancing model robustness and semantic fidelity. Multiple pre-trained models are fine-tuned under this framework. Experimental results show that the model consistently outperforms generative large language models across all metrics while significantly improving inference speed. An analysis of the errors generated by large models further demonstrates that the sequence labeling approach maintains semantic fidelity while handling common Chinese grammatical errors more robustly.
Key words
Chinese grammatical error correction / sequence labeling / joint training / generative large language models
Classification
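Mathematical and Physical Sciences
Cite this article
王嘉宝, 翁洋, 李鑫. An intelligent judicial text error correction method based on a joint detection and editing mechanism [J]. Journal of Sichuan University (Natural Science Edition), 2026, 63(1): 208-217, 10.
Funding
Sichuan Province Key Research and Development Project (2024YFFK0113)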
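
The abstract states that consecutive identical character-level operations are merged into a single composite label. The following is a minimal sketch of that idea only, not the authors' implementation. The label names KEEP and DELETE, the helper names merge_consecutive and apply_labels, and the example sentence are all assumptions made for illustration. The paper's actual label set, which also encodes error types and target tokens, is not reproduced here.

```python
from itertools import groupby


def merge_consecutive(char_ops):
    """Merge runs of identical character-level operations.

    char_ops is a list of (character, operation) pairs. The operation
    names used here (KEEP, DELETE) are hypothetical placeholders.
    Returns a list of (span, operation) pairs, one per run.
    """
    merged = []
    for op, run in groupby(char_ops, key=lambda pair: pair[1]):
        span = "".join(ch for ch, _ in run)
        merged.append((span, op))
    return merged


def apply_labels(merged):
    """Rebuild the corrected text by dropping every DELETE span."""
    return "".join(span for span, op in merged if op != "DELETE")


# Illustrative redundancy error: "大约" and "左右" both mean "approximately".
char_ops = [
    ("他", "KEEP"), ("大", "DELETE"), ("约", "DELETE"), ("三", "KEEP"),
    ("十", "KEEP"), ("岁", "KEEP"), ("左", "KEEP"), ("右", "KEEP"),
]
merged = merge_consecutive(char_ops)
corrected = apply_labels(merged)
```

In this sketch the eight character-level labels collapse into three spans, so a tagger trained on merged labels would make fewer predictions per sentence. Whether the paper merges labels in exactly this way is not stated in the abstract.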