A Chinese spelling correction method based on the fusion of context and text structure

Liu Changchun, Zhang Kai, Bao Meikai, Liu Ye, Liu Qi

Journal of Nanjing University (Natural Science), 2024, Vol. 60, Issue 3: 451-463 (13 pages). DOI: 10.13232/j.cnki.jnju.2024.03.009

Research on Chinese spelling correction based on the integration of context and text structure

Liu Changchun (1), Zhang Kai (2), Bao Meikai (3), Liu Ye (3), Liu Qi (2)

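Author information

  • 1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
  • 2. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China; School of Data Science, University of Science and Technology of China, Hefei 230027, China
  • 3. School of Data Science, University of Science and Technology of China, Hefei 230027, China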

Abstract

Chinese Spelling Correction (CSC) methods often suffer from insufficient semantic understanding of sentences and make limited use of the phonetic and visual information of Chinese characters. To address these issues, we propose ECS, a novel Chinese spelling error correction method based on context confidence and Chinese character similarity. Built on deep learning, the approach integrates the visual similarity of Chinese characters, their phonetic similarity, and a fine-tuned pre-trained BERT model, so that it automatically extracts sentence semantics and exploits character similarity. Specifically, we first fine-tune the pre-trained Chinese BERT model to adapt it to the downstream Chinese spelling correction task. We then use the ideographic description sequence, which captures the tree structure of a Chinese character, as visual information, and the phonetic sequence of the character as phonetic information. Finally, we combine the visual and phonetic similarity of Chinese characters (calculated by Levenshtein distance) with the fine-tuned BERT model to perform the correction. Experimental results on the SIGHAN benchmark datasets show that ECS improves the F1-score over the baseline model by 2.1% at the error detection level and by 2.8% at the error correction level, verifying that fusing context, visual, and phonetic information is effective for Chinese spelling correction.
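
As a rough illustration of the similarity step mentioned in the abstract, the sketch below computes a normalized Levenshtein similarity between two symbol sequences. The function names, the normalization by the longer sequence length, and the toy ideographic description and pinyin strings are assumptions made for this example; they are not details taken from the paper.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute; cost 1 each)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute x with y
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Normalized similarity in [0, 1]; 1.0 means the sequences are identical.

    Dividing by the longer length is an assumption of this sketch.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    # Hypothetical toy inputs: top-level ideographic descriptions and
    # tone-numbered pinyin for two commonly confused characters.
    ids_qing_emotion, ids_qing_clear = "⿰忄青", "⿰氵青"
    pinyin_emotion, pinyin_clear = "qing2", "qing1"

    print("visual similarity:", similarity(ids_qing_emotion, ids_qing_clear))
    print("phonetic similarity:", similarity(pinyin_emotion, pinyin_clear))
```

How such scores are weighted against the confidence of the fine-tuned BERT model is not specified in the abstract, so the sketch stops at the two similarity values.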

Key words

Chinese spelling correction / BERT / phonological similarity of Chinese characters / visual similarity of Chinese characters / pretrained model

Classification

Information technology and security science

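Cite this article

Liu Changchun, Zhang Kai, Bao Meikai, Liu Ye, Liu Qi. A Chinese spelling correction method based on the fusion of context and text structure [J]. Journal of Nanjing University (Natural Science), 2024, 60(3): 451-463, 13.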

Funding

National Key Research and Development Program of China (2021YFF0901003)

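Journal of Nanjing University (Natural Science)

Open access; indexed in the Peking University Core Journals list and CSTPCD

ISSN 0469-5097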
