SSHGCN: A Chinese Error Correction Method Based on Heterogeneous Graph Convolution with Phonological and Visual Features
Open access; indexed in the Peking University Core Journals list (北大核心) and CSTPCD
Chinese spelling correction aims to detect and correct spelling errors in Chinese text. Existing methods have attempted to model character similarity as graph-structured information. However, the graphs used by current methods ignore the deeper phonetic similarity between Chinese characters, and they lack a multimodal fusion method that fully exploits both character pronunciation and character shape. This paper therefore derives a pinyin similarity relation from the initials and finals of Chinese characters and the importance of pinyin, and combines it with the shape similarity relation between characters to construct a heterogeneous graph of similar-pinyin and similar-shape characters. Heterogeneous graph convolution is applied to this graph so that phonetic and shape information complement each other, fully fusing the initial/final and shape information of the characters. On the SIGHAN15 (Special Interest Group on Chinese Language Processing 15) benchmark, the method achieves a higher sentence-level correction F1 score than all compared methods, and on the SIGHAN13 benchmark it is comparable to the best compared method, which verifies its effectiveness.
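The graph construction and convolution described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy character set, the shape-similar pairs, the initial/final weights `W_INITIAL` and `W_FINAL`, the edge threshold, and the mean-aggregation update are all hypothetical stand-ins for the pinyin importance weighting and the heterogeneous graph convolution the authors actually use.

```python
from itertools import combinations

# Hypothetical toy data: (initial, final) of each character's pinyin.
PINYIN = {
    "平": ("p", "ing"),
    "苹": ("p", "ing"),
    "评": ("p", "ing"),
    "贫": ("p", "in"),
    "明": ("m", "ing"),
}
# Hypothetical shape-similar pairs (characters sharing the component 平).
SHAPE_PAIRS = [("平", "苹"), ("平", "评")]

# Assumed importance weights for initials and finals, and an assumed threshold.
W_INITIAL, W_FINAL = 0.4, 0.6
THRESHOLD = 0.5


def pinyin_similarity(a, b):
    """Weighted match of initials and finals (an assumed scoring rule)."""
    (init_a, fin_a), (init_b, fin_b) = PINYIN[a], PINYIN[b]
    return W_INITIAL * (init_a == init_b) + W_FINAL * (fin_a == fin_b)


def build_graph():
    """Build one adjacency map per edge type: 'phonetic' and 'shape'."""
    graph = {
        "phonetic": {c: set() for c in PINYIN},
        "shape": {c: set() for c in PINYIN},
    }
    for a, b in combinations(PINYIN, 2):
        if pinyin_similarity(a, b) >= THRESHOLD:
            graph["phonetic"][a].add(b)
            graph["phonetic"][b].add(a)
    for a, b in SHAPE_PAIRS:
        graph["shape"][a].add(b)
        graph["shape"][b].add(a)
    return graph


def hetero_conv(graph, feats, rel_weight):
    """One simplified layer: self feature plus a weighted neighbor mean
    computed separately for each edge type, then summed."""
    out = {}
    for node, h in feats.items():
        new_h = list(h)
        for rel, adj in graph.items():
            nbrs = adj[node]
            if not nbrs:
                continue  # no neighbors under this relation
            for d in range(len(h)):
                mean_d = sum(feats[n][d] for n in nbrs) / len(nbrs)
                new_h[d] += rel_weight[rel] * mean_d
        out[node] = new_h
    return out


if __name__ == "__main__":
    g = build_graph()
    x = {c: [1.0, 0.0] for c in PINYIN}
    print(hetero_conv(g, x, {"phonetic": 0.5, "shape": 0.5}))
```

In this sketch the two edge types are aggregated separately and then summed, so a character can receive complementary signals from its phonetic neighbors and its shape neighbors. A real model would replace the scalar relation weights with learned weight matrices.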
Ren Jun (任俊); Huang Ruizhang (黄瑞章)
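Engineering Research Center of Text Computing and Cognitive Intelligence, Ministry of Education, Guizhou University, Guiyang, Guizhou 550025 || State Key Laboratory of Public Big Data, Guizhou University, Guiyang, Guizhou 550025 || College of Computer Science and Technology, Guizhou University, Guiyang, Guizhou 550025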
Computer Science and Automation
Chinese spelling correction; multimodal information fusion method; character similarity; pinyin similarity
Journal of Shanxi University (Natural Science Edition), 2024, No. 3, pp. 518-527 (10 pages)
National Natural Science Foundation of China (62066007); Guizhou Provincial Science and Technology Support Program (2022277)