Journal of Nanjing Normal University (Natural Science Edition), 2026, Vol. 49, Issue 2: 74-84, 11. DOI: 10.3969/j.issn.1001-4616.2026.02.008
Integration of Bilingual Information for Nuclearity Recognition in Chinese Discourse
Abstract
Nuclearity recognition in Chinese is inherently difficult because Chinese uses few explicit inter-sentential connectives. In contrast, English systematically marks nuclearity through subordinate constructions and discourse markers. Existing approaches train models exclusively on Chinese corpora and do not leverage these English signals. Our method addresses this gap by incorporating parallel bilingual training data. A multilingual pre-trained model encodes the bilingual texts, and a multi-head attention mechanism captures explicit and implicit nuclearity features. Experiments on the Chinese Discourse Treebank (CDTB) show that our model improves Macro-F1 and Micro-F1 by 8.7% and 6.1%, respectively, over the previous state-of-the-art GMN-Nu model. Compared with monolingual training using mBERT, mT5, and XLM-R, the bilingual fusion strategy increases Micro-F1 by 1.6%, 3.5%, and 1.3%, respectively. Additional tests on the Chinese-English Discourse Treebank (CEDT) show gains of 10.2% in Micro-F1 and 5.8% in Macro-F1 over monolingual methods.
Key words
discourse analysis/nuclearity recognition/pretrained models/bilingual information
Classification
Information Technology and Security Science
Cite this article
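LI Yancui (李艳翠), GUO Pengcheng (郭鹏程), MIAO Guoyi (苗国义). Integration of Bilingual Information for Nuclearity Recognition in Chinese Discourse[J]. Journal of Nanjing Normal University (Natural Science Edition), 2026, 49(2): 74-84, 11.
Funding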
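Humanities and Social Sciences Research Project of the Ministry of Education (22YJCZH091); Henan Province Science and Technology Research Project (252102210102, 262102210084); Natural Science Foundation of Henan Province (262300421797).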