Research on Detoxification Task of Chinese Texts (中文文本去毒任务的研究)

Liu Jiangsheng, Zuo Jiali, Hu Yuting, Wan Jianyi, Wang Mingwen

Journal of Shanxi University (Natural Science Edition), 2024, Vol. 47, Issue 3: 528-538, 11. DOI: 10.13451/j.sxu.ns.2024001

Research on Detoxification Task of Chinese Texts

Liu Jiangsheng¹, Zuo Jiali¹, Hu Yuting¹, Wan Jianyi¹, Wang Mingwen¹

Author information

  • 1. School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi 330022, China

Abstract

This paper studies how to effectively remove toxicity from Chinese texts. For this task, the paper re-constructs a Chinese text toxicity corpus, which serves as the data basis for the research. Based on this dataset, the paper explores how toxicity manifests in text and analyzes the causes of specific types of toxic text. Building on that analysis, the paper applies two types of text style transfer models, editing-based and generation-based, to remove text toxicity, and further explores how different prompts affect detoxification performance in large language models. According to the experimental results, the editing-based model effectively removes the toxicity of explicitly toxic text and achieves a higher degree of content preservation, while the text produced by the generation-based model is more fluent. Prompt-based large language models can remove sentence toxicity to a certain extent, but compared with dedicated style transfer models, the detoxification ability of small-parameter large language models still needs improvement.

Key words

text style transfer/text detoxification/large language model

Classification

Information Technology and Security Science

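Cite this article

Liu Jiangsheng, Zuo Jiali, Hu Yuting, Wan Jianyi, Wang Mingwen. Research on Detoxification Task of Chinese Texts [J]. Journal of Shanxi University (Natural Science Edition), 2024, 47(3): 528-538, 11.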

Funding

National Natural Science Foundation of China (61866018)

Journal of Shanxi University (Natural Science Edition)

OA / PKU Core (北大核心) / CSTPCD

ISSN 0253-2395
