Journal of Intelligence (情报杂志), 2026, Vol. 45, Issue 1: 75-82, 8. DOI: 10.3969/j.issn.1002-1965.2026.01.010
Toxic Text Detection Based on RoBERTa-MTL for Integrating Linguistic Features
Abstract
[Purpose] To address the limitations of traditional text recognition models in handling the diverse and subtle nature of toxic content on social media, this study aims to develop more precise and efficient detection methods, improving the accuracy and generalizability of toxic content identification and thereby fostering a healthier and safer online environment. [Method] This study proposes a method based on RoBERTa and multi-task joint learning. RoBERTa extracts text embeddings; a shared encoder and multiple task-specific encoders capture general and task-specific features, respectively; and the two types of features are integrated to generate the final representation of the text. [Result/Conclusion] The experimental results demonstrate that the multi-task model improves accuracy, precision, and recall by approximately 10% compared to traditional text classification methods. Furthermore, in contrast to traditional single-task toxic content detection methods, the multi-task model can leverage the relationships between different types of toxic content, thereby improving the overall performance of toxic content detection.
Key words
toxic text/toxic comment classification/multi-task learning/RoBERTa/BiLSTM
Classification
Social Sciences
Cite this article
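Zhang Xinsheng (张新生), Zhang Haolong (张颢泷), Ma Yulong (马玉龙), Wang Runzhou (王润周). Toxic Text Detection Based on RoBERTa-MTL for Integrating Linguistic Features [J]. Journal of Intelligence, 2026, 45(1): 75-82, 8.
Funding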
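Ministry of Education Humanities and Social Sciences Planning Fund project "Research on Risk Perception and Governance Paths for AI-Generated False Information in the Ubiquitous Information Society" (No. 24YJA630129)
Shaanxi Provincial Social Science Fund annual project "Research on Risk Perception and Governance Paths for Generative False Information in the AIGC Era" (No. 2024R055)
Shaanxi Provincial Natural Science Basic Research Program project "Research on the Evolution, Identification, and Governance of False Information in the Context of AIGC" (No. 2025JC-YBMS-1100). The paper is a research output of these projects.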
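Illustrative sketch
The [Method] part of the abstract describes a shared encoder, several task-specific encoders, and a fusion of the two feature types. The following is a minimal, dependency-free sketch of that general pattern and not the authors' implementation. Several things are assumptions made purely for illustration: the input vector stands in for a RoBERTa text embedding, each encoder is a single dense layer rather than the BiLSTM named in the key words, fusion is plain concatenation, and the task names, dimensions, and the unweighted sum of per-task losses are hypothetical.

```python
import math
import random


def dense(in_dim, out_dim, rng):
    """Create a small randomly initialised dense layer as (weights, bias)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def apply_dense(layer, x, activation=math.tanh):
    """Compute activation(W x + b) for one input vector."""
    weights, bias = layer
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


class MultiTaskToxicityModel:
    """Shared encoder plus one private encoder and one head per task.

    Assumption: dense layers stand in for the paper's encoders, and the
    input embedding stands in for RoBERTa output.
    """

    def __init__(self, embed_dim, hidden_dim, tasks, seed=0):
        rng = random.Random(seed)
        self.tasks = list(tasks)
        self.shared = dense(embed_dim, hidden_dim, rng)
        self.private = {t: dense(embed_dim, hidden_dim, rng) for t in self.tasks}
        # Each head sees the fused (shared + private) representation.
        self.heads = {t: dense(2 * hidden_dim, 1, rng) for t in self.tasks}

    def forward(self, embedding):
        shared_feat = apply_dense(self.shared, embedding)  # general features
        probs = {}
        for task in self.tasks:
            private_feat = apply_dense(self.private[task], embedding)
            fused = shared_feat + private_feat  # list concatenation as fusion
            probs[task] = apply_dense(self.heads[task], fused, sigmoid)[0]
        return probs


def joint_loss(probs, labels, eps=1e-12):
    """Unweighted sum of per-task binary cross-entropy (an assumption)."""
    return sum(
        -(labels[t] * math.log(probs[t] + eps)
          + (1 - labels[t]) * math.log(1.0 - probs[t] + eps))
        for t in probs
    )


# Hypothetical toxicity sub-tasks and a stand-in 8-dimensional embedding.
TASKS = ["toxic", "insult", "threat"]
model = MultiTaskToxicityModel(embed_dim=8, hidden_dim=4, tasks=TASKS)
embedding = [0.1 * i for i in range(8)]
probs = model.forward(embedding)
loss = joint_loss(probs, {"toxic": 1, "insult": 0, "threat": 0})
```

Because every task head reads the same shared features, a gradient step on the summed loss would update the shared encoder using signals from all tasks, which is one way related toxic-content categories can inform each other. The sketch covers only the forward pass and the loss; training is left out.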