
Text Classification Based on Multi-granularity Label Perturbation

Original title: 基于多粒度标签扰动的文本分类研究 (OA; CHSSCD; CSTPCD)


[Purpose/Significance] Supervised learning algorithms based on deep learning are currently the main research approach to text classification. However, training such algorithms relies heavily on the accuracy of the sample labels, and because of differences in annotators' experience and subjectivity, sample labels inevitably contain noise. Label perturbation is an effective way to deal with noisy labels, but current noisy-label learning algorithms based on label perturbation do not make effective use of information at multiple granularities, which limits their performance. [Method/Process] To address this problem, the paper proposes a multi-granularity label perturbation algorithm (Multi-granularity Label Perturbation, MGLP) that combines sample-level and category-level perturbation. MGLP uses the idea of meta-learning to learn the fusion weights of the perturbation methods at different granularities, so that the algorithm can adaptively adjust the fusion weights according to the characteristics of the data. [Result/Conclusion] The paper conducts experiments on three text classification datasets, covering tweet sentiment classification, movie review sentiment classification, and citation intent classification. The results show that MGLP effectively improves the performance of deep learning models on text classification tasks and has broad application prospects in information organization and information analysis.
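The abstract does not give MGLP's formulas, so the following is only a minimal sketch of the general idea of fusing a sample-level and a category-level perturbation into one soft label. Everything in it is an assumption made for illustration: the function name `fuse_perturbations`, the perturbation strength `eps`, the use of a sigmoid over a scalar `alpha` as the fusion weight, and the choice of a per-sample distribution and a per-class distribution as the two perturbation sources. In the paper the fusion weights are learned through meta-learning; here `alpha` is simply passed in as a fixed number.

```python
import math


def one_hot(label, num_classes):
    """Return the hard (one-hot) label as a list of floats."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def fuse_perturbations(label, sample_dist, class_dists, alpha, eps=0.2):
    """Build a perturbed soft label from two granularities (illustrative only).

    label       -- annotated class index, possibly noisy
    sample_dist -- hypothetical sample-level distribution over classes
    class_dists -- hypothetical category-level distributions, one row per class
    alpha       -- scalar that a meta-learner would tune; fixed in this sketch
    eps         -- assumed perturbation strength in [0, 1]
    """
    num_classes = len(sample_dist)
    # Squash alpha into (0, 1) so it acts as a convex fusion weight.
    w = 1.0 / (1.0 + math.exp(-alpha))
    class_dist = class_dists[label]
    hard = one_hot(label, num_classes)
    # Mix the two perturbation sources, then blend the mix into the hard label.
    return [
        (1.0 - eps) * h + eps * (w * s + (1.0 - w) * c)
        for h, s, c in zip(hard, sample_dist, class_dist)
    ]


# Toy inputs for a three-class problem (made-up numbers, not from the paper).
class_dists = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
]
sample_dist = [0.5, 0.4, 0.1]

soft = fuse_perturbations(
    label=0, sample_dist=sample_dist, class_dists=class_dists, alpha=0.0
)
print(soft)
```

With `alpha = 0.0` the two granularities are weighted equally. A larger `alpha` would shift the soft label toward the sample-level distribution and a smaller one toward the category-level distribution, which is the kind of data-dependent adjustment the abstract attributes to the learned fusion weights.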

Yao Rujing (姚汝婧); Wang Fang (王芳)

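Department of Information Resources Management, Business School, Nankai University, Tianjin 300071; Research Center for Network Society Governance, Nankai University, Tianjin 300071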

Computer Science and Automation

text classification; deep learning; label perturbation; meta-learning; multi-granularity

Journal of Modern Information (《现代情报》), 2024, Issue 1

Pages 25-36 (12 pages)

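Major Program of the National Social Science Fund of China, "Research on Intelligent Governance of Digital Government Based on Data Sharing and Knowledge Reuse" (Grant No. 20ZDA039).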

DOI: 10.3969/j.issn.1008-0821.2024.01.003
