Journal of Modern Information (现代情报), 2024, Vol. 44, Issue 1: 25-36, 12. DOI: 10.3969/j.issn.1008-0821.2024.01.003
Text Classification Based on Multi-granularity Label Perturbation
Abstract
[Purpose/Significance] Supervised learning algorithms based on deep learning are currently the main research methods for text classification. However, training supervised deep learning algorithms relies heavily on the accuracy of the sample labels. Because of annotators' differing experience and subjectivity, sample labels inevitably contain noise. Label perturbation is an effective way to deal with noisy labels. However, existing noisy-label learning algorithms based on label perturbation do not make effective use of information at multiple granularities, which limits their performance. [Method/Process] To address this problem, this paper proposes a multi-granularity label perturbation algorithm (MGLP), which combines sample-level and category-level perturbation methods. MGLP uses the idea of meta-learning to learn the fusion weights of the perturbation methods at different granularities, so that these weights can be adjusted adaptively to the characteristics of different data. [Result/Conclusion] Experiments are conducted on three text classification datasets: tweet sentiment classification, movie review sentiment classification, and citation intent classification. The results show that MGLP effectively improves the performance of deep learning models on text classification tasks and has broad application prospects in information organization and information analysis.
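The abstract describes MGLP only at a high level: a sample-level and a category-level label perturbation are fused with learned weights. The following is a minimal sketch of that fusion idea, assuming two hypothetical operators that are not taken from the paper (a random per-sample noise mix, and a per-class smoothing row shared by all samples of a class). The fusion-weight logits are fixed inputs here; in MGLP they are learned with meta-learning, which this sketch does not implement.

```python
import math
import random
from typing import List


def one_hot(label: int, num_classes: int) -> List[float]:
    return [1.0 if c == label else 0.0 for c in range(num_classes)]


def sample_level_perturb(label: int, num_classes: int, noise_scale: float,
                         rng: random.Random) -> List[float]:
    # Hypothetical sample-level operator: every sample draws its own random
    # distribution, so two samples with the same label get different targets.
    noise = [rng.random() + 1e-9 for _ in range(num_classes)]
    total = sum(noise)
    noise = [n / total for n in noise]
    y = one_hot(label, num_classes)
    return [(1 - noise_scale) * yi + noise_scale * ni for yi, ni in zip(y, noise)]


def category_level_perturb(label: int, class_row: List[float],
                           eps: float) -> List[float]:
    # Hypothetical category-level operator: all samples of class `label`
    # share one perturbation row (a distribution over classes summing to 1).
    y = one_hot(label, len(class_row))
    return [(1 - eps) * yi + eps * ci for yi, ci in zip(y, class_row)]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def fuse(targets: List[List[float]], weight_logits: List[float]) -> List[float]:
    # Convex combination of the perturbed targets; softmax keeps the weights
    # positive and summing to 1, so the fused target is still a distribution.
    w = softmax(weight_logits)
    num_classes = len(targets[0])
    return [sum(wk * t[c] for wk, t in zip(w, targets)) for c in range(num_classes)]


def cross_entropy(probs: List[float], target: List[float]) -> float:
    # Soft-label cross-entropy between model probabilities and the fused target.
    return -sum(t * math.log(max(p, 1e-12)) for p, t in zip(probs, target))


# Usage: one 3-class sample whose observed label is class 0.
rng = random.Random(0)
t_sample = sample_level_perturb(0, 3, noise_scale=0.1, rng=rng)
t_class = category_level_perturb(0, [0.0, 0.7, 0.3], eps=0.1)
target = fuse([t_sample, t_class], weight_logits=[0.0, 0.0])
loss = cross_entropy([0.8, 0.15, 0.05], target)
```

Using softmax over weight logits is one simple way to keep the fused target a valid probability distribution; a meta-learning procedure would then adjust those logits, for example against a small clean validation set.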
Key words
text classification / deep learning / label perturbation / meta-learning / multi-granularity
Classification
Information Technology and Safety Science
Cite this article
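YAO Rujing (姚汝婧), WANG Fang (王芳). Text Classification Based on Multi-granularity Label Perturbation[J]. Journal of Modern Information (现代情报), 2024, 44(1): 25-36, 12.
Funding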
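Major Project of the National Social Science Fund of China, "Research on Intelligent Governance of Digital Government Based on Data Sharing and Knowledge Reuse" (Project No. 20ZDA039).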