A Detection Method for Depression Based on Imbalanced Social Media Text
To address the difficulty that current social-media-based depression detection models have with imbalanced data, and the incompleteness of their evaluation metrics, this paper proposes a depression detection method for social media text based on the Document Adaptive Enhanced Bagging-τSS3 (DAEB-τSS3) model, together with a new machine learning evaluation metric, the GF(α,β)-Score. Building on the τ-SS3 model, confidence weighting is introduced to strengthen the influence of minority-class data. Ensemble learning is performed with a document-adaptive enhanced Bagging method, which replaces Bagging's random sampling with stratified sampling and adaptively augments minority-class documents, improving the model's ability to handle imbalanced data. Finally, in the model evaluation stage, the GF-Score is used for automatic parameter selection and underperforming base learners are discarded, improving the model's reliability and stability. Experimental results on the E-Risk2017 depression detection dataset show that DAEB-τSS3 adapts better to imbalanced datasets and significantly outperforms models such as τSS3, bidirectional long short-term memory networks, and ERNIE 3.0, with average improvements of 13%, 0.7%, and 26.9% in GF-Score, F1-Score, and G-Mean Score, respectively, enabling more effective depression detection from imbalanced social media text.
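The abstract names two ingredients that a short sketch can make concrete: drawing each Bagging sample per class (stratified) rather than uniformly at random, and scoring with F1 and G-Mean. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. The function names and the `boost` parameter are hypothetical, and plain oversampling of minority documents stands in for the paper's adaptive document enhancement. The GF(α,β)-Score is not defined in this record, so it is not reproduced here.

```python
import math
import random
from collections import defaultdict


def stratified_bootstrap(labels, minority_label, boost=1.0, rng=None):
    """Return document indices for one bag, resampling each class separately.

    `boost` is a hypothetical knob: it scales how many minority-class
    documents are drawn (1.0 keeps the original class size).
    """
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    bag = []
    for label, indices in by_class.items():
        n_draw = len(indices)
        if label == minority_label:
            n_draw = max(1, round(n_draw * boost))
        # Sample with replacement inside the class, as in ordinary Bagging.
        bag.extend(rng.choices(indices, k=n_draw))
    return bag


def confusion_counts(y_true, y_pred, positive):
    """Count true/false positives and negatives for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def f1_score(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(y_true, y_pred, positive):
    """Geometric mean of sensitivity and specificity."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


if __name__ == "__main__":
    labels = [1] * 2 + [0] * 8  # toy data: 2 minority, 8 majority documents
    bag = stratified_bootstrap(labels, minority_label=1, boost=2.0,
                               rng=random.Random(0))
    print(len(bag), sum(labels[i] == 1 for i in bag))
```

Because every class is resampled on its own, each bag is guaranteed to contain minority documents, which uniform bootstrap sampling does not ensure on a heavily skewed dataset. G-Mean drops to zero when either class is entirely misclassified, which is why it is commonly reported next to F1 for imbalanced tasks.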
郭耀木;刘鹏;孙源乐;白其炜;张少华;刘建
School of Computer Science and Information Engineering, Hefei University of Technology, Xuancheng, Anhui 242000, China
Computer and Automation
imbalanced dataset; depression detection; ensemble learning; text classification; social media text data
Computer Technology and Development, 2024 (004)
pp. 153-161 (9 pages)
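National Natural Science Foundation of China, Youth Fund (JZ2019GJQN0385); Anhui Provincial Undergraduate Innovation Training Program (S202210359346); Hefei University of Technology Undergraduate Innovation Training Program (X202310359868)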