工程科学与技术 (Advanced Engineering Sciences), 2026, Vol. 58, Issue 2: 57-68, 12. DOI: 10.12454/j.jsuese.202400537

Depression Detection Based on Dual‒Path DCGAN Data Generation and Classification‒Regression Network

卢静雪 (Lu Jingxue)¹, 李鸿燕 (Li Hongyan)¹, 郑睿超 (Zheng Ruichao)¹, 秦睿臻 (Qin Ruizhen)¹

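Author information

1. College of Electronic Information and Optical Engineering, Taiyuan University of Technology, Taiyuan 030024, Shanxi, China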

Abstract

Objective: Accurately estimating depression scores supports clinical auxiliary diagnosis and enables personalized diagnosis and treatment plans, which improves the accuracy of clinical diagnosis and intervention and contributes to patient health outcomes. Existing research on voice-based depression detection has several limitations, including complex feature extraction, single-mode data augmentation, and uncontrollable prediction bias in regression-based estimation. This study proposes a dual‒path DCGAN (DP‒DCGAN) for data generation and a classification‒regression network for depression score prediction, enabling effective auxiliary diagnosis of depression severity.

Methods: First, based on the audio characteristics of depressed patients, six types of emotional features were selected from existing speech features, and a two-dimensional feature map was constructed for each audio sample. The Teager energy operator (TEO) was fused with MFCC to form MFCC‒TEO features, which further highlighted differences in energy distribution. Next, the proposed dual‒path deep convolutional generative adversarial network (DP‒DCGAN) was used to augment the two-dimensional feature maps of each depression level, expanding the dataset, increasing feature map diversity, and improving model robustness and generalization. An evaluation index based on spatial- and frequency-domain characteristics was also proposed to screen the generated feature maps and retain high-quality samples. Finally, a classification‒regression network was introduced into the prediction framework to reduce prediction bias by narrowing the prediction confidence interval. Within the classification framework, multi-scale convolution was added to the residual network to enhance information interaction among features, enabling the residual network to fully perceive the multi-level information contained in the feature maps.

Results and Discussions: Feature validity tests were conducted for the six selected emotional features. MFCC, MFCC‒TEO, LPCC, and Jitter features were added in turn on top of a baseline of short-term energy, zero-crossing rate, and sound intensity, and accuracy (Acc), root mean square error (RMSE), and mean absolute error (MAE) were calculated for each input configuration. When MFCC was added, Acc, RMSE, and MAE were 89.76%, 6.17, and 2.08, respectively. When MFCC‒TEO was added instead, they reached 92.07%, 5.49, and 1.58. When MFCC‒TEO and LPCC were added, they improved to 93.41%, 5.09, and 1.39. When MFCC‒TEO, LPCC, and Jitter were added, they reached 94.73%, 4.55, and 1.11. Compared with MFCC alone, using MFCC‒TEO as an input feature increased Acc by 2.31 percentage points and decreased RMSE and MAE by 0.68 and 0.50, respectively, which indicates that combining MFCC with TEO enhances the representation of energy distribution differences. The MFCC‒TEO coefficient therefore characterizes depression more strongly than the MFCC coefficient. Adding LPCC and Jitter features further improved prediction accuracy to a certain extent.

In the data augmentation experiments, predicting depression scores from the original dataset gave Acc, RMSE, and MAE of 80.51%, 8.47, and 3.94, respectively. After augmentation with the DP‒DCGAN, Acc, RMSE, and MAE improved to 94.73%, 4.55, and 1.11. Relative to the original dataset, Acc increased by 14.22 percentage points, and RMSE and MAE decreased by 3.92 and 2.83, respectively, which demonstrates that DP‒DCGAN-based data augmentation effectively expanded the dataset.

In the prediction network, the original ResNet reached a classification accuracy of 93.28%, while the MSC‒ResNet reached 94.73%, an improvement of 1.45 percentage points. These results confirm that the multi-scale convolution strategy extracts richer global and contextual information, after which the residual network captures detailed information. The network can thus fully perceive multi-scale characteristics within the input feature maps, which improves overall model performance.

Conclusions: This study proposes a depression diagnosis model based on a deep generative network and a classification‒regression framework. The MFCC‒TEO feature is obtained by introducing TEO into MFCC, and six features, including MFCC‒TEO, are extracted to construct a two-dimensional feature map incorporating time‒frequency, linear, and nonlinear properties. A DP‒DCGAN is constructed to augment the feature maps corresponding to each depression score in the original dataset, increasing feature diversity, and evaluation indicators are proposed to screen high-quality feature maps from both spatial- and frequency-domain perspectives. High-quality and diversified feature maps significantly improve the overall performance of the model. Finally, the proposed MRVN classification‒regression network is applied to predict depression scores. Taking into account the characteristics of the proposed feature maps, a multi-scale convolution module is added to the ResNet classification network to address the limitation of single-scale receptive fields in feature extraction. In addition, combining classification and regression allows the input data to be predicted on a more uniform scale, reducing the large prediction deviations commonly observed in regression tasks.
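
The Methods mention the Teager energy operator (TEO) without defining it. As a minimal sketch, the standard discrete form is ψ[n] = x[n]² − x[n−1]·x[n+1]. The abstract does not describe how TEO is fused with MFCC, so the code below covers only the operator itself and is not the authors' implementation. The helper names (`teager_energy`, `demo_tone`) and the test-tone parameters are hypothetical, chosen for illustration.

```python
import math


def teager_energy(x):
    """Standard discrete Teager energy operator.

    psi[n] = x[n]^2 - x[n-1] * x[n+1], computed for interior samples only,
    so the result has len(x) - 2 values.
    """
    if len(x) < 3:
        raise ValueError("need at least 3 samples")
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def demo_tone(amplitude=0.5, omega=0.3, n_samples=64):
    """Hypothetical test signal: a pure tone amplitude * cos(omega * n)."""
    return [amplitude * math.cos(omega * n) for n in range(n_samples)]


# For a pure tone, the operator returns the constant A^2 * sin^2(omega),
# i.e. it tracks both the amplitude and the frequency of the oscillation.
tone = demo_tone()
psi = teager_energy(tone)
expected = (0.5 ** 2) * math.sin(0.3) ** 2
```

Because the output depends on both amplitude and frequency, TEO responds to energy changes that a plain squared-amplitude measure would miss, which is consistent with the abstract's claim that MFCC‒TEO highlights differences in energy distribution.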

Key words

audio / depression detection / generative adversarial network / classification‒regression / residual network

Classification

Information Technology and Security Science

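Cite this article

卢静雪, 李鸿燕, 郑睿超, 秦睿臻. Depression Detection Based on Dual‒Path DCGAN Data Generation and Classification‒Regression Network [J]. 工程科学与技术 (Advanced Engineering Sciences), 2026, 58(2): 57-68, 12.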

Funding

National Natural Science Foundation of China (62201377)

Shanxi Provincial Research Funding Project for Returned Overseas Scholars (2022‒072)

工程科学与技术 (Advanced Engineering Sciences)

ISSN: 2096-3246
