Stuttering Speech Classification Based on Self-Supervised Pre-Trained Model and NWCE

Yin Zhipeng (殷志鹏), Xu Xinzhou (徐新洲)

Journal of North University of China (Natural Science Edition) [中北大学学报(自然科学版)], 2025, Vol. 46, Issue 1: 19-26 (8 pages). DOI: 10.62756/jnuc.issn.1673-3193.2023.09.0002

Yin Zhipeng 1, Xu Xinzhou 1

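Author information

  • 1. School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China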

Abstract

Stuttering speech classification aims to recognize different categories of stuttering from spoken signals. However, existing work pays insufficient attention to the sequential characteristics of the representation embeddings produced by self-supervised pre-trained models, and it handles the class imbalance of stuttering speech data too simplistically. To address this, we propose a stuttering speech classification approach based on self-supervised pre-trained models and a nonlinear weighted cross-entropy (NWCE) loss. In the proposed approach, we first employ a self-supervised pre-trained model to extract paralinguistic representation embeddings from stuttering speech. We then use a bidirectional long short-term memory network with a self-attention mechanism to capture essential temporal features and contextual information within the embeddings. Finally, a nonlinear weighted cross-entropy loss is applied so that training focuses on stuttering speech categories with fewer samples. Experimental results on a stuttering speech classification dataset indicate that the proposed approach outperforms state-of-the-art approaches in classifying stuttering speech. It does so by learning sequential information from the multi-layer representation embeddings of self-supervised pre-trained models, and by using NWCE to describe the relationship between the data of different stuttering categories more adequately.
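The abstract does not give the NWCE formula, so the following is only a minimal sketch of the general idea of a class-weighted cross-entropy whose weights depend nonlinearly on class frequency. The power-law weighting `(N_max / N_c) ** gamma`, the parameter `gamma`, the function names, and the class counts are all assumptions made for illustration; they are not taken from the paper.

```python
import math


def nonlinear_class_weights(class_counts, gamma=0.5):
    """Hypothetical nonlinear weighting (an assumption, not the paper's formula).

    Each class c gets a raw weight (N_max / N_c) ** gamma, so rarer classes
    receive larger weights. The weights are rescaled to have mean 1.
    """
    n_max = max(class_counts)
    raw = [(n_max / n) ** gamma for n in class_counts]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(batch_logits, labels, weights):
    """Weighted cross-entropy, averaged by the sum of the sample weights."""
    num = 0.0
    den = 0.0
    for logits, y in zip(batch_logits, labels):
        p = softmax(logits)
        num += -weights[y] * math.log(p[y])
        den += weights[y]
    return num / den


if __name__ == "__main__":
    # Made-up class counts for an imbalanced three-class problem.
    counts = [9000, 1000, 250]
    weights = nonlinear_class_weights(counts, gamma=0.5)
    logits = [[2.0, 0.1, -1.0], [0.3, 0.2, 0.1], [0.0, 0.5, 1.5]]
    labels = [0, 1, 2]
    print("weights:", weights)
    print("loss:", weighted_cross_entropy(logits, labels, weights))
```

With `gamma = 0` every class gets weight 1 and the loss reduces to ordinary cross-entropy; larger values of `gamma` shift more of the loss onto the rarer classes.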

Key words

computational paralinguistics; stuttering speech classification; self-supervised pre-trained model; nonlinear weighted cross-entropy loss

Classification

Information Technology and Security Science

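Cite this article

Yin Zhipeng, Xu Xinzhou. Stuttering speech classification based on self-supervised pre-trained model and NWCE[J]. Journal of North University of China (Natural Science Edition), 2025, 46(1): 19-26.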

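Funding

China Postdoctoral Science Foundation, General Program (2022M711693)

National Natural Science Foundation of China, General Program (62071242, 62172235)

Natural Science Foundation of Nanjing University of Posts and Telecommunications (NY222158)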

Journal of North University of China (Natural Science Edition), ISSN 1673-3193
