Application Research of Computers (计算机应用研究), 2024, Vol. 41, Issue 11: 3370-3375, 6. DOI: 10.19734/j.issn.1001-3695.2024.03.0097
基于多尺度距离矩阵的语音关键词检测与细粒度定位方法
Spoken term detection and fine-grained localization method based on multi-scale distance matrices
Abstract
To address the low localization accuracy of existing spoken term detection methods, this paper proposes a spoken term detection and fine-grained localization method based on multi-scale distance matrices (MF-STD). The method first employs a residual convolutional network to extract features and constructs a distance matrix to model the correlation between the inputs. It then learns localization information at different scales through multi-scale segmentation and decoupled heads. Finally, the model is optimized with a multi-scale weighted localization loss, a confidence loss, and a classification loss, which enables fine-grained prediction of keyword existence and time-domain boundaries. Experimental results on the LibriSpeech dataset show that, for in-vocabulary detection, precision and intersection over union (IoU) reach 97.1% and 88.6%, respectively; for out-of-vocabulary detection, they reach 96.7% and 88.2%, respectively. Compared with existing spoken term detection and localization methods, MF-STD significantly improves detection accuracy and localization precision. This demonstrates the superiority of the proposed method and the effectiveness of multi-scale feature modeling and fine-grained localization constraints in spoken term detection tasks.
Keywords
spoken term detection / speech fine-grained localization / multi-scale detection / convolutional residual network
Classification
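Information Technology and Security Science
Cite this article
Li Xiangrui (李祥瑞), Mao Qirong (毛启容). Spoken term detection and fine-grained localization method based on multi-scale distance matrices [J]. Application Research of Computers, 2024, 41(11): 3370-3375, 6.
Funding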
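Key Research and Development Program of Jiangsu Province (BE2020036)
Special Scientific Research Project of the School of Emergency Management, Jiangsu University (KY-A-01)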
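Illustrative sketch
The abstract names two ingredients that are easy to make concrete: a distance matrix relating two feature sequences, and the intersection over union (IoU) used to score predicted time boundaries. The following is a minimal sketch only, not the authors' implementation. It assumes the two inputs are frame-level feature sequences for a query term and an utterance, and it assumes cosine distance; the abstract does not state which distance is used. The function names `distance_matrix` and `temporal_iou` and the feature sizes are hypothetical.

```python
import numpy as np


def distance_matrix(query, utterance, eps=1e-8):
    """Pairwise cosine distance between two feature sequences.

    query:     array of shape (Tq, D), one feature vector per frame
    utterance: array of shape (Tu, D)
    returns:   array of shape (Tq, Tu); entry (i, j) is near 0 when
               frame i of the query and frame j of the utterance align
    Cosine distance is an assumption made for this sketch.
    """
    q = query / (np.linalg.norm(query, axis=1, keepdims=True) + eps)
    u = utterance / (np.linalg.norm(utterance, axis=1, keepdims=True) + eps)
    return 1.0 - q @ u.T


def temporal_iou(pred, target):
    """IoU of two time intervals given as (start, end) in seconds."""
    inter = max(0.0, min(pred[1], target[1]) - max(pred[0], target[0]))
    union = (pred[1] - pred[0]) + (target[1] - target[0]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    query_feats = rng.normal(size=(20, 64))       # hypothetical sizes
    utterance_feats = rng.normal(size=(300, 64))
    dist = distance_matrix(query_feats, utterance_feats)
    print(dist.shape)
    # Predicted vs. reference keyword boundaries, in seconds.
    print(temporal_iou((1.0, 2.0), (1.5, 2.5)))
```

In this sketch the IoU is computed on one-dimensional time intervals: the overlap duration divided by the total duration covered by either interval, so a perfect boundary prediction scores 1 and disjoint intervals score 0.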