基于特征空间轨迹信息的语音关键词检测方法
Spoken Term Detection Based on Feature Space Trajectory Information
Mainstream spoken term detection currently relies on deep learning, which requires large amounts of annotated data for training and is therefore difficult to apply in the more common low-resource scenarios. This paper proposes a low-resource spoken term detection method based on trajectory information in the audio feature space. The method starts from two facts: a word is a structured composition of smaller linguistic units (syllables, phonemes), and the acoustic features of these units are stable in a statistical sense. Combining these facts with the principle of positioning in physical geometric space, it constructs a feature space representation, a temporal information representation, and local discriminative information for each keyword. During detection, decisions are made hierarchically from the feature space trajectory information of the speech segment, so that pattern information and statistical information are used together. The audio feature space is represented by a set of identifiers built from abundant unlabeled speech, while the feature space trajectory information of a keyword is built from a small number of keyword speech samples. Several experiments show that in low-resource conditions (fewer than 100 training samples) the proposed method is significantly better than HMM and CRNN. With 10 training samples, compared with the HMM-based system, FRR decreases by 20.5% absolute and FAR decreases by 8.7 FP/h. When training samples are more plentiful (300 or more), its performance is roughly comparable to that of the CRNN-based system.
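The abstract reports results as an absolute drop in FRR (false rejection rate) and in FAR expressed in FP/h (false positives per hour). As a minimal sketch of how such figures are commonly computed, and not the authors' evaluation code, the Python below assumes FRR is the fraction of true keyword occurrences that were missed and FAR is the number of false alarms divided by hours of test audio. The function names and all counts are hypothetical illustrations, not data from the paper.

```python
def false_rejection_rate(missed: int, total_keywords: int) -> float:
    """Fraction of true keyword occurrences that the detector missed."""
    if total_keywords <= 0:
        raise ValueError("total_keywords must be positive")
    return missed / total_keywords


def false_alarms_per_hour(false_alarms: int, audio_seconds: float) -> float:
    """False positives normalized by the duration of the test audio, in FP/h."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return false_alarms / (audio_seconds / 3600.0)


def absolute_drop_in_points(baseline_rate: float, new_rate: float) -> float:
    """Absolute (not relative) reduction between two rates, in percentage points."""
    return (baseline_rate - new_rate) * 100.0


# Hypothetical counts, chosen only to illustrate the arithmetic.
baseline_frr = false_rejection_rate(missed=30, total_keywords=100)
proposed_frr = false_rejection_rate(missed=10, total_keywords=100)
frr_drop = absolute_drop_in_points(baseline_frr, proposed_frr)

# 18 false alarms over two hours of audio.
far_fp_per_hour = false_alarms_per_hour(false_alarms=18, audio_seconds=2 * 3600)

print(f"FRR drop: {frr_drop:.1f} percentage points")
print(f"FAR: {far_fp_per_hour:.1f} FP/h")
```

An absolute drop is a difference between two rates, so a fall from 30% to 10% is 20 percentage points even though the relative reduction is about 67%. Normalizing false alarms by audio duration keeps FAR comparable across test sets of different lengths.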
田颖慧;贺前华;郑若伟;危卓;李艳雄
South China University of Technology, Guangzhou, Guangdong 510641 (all five authors)
Computer and Automation
spoken term detection; audio feature space; feature space trajectory information; low resource
Acta Electronica Sinica (《电子学报》), 2023 (10)
Research on Key Technologies for Speech Authenticity Detection in Off-Site Speaker Authentication
2915-2924,10
Guangdong Natural Science Foundation (No. 2022A1515011687); National Natural Science Foundation of China (No. 61571192)