数据采集与处理 (Journal of Data Acquisition and Processing), 2001, Vol. 16, Issue 1: 58-62, 5.
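A Speaker-Independent Speech Feature Extraction Method Based on Formant Recovery and the Mellin Transform (一种基于共振峰恢复和 Mellin 变换的非特定人语音特征提取方法)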
A Novel Speaker Normalization Method Based on Formant Recovery and Mellin Transform
Abstract
To make speech features insensitive to differences in vocal tract length among speakers, the scale-invariant Mellin transform is combined with formant recovery based on autocorrelation. First, instead of a direct FFT, a method based on autocorrelation estimation is used to obtain the formant envelope. It suppresses the effects of pitch excitation and is functionally equivalent to a time-smoothed Wigner distribution. Then, the scale invariance and the discrete implementation of the modified Mellin transform are introduced. Finally, a generalized F-ratio criterion is given as an objective rule for measuring the separability between different vowel classes. In the experiment, 20 Chinese vowels spoken by different persons are used, and the FFT cepstrum, FFT-Mel cepstrum, FFT-Mellin cepstrum and formant-Mellin cepstrum are extracted from each. F-ratio separability is calculated for each feature at different vector dimensions. In addition, the same experiment is carried out on utterances with 6 dB artificial Gaussian white noise. Results show that the formant-Mellin cepstrum not only has the highest separability but is also more robust to noise than the FFT-Mellin cepstrum.
Keywords
Mellin transform / autocorrelation estimation / F-ratio separability / formant recovery
Classification
Information technology and security science
Cite this article
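Jiang Dongmei (蒋冬梅), Zhao Rongchun (赵荣椿). A Novel Speaker Normalization Method Based on Formant Recovery and Mellin Transform [J]. 数据采集与处理 (Journal of Data Acquisition and Processing), 2001, 16(1): 58-62, 5.
Funding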
Project supported by the Aviation Basic Scientific Research Fund (航空基础科研基金).
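Illustrative note on the scale invariance mentioned in the abstract. The following is a minimal sketch, not the authors' implementation: it assumes the common discretization in which the envelope is resampled on an exponential frequency grid, weighted by the square root of frequency, and passed through an FFT. The toy envelope (three Gaussian "formants"), the warping factor 1.2, the grid limits and all function names are hypothetical choices made only for this demonstration. Under these assumptions a frequency-axis scaling becomes a shift on the log axis, so the normalized magnitude should change very little, whereas the magnitude of a plain FFT over a linear grid should change noticeably.

```python
import numpy as np


def envelope(freq_hz, scale=1.0):
    """Toy formant envelope: three Gaussian peaks (hypothetical values).

    Returns f(scale * freq). With scale > 1 the peaks move to lower
    frequencies, loosely mimicking a longer vocal tract.
    """
    f = scale * np.asarray(freq_hz, dtype=float)
    centers = (500.0, 1500.0, 2500.0)
    amps = (1.0, 0.7, 0.4)
    return sum(a * np.exp(-0.5 * ((f - c) / 80.0) ** 2)
               for a, c in zip(amps, centers))


def unit_norm(v):
    """Scale a vector to unit Euclidean norm."""
    return v / np.linalg.norm(v)


def mellin_magnitude(func, f_min=10.0, f_max=10000.0, n=4096):
    """Normalized magnitude of a discrete Mellin-type (scale) transform.

    Assumed discretization: sample func on an exponential grid, weight by
    exp(x / 2) = sqrt(frequency), then take an FFT along the log axis.
    """
    x = np.linspace(np.log(f_min), np.log(f_max), n, endpoint=False)
    h = func(np.exp(x)) * np.exp(0.5 * x)
    return unit_norm(np.abs(np.fft.rfft(h)))


def linear_magnitude(func, f_min=10.0, f_max=10000.0, n=4096):
    """Normalized FFT magnitude on a linear frequency grid, for contrast."""
    freq = np.linspace(f_min, f_max, n, endpoint=False)
    return unit_norm(np.abs(np.fft.rfft(func(freq))))


def reference(f):
    return envelope(f, scale=1.0)


def warped(f):
    # Hypothetical warping factor standing in for a different speaker.
    return envelope(f, scale=1.2)


mellin_diff = np.linalg.norm(mellin_magnitude(reference) - mellin_magnitude(warped))
linear_diff = np.linalg.norm(linear_magnitude(reference) - linear_magnitude(warped))

print("Mellin-type magnitude difference:", mellin_diff)
print("Linear-grid FFT magnitude difference:", linear_diff)
```

The normalization step removes the constant gain that a frequency scaling introduces, so only the shape of the magnitude is compared. The invariance holds here because the toy envelope is essentially zero at both ends of the log grid; an envelope that is truncated by the grid would break it.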