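Technology Innovation and Application (科技创新与应用), 2025, Vol. 15, Issue 29: 10-12,18,4. DOI: 10.19981/j.CN23-1581/G3.2025.29.003
A Tibetan Pronunciation Similarity Calculation Method Based on Multi-Feature CNN and DTW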
Abstract
Speech similarity evaluation has become an active topic in speech information processing research, yet comparatively little work addresses Tibetan and other minority languages in China. To address this gap, this paper proposes a hybrid model that combines one-dimensional convolutional neural network (1D-CNN) feature compression with dynamic time warping (DTW) alignment within a multi-feature fusion framework. Contrastive loss and noise-robust training are used to address feature redundancy, robustness to temporal deformation, and low-resource generalization in Tibetan pronunciation similarity calculation, enabling pronunciation similarity evaluation for commonly used Tibetan words. Experiments on a self-built dataset covering three major dialects show that the model reaches an overall accuracy of 93.2%, verifying the effectiveness of the method.
Keywords
speech similarity evaluation / one-dimensional convolutional neural network (1D-CNN) / dynamic time warping (DTW) / multi-feature fusion / Tibetan pronunciation
Classification
Information Technology and Security Science
Cite this article
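Xing Lijia, Yi Pengwan, Zhang Jiaxing, Yang Suwei, Zhuo Ga. A Tibetan pronunciation similarity calculation method based on multi-feature CNN and DTW [J]. Technology Innovation and Application, 2025, 15(29): 10-12,18,4.
Funding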
Research output of the Tibet Autonomous Region-level College Students' Innovation and Entrepreneurship Training Program (S202510694039)