Chinese High Technology Letters, 2017, Vol. 27, Issue 7: 596-603 (8 pages). DOI: 10.3772/j.issn.1002-0470.2017.07.002
A deep learning model for text classification using phonetic features
Abstract
A deep learning model using the consonant and vowel features of Chinese Pinyin was proposed for classifying the intention texts produced by speech recognition in human-robot voice interaction. First, taking unmanned-vehicle voice navigation as the application scenario of human-robot interaction, the structure of intention texts was analyzed, and a single-intention corpus and a complex-intention corpus were built. Second, building on character-level features for text classification and considering the differences between Chinese Pinyin and English words, a feature representation method using the consonants and vowels of Pinyin was proposed for Chinese text classification. Third, traditional recurrent neural network (RNN) units were replaced by gated recurrent units (GRU) to address the difficulty of capturing long-term dependencies. To extract high-level features, shorten the feature sequences, and speed up convergence, a deep learning model combining a convolutional neural network (CNN) with the GRU-RNN was established. Finally, to evaluate the model on short- and long-sequence tasks, 10-fold cross-validation was performed on the corpus for each of the two tasks, and the results were compared with other classification methods. The results show that the proposed model can significantly improve classification accuracy for intention texts.
Keywords
text classification / intention understanding / features of consonant and vowel / gated recurrent units (GRU)
Cite this article
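Zhao Boxuan, Fang Ning, Zhao Qunfei, Zhang Pengzhu. A deep learning model for text classification using phonetic features [J]. Chinese High Technology Letters, 2017, 27(7): 596-603.
Funding
Supported by the National Natural Science Foundation of China (91646205).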
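Illustrative sketch
The abstract describes representing Chinese text by the consonants and vowels of its Pinyin rather than by individual letters. The following is a minimal sketch of what such a tokenization could look like, not the authors' implementation. It assumes the input is already toneless Pinyin, one syllable per string, and that `y` and `w` are treated as consonants. The names `INITIALS`, `split_syllable`, and `to_feature_sequence` are hypothetical and do not come from the paper.

```python
# Assumed inventory of Pinyin consonants (initials); not taken from the paper.
# Two-letter initials come first so "zh" is matched before "z".
INITIALS = (
    "zh", "ch", "sh",
    "b", "p", "m", "f", "d", "t", "n", "l",
    "g", "k", "h", "j", "q", "x", "r", "z", "c", "s",
    "y", "w",
)


def split_syllable(syllable):
    """Split one toneless Pinyin syllable into (consonant, vowel part).

    Syllables with no leading consonant (e.g. "an") return an empty
    string as the consonant.
    """
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable


def to_feature_sequence(syllables):
    """Flatten a list of syllables into a consonant/vowel token sequence."""
    tokens = []
    for syllable in syllables:
        initial, final = split_syllable(syllable.lower())
        if initial:
            tokens.append(initial)
        if final:
            tokens.append(final)
    return tokens
```

For example, `to_feature_sequence(["dao", "hang"])` returns `["d", "ao", "h", "ang"]`. A sequence of this kind is shorter than the equivalent letter-level sequence, which fits the abstract's stated aim of shortening feature sequences, although the exact encoding used in the paper is not given in the abstract.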