Life Science Instruments (生命科学仪器), 2023, Vol. 21, Issue 4: 35-39, 5. DOI: 10.11967/2023210805
An End-to-End Lip-Reading Recognition Algorithm Based on the Adaptive Length of Frame Sequence
Abstract
Lip-reading recognition offers a new approach for many emerging fields such as information security and assisted driving. Existing lip-reading recognition algorithms suffer from complex schemes, slow convergence, and difficulty of deployment on portable devices for end-to-end application. Based on a convolutional neural network (CNN) and a bidirectional long short-term memory (BLSTM) recurrent neural network, this work reports an end-to-end neural network model for lip-reading recognition with adaptive sequence length. The algorithm locates the feature regions in the video stream with the Dlib feature point localization method and passes them to the CNN for pre-processing to obtain a switch signal; the switch signal is then passed to the BLSTM to control the length of the frame sequence. The algorithm models the underlying temporal information between frames more fully and can be deployed end to end on portable devices. In experiments, the algorithm achieves a recognition accuracy of 67.2% on the MIRACL-VC1 dataset, an absolute improvement of 11.2% over the previous state-of-the-art adaptive lip-reading recognition algorithm.
Keywords
Lip-reading recognition / end-to-end / convolutional neural network / bidirectional long short-term memory
Classification
Medicine and Health
Cite this article
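WU Weilong, LI Runkai, XU Shuangye, ZHU Zhen. An End-to-End Lip-Reading Recognition Algorithm Based on the Adaptive Length of Frame Sequence [J]. Life Science Instruments, 2023, 21(4): 35-39, 5.
Funding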
National Key Research and Development Program of China (2021YFF0701002)