Chinese Journal of Intelligent Science and Technology, 2025, 7(2): 211-220. DOI: 10.11959/j.issn.2096-6652.202515
A lip reading method based on adaptive pooling attention Transformer
Yao Yun 1, Hu Zhenxiao 1, Deng Tao 1, Wang Xiao 1
Author information
- 1. School of Artificial Intelligence, Anhui University, Hefei 230031, Anhui, China
Abstract
Lip reading technology establishes a mapping between lip movements and specific language characters by processing a series of consecutive lip images, thereby enabling semantic information recognition. Existing methods mainly use recurrent networks for spatiotemporal modeling of sequential video frames. However, they suffer from significant information loss, especially when the video information is incomplete or contains noise. In such cases, the model often struggles to distinguish between lip movements at different time points, leading to a marked decline in recognition performance. To address this issue, a lip reading method based on an adaptive pooling attention Transformer (APAT-LR) was proposed. This method introduced an adaptive pooling module before the multi-head self-attention (MHSA) mechanism in the standard Transformer, using a concatenation strategy of max pooling and average pooling. This module effectively suppressed irrelevant information and enhanced the representation of key features. Experiments on the CMLR and GRID datasets showed that the proposed APAT-LR method reduced the recognition error rate, verifying its effectiveness.
Key words
attention mechanism / Transformer / convolutional pooling / adaptive
Classification
Information Technology and Security Science
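The abstract describes an adaptive pooling module placed before the multi-head self-attention mechanism, built on a concatenation of max pooling and average pooling to suppress irrelevant frames. The paper's exact architecture is not given here, so the following PyTorch snippet is only a minimal illustrative sketch of that idea; the class name `AdaptivePoolingAttention`, the pooling kernel size, and the sigmoid-gate fusion step are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AdaptivePoolingAttention(nn.Module):
    """Illustrative sketch: concatenate max- and average-pooled views of the
    frame sequence, fuse them into a gate that suppresses irrelevant frames,
    then apply standard multi-head self-attention. The gate-based fusion is
    an assumption for illustration, not the paper's exact design."""

    def __init__(self, dim, num_heads=4, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # Pooling over the temporal axis, length-preserving (stride 1)
        self.max_pool = nn.MaxPool1d(kernel_size, stride=1, padding=pad)
        self.avg_pool = nn.AvgPool1d(kernel_size, stride=1, padding=pad)
        # Merge the two pooled views back to the model dimension
        self.fuse = nn.Linear(2 * dim, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):                     # x: (batch, time, dim)
        p = x.transpose(1, 2)                 # (batch, dim, time) for 1D pooling
        pooled = torch.cat([self.max_pool(p), self.avg_pool(p)], dim=1)
        gate = torch.sigmoid(self.fuse(pooled.transpose(1, 2)))
        x = x * gate                          # damp uninformative frames
        out, _ = self.attn(x, x, x)           # standard MHSA on gated features
        return out

# Example: a batch of 2 clips, 16 frames each, 32-dim frame features
x = torch.randn(2, 16, 32)
out = AdaptivePoolingAttention(dim=32)(x)     # output keeps the input shape
```

The sketch keeps the sequence length unchanged so the module can be dropped in front of any standard Transformer encoder layer.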
Cite this article
Yao Y, Hu Z X, Deng T, Wang X. A lip reading method based on adaptive pooling attention Transformer [J]. Chinese Journal of Intelligent Science and Technology, 2025, 7(2): 211-220.