
A lip reading method based on adaptive pooling attention Transformer

姚云 胡振虓 邓涛 王晓

智能科学与技术学报, 2025, Vol. 7, Issue 2: 211-220, 10. DOI: 10.11959/j.issn.2096-6652.202515


A lip reading method based on adaptive pooling attention Transformer

姚云 1, 胡振虓 1, 邓涛 1, 王晓 1

Author information

  • 1. School of Artificial Intelligence, Anhui University, Hefei 230031, China


Abstract

Lip reading technology establishes a mapping between lip movements and specific language characters by processing a series of consecutive lip images, thereby enabling semantic information recognition. Existing methods mainly use recurrent networks for spatiotemporal modeling of sequential video frames. However, they suffer from significant information loss, especially when the video information is incomplete or contains noise. In such cases, the model often struggles to distinguish between lip movements at different time points, leading to a significant decline in recognition performance. To address this issue, a lip reading method based on adaptive pooling attention Transformer (APAT-LR) was proposed. This method introduced an adaptive pooling module before the multi-head self-attention (MHSA) mechanism in the standard Transformer, using a concatenation strategy of max pooling and average pooling. This module effectively suppressed irrelevant information and enhanced the representation of key features. Experiments on the CMLR and GRID datasets showed that the proposed APAT-LR method reduced the recognition error rate, verifying its effectiveness.
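The pooling strategy described in the abstract — concatenating max pooling and average pooling over frame features before the multi-head self-attention layers — can be sketched as follows. This is a minimal illustration under assumptions, not the paper's actual implementation: the function name `adaptive_pooling_block`, the fixed temporal window size, and the input shape `(T, D)` are all hypothetical choices for demonstration.

```python
import numpy as np

def adaptive_pooling_block(x, window=2):
    """Hypothetical sketch of a max+avg pooling concatenation module.

    x: array of shape (T, D) -- T video frames, each a D-dim feature.
    Groups frames into non-overlapping temporal windows, then
    concatenates max pooling (salient activations) and average
    pooling (smooth context) along the feature axis.
    Returns an array of shape (T // window, 2 * D).
    """
    T, D = x.shape
    T_trim = (T // window) * window          # drop trailing frames that don't fill a window
    grouped = x[:T_trim].reshape(T // window, window, D)
    max_pool = grouped.max(axis=1)           # emphasises the strongest lip-movement cues
    avg_pool = grouped.mean(axis=1)          # suppresses frame-level noise
    return np.concatenate([max_pool, avg_pool], axis=-1)

# Example: 8 frames of 4-dim features -> 4 pooled tokens of dim 8,
# which would then feed the multi-head self-attention layers.
feats = np.random.rand(8, 4)
pooled = adaptive_pooling_block(feats)
print(pooled.shape)  # (4, 8)
```

Concatenating rather than summing the two pooled views keeps both statistics available to the attention layers, at the cost of doubling the feature dimension.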


Keywords

attention mechanism; Transformer; convolutional pooling; adaptive

Classification

Information Technology and Security Science

Cite this article

姚云, 胡振虓, 邓涛, 王晓. A lip reading method based on adaptive pooling attention Transformer[J]. 智能科学与技术学报, 2025, 7(2): 211-220, 10.

智能科学与技术学报

ISSN 2096-6652
