Abstract
Facial expressions are among the most natural, powerful, and direct ways of conveying human emotional states and intentions. Machine recognition of facial expressions is widely used in human-computer interaction and data-driven animation. To handle complex real-world variations such as occlusion, illumination, and pose, a landmark-guided facial expression recognition network (LGFER-T) is proposed. The network consists of two parts: LGFER and a Transformer. Guided by facial landmarks, LGFER uses deformable convolution to extract spatial features from static images; the Transformer then associates these features across time, and the facial expression is finally recognized and classified. The effectiveness of the method is verified on the static facial expression image dataset SFEW and the video dataset AFEW. Extensive experiments show that the spatial feature extraction network LGFER alone achieves a recognition accuracy of 59.17% on SFEW, and that LGFER-T, which combines LGFER with the Transformer, achieves 51.96% accuracy on AFEW. The proposed method performs at the leading level on both still-image and video datasets.
Keywords
facial expression recognition/landmark-guided/Transformer
Classification
Mathematical and Physical Sciences