Journal of Guangxi Academy of Sciences, 2025, 41(1): 1-11, 11. DOI: 10.13657/j.cnki.gxkxyxb.20250429.001
CAMN: A Pedestrian Model for Text Retrieval Based on Cross-modal Attribute Matching Alignment
Abstract
Existing pedestrian retrieval models have made good progress in global feature alignment between image and text, but they still fall short in capturing pedestrian details and in deeply mining the internal dependencies between the two modalities. To address these problems, this paper first designs a new image feature extraction network (Multi-head self-attention network, MHANet) to obtain more detailed global image features. Second, to improve the insufficient correlation between the local attribute features of image and text, this paper proposes a cross-modal attribute attention (ACA) module, which strengthens the expression of the image's local attribute features under the guidance of text information. Finally, combining the MHANet and ACA modules, this paper proposes a text-based pedestrian retrieval model, the Cross-modal Attribute Matching Alignment Network (CAMN), which optimizes text-to-image retrieval by accurately aligning global and local attribute features. Experimental results show that, compared with the visual-textual attributes alignment in person search by natural language (ViTAA) network, the Rank-5 of CAMN on the three public datasets CUHK-PEDES, ICFG-PEDES, and RSTPReid increases by 8.33, 9.30, and 9.73 percentage points, respectively. CAMN also shows clear advantages over other algorithms, which indicates that it is able to align image and text attributes and overcomes the limitations of traditional image feature extraction methods.
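As a rough illustration of the text-guided attention idea described above, the following is a generic scaled dot-product cross-attention sketch in which text embeddings act as queries over image region features. The function name, tensor shapes, and single-head form are illustrative assumptions, not the paper's actual ACA implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(text_feats, image_feats):
    """Text-guided attention over image local features (illustrative).

    text_feats:  (T, d) text/attribute embeddings, used as queries.
    image_feats: (N, d) image region embeddings, used as keys and values.
    Returns (T, d): image features re-weighted under text guidance.
    """
    d = text_feats.shape[-1]
    scores = text_feats @ image_feats.T / np.sqrt(d)  # (T, N) similarity
    weights = softmax(scores, axis=-1)                # attend over regions
    return weights @ image_feats                      # (T, d)

rng = np.random.default_rng(0)
text = rng.standard_normal((4, 64))    # e.g. 4 attribute phrases
image = rng.standard_normal((49, 64))  # e.g. a 7x7 grid of regions
out = cross_modal_attention(text, image)
print(out.shape)  # (4, 64)
```

Each output row is a convex combination of image region features, with the mixing weights determined by similarity to the corresponding text embedding, which is one common way to realize "local attribute features of the image strengthened under text guidance."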
Keywords: pedestrian retrieval / cross-modality / attention mechanism / attribute alignment

Classification: Computer and Automation
Citation: Qin Xiao, Lu Hongfei, Wu Kunsheng. CAMN: A Pedestrian Model for Text Retrieval Based on Cross-modal Attribute Matching Alignment [J]. Journal of Guangxi Academy of Sciences, 2025, 41(1): 1-11, 11.
Funding: Supported by the Science and Technology Innovation 2030 "Brain Science and Brain-inspired Research" Major Project of the Ministry of Science and Technology (2021ZD0201904) and the Guangxi Science and Technology Major Program (Guike AA22068057).