湖北民族大学学报(自然科学版), 2026, Vol. 44, Issue (1): 75-81, 7. DOI: 10.13501/j.cnki.42-1908/n.2026.03.006
基于像素级特征调制与文本引导增强的组合零样本学习模型
Compositional Zero-shot Learning Model Based on Pixel-level Feature Modulation and Text-guided Refinement
Abstract
To address insufficient generalization to unseen attribute-object compositions in compositional zero-shot learning (CZSL), a model based on pixel-level feature modulation and text-guided refinement (PFMTR) was proposed to improve the recognition of novel compositions. First, a pixel-level feature modulation (PLFM) module was devised, which employed a dual attention mechanism operating at both the pixel level and the patch level to enable fine-grained reassembly and semantic enhancement of image features. Second, a text-guided refinement (TGR) module was proposed, in which textual features served as queries and visual features as keys and values. This module used cross-modal attention to compute semantic relevance weights, thereby dynamically guiding visual features with linguistic semantics and achieving cross-modal alignment. Compared with other state-of-the-art models, the PFMTR model achieved strong performance on the University of Texas Zappos (UT-Zappos) dataset under the open-world setting, attaining 35.7% in area under the curve (AUC) and 49.7% in harmonic mean (HM). This study demonstrated that the recognition of unseen compositions could be effectively enhanced by integrating pixel-wise local modulation with cross-modal semantic guidance, offering a viable technical route for CZSL in complex scenarios.
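As an illustration of the attention pattern the abstract attributes to the TGR module (textual features as queries, visual features as keys and values), the following is a minimal single-head sketch in plain Python. It is not the authors' implementation: the function names, the absence of learned projection matrices, the scaling by the square root of the feature dimension, and the toy inputs are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(text_feats, visual_feats):
    """Single-head scaled dot-product attention (illustrative only).

    text_feats:   list of T query vectors (textual features), each of length d
    visual_feats: list of P key/value vectors (visual features), each of length d
    Returns (refined, weights): T refined vectors and the T x P relevance weights.
    """
    d = len(text_feats[0])
    scale = 1.0 / math.sqrt(d)
    refined, weights = [], []
    for q in text_feats:
        # Relevance of every visual feature to this text query.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in visual_feats]
        w = softmax(scores)
        # Weighted sum of the visual features (values) under those weights.
        out = [sum(wj * v[i] for wj, v in zip(w, visual_feats)) for i in range(d)]
        weights.append(w)
        refined.append(out)
    return refined, weights


# Toy inputs (hypothetical): 2 text embeddings, 3 visual patch embeddings, d = 4.
text = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
patches = [[2.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0]]
refined, weights = cross_modal_attention(text, patches)
```

Each row of `weights` is a distribution over the visual features for one text query, so the visual feature most aligned with a query contributes most to its refined vector. How PFMTR feeds such refined features into classification is not specified in the abstract.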
Key words
compositional zero-shot learning / pixel-level / cross-modal alignment / attention mechanism / vision-language model
Classification
Information Technology and Security Science
Cite this article
赵薇, 包象琳, 杜文龙, 徐晓峰. Compositional zero-shot learning model based on pixel-level feature modulation and text-guided refinement [J]. 湖北民族大学学报(自然科学版), 2026, 44(1): 75-81, 7.
Funding
National Natural Science Foundation of China (62406004)
Natural Science Research Project of Anhui Higher Education Institutions (2024AH050122)
Anhui Institute of Future Technology Project (2023qyhz14)