Journal of Guilin University of Electronic Technology, 2024, Vol. 44, Issue 6: 560-567, 8. DOI: 10.16725/j.1673-808X.202360
Image description method based on object position relationships in scenes
Abstract
Image description aims to transform visual content into natural-language descriptions and is a pressing, challenging multimodal generation task. Because most image description methods pay little attention to implicit position information, they struggle to describe the positional relationships among objects in an image accurately. To address this problem, a position relationship encoder-combine decoder (PRCO) structure is proposed that attends to and generates the positional relationships between objects. The position relationship encoder starts from the object relationship scene graph and its node features. A common-sense dictionary and a reasoning module are then built to calculate the degree of imbalance between objects, which is used to perform a secondary encoding of the object relationship nodes. The combine decoder processes the encoded information, using an erasing module and a bias gate to optimize the node features in the graph. Experiments on the MSCOCO and Visual Genome image description datasets show superior results compared with state-of-the-art approaches. Notably, PRCO improves the CIDEr score on the Visual Genome test set. The code is publicly available on Gitee: https://gitee.com/ymw12345/PRCO.
Keywords
image description / graph convolutional networks / long short-term memory / position relationship encoder / combine decoder
Classification
Information technology and security science
Cite this article
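Yang Lu (杨璐), Qian Yi (钱艺), Wen Yimin (文益民). Image description method based on object position relationships in scenes[J]. Journal of Guilin University of Electronic Technology, 2024, 44(6): 560-567, 8.
Funding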
Guangxi Key Research and Development Program (桂科AB21220023)
National Natural Science Foundation of China (61866007)
Guangxi Key Laboratory of Image and Graphic Intelligent Processing Fund (GIIP2005)