Fire Control & Command Control (火力与指挥控制), 2025, Vol. 50, Issue 4: 135-140, 149, 7. DOI: 10.3969/j.issn.1002-0640.2025.04.019
基于细粒度图文对齐的多模态事件抽取方法
Multimodal Event Extraction Method Based on Fine-grained Image-text Alignment
Abstract
Multimodal event extraction aims to extract structured multimodal event information from image-text data. The core challenge of this task lies in bridging the gap between modalities and establishing cross-modal associations. A multimodal event extraction method based on fine-grained image-text alignment is proposed, consisting of two stages: single-modal information extraction and multimodal information fusion. First, a textual event extraction model and a visual entity extraction model perform single-modal information extraction, obtaining fine-grained event information from each modality. Subsequently, a multimodal pre-trained model performs fine-grained image-text alignment to obtain multimodal event information. Experiments on a multimodal event extraction dataset validate the effectiveness of the method.
Key words
multimodal event extraction/image-text alignment/multimodal pre-trained model/information extraction/event extraction
Classification
Information Technology and Security Science
Cite this article
曹健威, 孙英杰, 李凌寒, 曾维新, 胡艳丽. 基于细粒度图文对齐的多模态事件抽取方法[J]. 火力与指挥控制, 2025, 50(4): 135-140, 149, 7.
Funding
Supported by the National Natural Science Foundation of China (72471237, 72371245)