A joint cognitive representation learning method based on multimodal variational autoencoders
Objective: To learn multimodal joint cognitive representations of the brain's visual cognitive activity, improve the classification performance of cognitive representations of visual information, predict brain electroencephalogram (EEG) responses from visual image features, and decode visual images from EEG signals. Methods: An architecture combining a multimodal variational autoencoder with the Mixture-of-Products-of-Experts (MoPoE) approach was used to learn joint cognitive representations, and a style-based generative adversarial network with adaptive discriminator augmentation (StyleGAN2-ADA) was added to perform the encoding and decoding of EEG signals, covering both classification tasks and the cross-modal generation of images and EEG data. Results: The method fused features from different modalities, improved the classification performance of cognitive representations of visual information, and aligned the feature spaces of the different modalities into a joint latent space, laying the foundation for cross-modal generation tasks. Cross-modal generation between EEG and images from this joint latent space outperformed the one-way modality-to-modality mapping methods used in previous work. Conclusion: The proposed method fuses and aligns information from multiple modalities, so that the joint cognitive representation classifies better than any single modality and outperforms unidirectional modality mapping in cross-modal generation tasks, offering a new approach to effective, unified encoding and decoding modeling of visual cognitive information in the brain.
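For illustration, below is a minimal sketch of the MoPoE fusion step named in the Methods, written in Python/PyTorch. It assumes diagonal-Gaussian unimodal posteriors and uses hypothetical helper names (product_of_experts, mopoe_posterior, sample_joint); it is not the authors' implementation, and the modality encoders and the StyleGAN2-ADA image decoder are omitted.

from itertools import chain, combinations
import torch

def product_of_experts(mus, logvars):
    # Precision-weighted fusion of diagonal-Gaussian experts: a product of
    # Gaussians is Gaussian with precision equal to the sum of expert precisions.
    precision = torch.stack([torch.exp(-lv) for lv in logvars]).sum(dim=0)
    weighted_mu = torch.stack([torch.exp(-lv) * mu
                               for mu, lv in zip(mus, logvars)]).sum(dim=0)
    var = 1.0 / precision
    return weighted_mu * var, torch.log(var)

def mopoe_posterior(params):
    # params: one (mu, logvar) pair per modality, e.g. [(mu_eeg, lv_eeg), (mu_img, lv_img)].
    # MoPoE builds a PoE for every non-empty subset of modalities; the joint
    # posterior is the uniform mixture over these subset PoEs.
    idx = range(len(params))
    subsets = chain.from_iterable(combinations(idx, k)
                                  for k in range(1, len(params) + 1))
    return [product_of_experts([params[i][0] for i in s],
                               [params[i][1] for i in s]) for s in subsets]

def sample_joint(params):
    # Sample from the mixture: pick a subset uniformly, then reparameterize
    # from its PoE Gaussian.
    posteriors = mopoe_posterior(params)
    mu, logvar = posteriors[torch.randint(len(posteriors), (1,)).item()]
    return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

# Toy usage: two modalities with 4-dimensional latent posteriors.
mu_eeg, lv_eeg = torch.zeros(4), torch.zeros(4)
mu_img, lv_img = torch.ones(4), torch.zeros(4)
z = sample_joint([(mu_eeg, lv_eeg), (mu_img, lv_img)])  # latent fed to a decoder

With two modalities (EEG and image), mopoe_posterior yields three subset posteriors ({EEG}, {image}, {EEG, image}); under this scheme, cross-modal generation amounts to encoding only the source modality, sampling the latent from its subset posterior, and decoding with the target modality's decoder (here, StyleGAN2-ADA on the image side).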
宋秋月 (SONG Qiuyue); 陈圆 (CHEN Yuan); 贾淑钰 (JIA Shuyu); 应晓敏 (YING Xiaomin); 何振 (HE Zhen)
College of Artificial Intelligence, Nanjing Agricultural University, Nanjing 210031 || Academy of Military Medical Sciences, Academy of Military Sciences, Beijing 100850
Basic Medicine
multimodal variational autoencoders; cognitive representation; electroencephalogram; cross-modal generation
《军事医学》 (Military Medical Sciences), 2024 (007)
Pages 516-523 (8 pages)
National Key Research and Development Program of China (2022YFF1202400)