Computer Technology and Development (计算机技术与发展), 2025, Vol. 35, Issue 8: 45-52, 8. DOI: 10.20165/j.cnki.ISSN1673-629X.2025.0070
卷积增强Vision Mamba模型的构建及其应用
Construction of Convolutional Vision Mamba Model and Its Application
Abstract
An improved model, Convolutional Vision Mamba (CvM), is proposed to address the shortcomings of the Vision Mamba (Vim) model. To process global visual information more efficiently, CvM replaces the image patch partitioning and position encoding mechanisms in Vim with convolutional operations, which mitigates the high computational cost and memory consumption of Vim's position embedding module. CvM is then applied to medical image classification and evaluated on multiple image datasets, including blood cell images, brain tumor images, chest CT scans, pathological myopia images, and pneumonia X-ray images. The experimental results show that, compared with the Vim model and five other neural network models, CvM not only performs well in accuracy but also significantly reduces memory usage and parameter count. Finally, in the ablation study, depthwise separable convolution outperforms standard convolution in reducing the number of parameters and memory usage, and significantly improves classification accuracy on blood cell images and brain tumor images. These results demonstrate the feasibility and superiority of the CvM model.
Keywords
deep learning / Vision Mamba / convolutional neural network / depthwise separable convolution / medical image classification
Classification
Information Technology and Security Science
Cite this article
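俞焕友 (Yu Huanyou), 范静 (Fan Jing), 黄凡 (Huang Fan). Construction of Convolutional Vision Mamba Model and Its Application [J]. Computer Technology and Development, 2025, 35(8): 45-52, 8.
Funding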
National Natural Science Foundation of China (11601316)
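
Note on the ablation result reported in the abstract: the parameter saving of depthwise separable convolution over standard convolution follows from a simple count. The sketch below is illustrative only. The kernel size and channel widths (3, 192, 192) are assumed values, not taken from the paper, and the helper names are hypothetical.

```python
def standard_conv_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Parameter count of a k x k standard 2-D convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Parameter count of a k x k depthwise convolution (one filter per
    input channel) followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


if __name__ == "__main__":
    # Assumed sizes for illustration; the paper's actual layer widths may differ.
    k, c_in, c_out = 3, 192, 192
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(f"standard: {std}, depthwise separable: {sep}, ratio: {sep / std:.3f}")
```

Ignoring biases, the ratio of the two counts is 1/c_out + 1/k^2, so with a 3 x 3 kernel and a wide layer the separable form needs roughly one ninth of the parameters. This count covers a single convolutional layer only and says nothing about the accuracy or memory figures reported for CvM.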