Journal of North University of China (Natural Science Edition), 2026, Vol. 47, Issue 1: 71-79 (9 pages). DOI: 10.62756/jnuc.issn.1673-3193.2025.09.0014
基于STM32微控制器的MCUVLM-RWKV视觉-语言模型研究
Research on MCUVLM-RWKV Vision-Language Model Based on STM32 Microcontroller
Abstract
With the widespread application of artificial intelligence in fields such as security, industry, and agriculture, the demand for visual reasoning on edge devices continues to grow. However, owing to hardware constraints, deployment schemes for vision-language models on STM32 microcontrollers remain scarce. To address this problem, this paper proposes an STM32-oriented vision-language model, MCUVLM-RWKV. The model integrates three core modules: a lightweight vision encoder, a lightweight vision feature mapper, and an RWKV decoder with a dual-mode operation mechanism, which together enable image captioning. Experimental results show that, under the memory and storage limitations of the STM32, MCUVLM-RWKV outperforms several mainstream models on evaluation metrics such as BLEU-4, ROUGE-L, and METEOR. Specifically, its ROUGE-L score reaches 55.7, significantly higher than that of the other compared models, indicating stronger modeling capability in long-sequence reasoning tasks. In addition, MCUVLM-RWKV performs well in terms of parameter count and inference memory consumption, further supporting its inference efficiency and deployment feasibility in MCU scenarios.
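Key words
STM32 / vision-language model / edge computing / memory optimization / RWKV / image captioning
Classification
Information Technology and Security Science
Cite this article
朱忠诺, 邵星灵, 李秀源, 邓瑞祥, 徐悦梅, 张强. Research on MCUVLM-RWKV Vision-Language Model Based on STM32 Microcontroller [J]. Journal of North University of China (Natural Science Edition), 2026, 47(1): 71-79.
Funding
National Natural Science Foundation of China (62203404)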