
Research on MCUVLM-RWKV Vision-Language Model Based on STM32 Microcontroller

Zhu Zhongnuo, Shao Xingling, Li Xiuyuan, Deng Ruixiang, Xu Yuemei, Zhang Qiang

Journal of North University of China (Natural Science Edition), 2026, Vol. 47, Issue 1: 71-79, 9. DOI: 10.62756/jnuc.issn.1673-3193.2025.09.0014

Research on MCUVLM-RWKV Vision-Language Model Based on STM32 Microcontroller

Zhu Zhongnuo¹, Shao Xingling², Li Xiuyuan¹, Deng Ruixiang², Xu Yuemei², Zhang Qiang²

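Author information

  • 1. School of Instrument and Electronics, North University of China, Taiyuan 030051, Shanxi, China
  • 2. School of Electrical and Control Engineering, North University of China, Taiyuan 030051, Shanxi, China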

Abstract

With the widespread application of artificial intelligence in fields such as security, industry, and agriculture, demand for visual reasoning on edge devices continues to grow. However, owing to hardware constraints, deployment schemes for vision-language models on STM32 microcontrollers remain scarce. To address this problem, this paper proposes MCUVLM-RWKV, a vision-language model designed for STM32. The model integrates three core modules: a lightweight vision encoder, a lightweight vision feature mapper, and an RWKV decoder with a dual-mode operation mechanism, which together enable image captioning. Experimental results show that, under the memory and storage limits of STM32, MCUVLM-RWKV outperforms several mainstream models on evaluation metrics such as BLEU-4, ROUGE-L, and METEOR. Specifically, its ROUGE-L score reaches 55.7, significantly higher than that of the other models compared, indicating stronger modeling capability in long-sequence reasoning tasks. In addition, MCUVLM-RWKV demonstrates excellent performance in parameter scale and inference memory consumption, further verifying its inference efficiency and deployment feasibility in MCU scenarios.
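
For readers unfamiliar with the ROUGE-L metric cited in the abstract, the following is a minimal sketch of a sentence-level ROUGE-L score, computed as an F-measure over the longest common subsequence (LCS) of candidate and reference tokens. It is not the paper's evaluation code: whitespace tokenization, the single-reference setup, the function names `lcs_length` and `rouge_l`, and the weighting `beta=1.2` are all assumptions made for illustration. Reported scores such as 55.7 are typically this value multiplied by 100 and averaged over a test set.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.2):
    """Sentence-level ROUGE-L F-score in [0, 1].

    Assumptions (not from the paper): lowercase whitespace tokenization,
    one reference caption, and beta=1.2 as the recall weighting.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


if __name__ == "__main__":
    # Hypothetical captions, used only to exercise the function.
    print(rouge_l("a cat on the mat", "a cat sits on the mat"))
```

Because the LCS rewards words that appear in the same order without requiring them to be adjacent, a caption that drops one word from the reference still scores highly, while a caption with no shared words scores 0.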

Key words

STM32/vision-language model/edge computing/memory optimization/RWKV/image captioning

Classification

Information Technology and Security Science

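Cite this article

Zhu Zhongnuo, Shao Xingling, Li Xiuyuan, Deng Ruixiang, Xu Yuemei, Zhang Qiang. Research on MCUVLM-RWKV Vision-Language Model Based on STM32 Microcontroller[J]. Journal of North University of China (Natural Science Edition), 2026, 47(1): 71-79, 9.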

Funding

National Natural Science Foundation of China (62203404)

Journal of North University of China (Natural Science Edition)

ISSN 1673-3193
