Training Strategy for Multimodal Large Models on Function Image Data

Ming Yibo (明一博), Chen Yanmin (陈彦敏), Zhao Jialu (赵嘉璐)

Application Research of Computers (计算机应用研究), 2025, Vol. 42, Issue 11: 3421-3429 (9 pages). DOI: 10.19734/j.issn.1001-3695.2025.02.0063

Multimodal large model training strategy for functional image data

Author information

  • 1. College of Computer Science and Technology, Xinjiang Normal University, Urumqi 830054

Abstract

In recent years, multimodal large language models (MLLMs) have developed rapidly and perform well on a wide range of multimodal downstream tasks. However, current mainstream MLLMs still perform unsatisfactorily on function image reasoning tasks, which require a model not only to have strong visual perception but also to carry out chain-of-thought reasoning in order to accurately understand and answer questions involving mathematical functions. To address these issues, this paper first constructs FunctionQA, an instruction fine-tuning dataset tailored to function image reasoning tasks. In addition to a standard question-answer pair, each data point includes a detailed chain-of-thought reasoning process, so that the model can learn complex reasoning steps during training. Second, it designs a four-stage fine-tuning strategy for function image reasoning tasks that progressively optimizes the visual encoder, the multimodal adapter, and the large language model, and incorporates LoRA to reduce training costs. Experimental results show that mFunction-4B, a 4B-parameter model built on the LLaVA framework and optimized with the FunctionQA dataset and the four-stage fine-tuning strategy, achieves an accuracy of 43.55% on the FunctionQA subset of MathVista testmini, a 14.52% improvement over the baseline model LLaVA-1.5-7B, which validates the feasibility and effectiveness of the proposed method.
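The abstract credits LoRA with reducing training costs but gives no configuration. As a rough illustration of the general technique only (not the authors' implementation), the sketch below shows a linear layer whose pretrained weight stays frozen while a low-rank pair of matrices is trained. The class and function names, the layer sizes, the rank, and the `alpha` value are all assumptions chosen for the example.

```python
import random
from dataclasses import dataclass


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


@dataclass
class LoRALinear:
    """Hypothetical LoRA-style layer: y = W x + (alpha / r) * B (A x)."""
    weight: list   # frozen pretrained weight, shape (d_out, d_in)
    lora_a: list   # trainable down-projection A, shape (r, d_in)
    lora_b: list   # trainable up-projection B, shape (d_out, r)
    alpha: float   # scaling hyperparameter (illustrative value)

    def forward(self, x):
        rank = len(self.lora_a)
        base = matvec(self.weight, x)                         # frozen path
        update = matvec(self.lora_b, matvec(self.lora_a, x))  # low-rank path
        scale = self.alpha / rank
        return [b + scale * u for b, u in zip(base, update)]


def make_lora_linear(d_out, d_in, rank, alpha=16.0, seed=0):
    """Build a layer with random W and A; B starts at zero so that the
    adapted layer initially reproduces the frozen layer exactly."""
    rng = random.Random(seed)
    weight = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
    lora_a = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(rank)]
    lora_b = [[0.0] * rank for _ in range(d_out)]
    return LoRALinear(weight, lora_a, lora_b, alpha)


def trainable_fraction(d_out, d_in, rank):
    """Ratio of LoRA parameters (A and B) to the full weight matrix."""
    return rank * (d_in + d_out) / (d_out * d_in)
```

Under these assumed sizes, a 4096 x 4096 projection with rank 8 trains 8 * (4096 + 4096) = 65,536 parameters instead of 16,777,216, which is 1/256 of the full matrix. This is the general sense in which low-rank adaptation lowers the cost of each fine-tuning stage.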

Key words

multimodal large language model (MLLM) / chain-of-thought reasoning / instruction fine-tuning / low-rank adaptation (LoRA)

Classification

Information Technology and Security Science

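Cite this article

Ming Yibo, Chen Yanmin, Zhao Jialu. Multimodal large model training strategy for functional image data [J]. Application Research of Computers, 2025, 42(11): 3421-3429.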

Funding

Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01A227)

Key Research and Development Program of Xinjiang Uygur Autonomous Region (2022B01007-1)

Application Research of Computers

Open access (OA); Peking University Core Journal (北大核心)

ISSN 1001-3695
