
Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance

Zhangwei Gao, Zhe Chen, Erfei Cui, Yiming Ren, Weiyun Wang, Jinguo Zhu, Hao Tian, Shenglong Ye, Junjun He, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Jifeng Dai, Wenhai Wang

Visual Intelligence, 2024, Vol. 2, Issue (1): P. 392-408, 17. DOI: 10.1007/s44267-024-00067-6


Zhangwei Gao 1, Zhe Chen 2, Erfei Cui 1, Yiming Ren 3, Weiyun Wang 4, Jinguo Zhu 5, Hao Tian 6, Shenglong Ye 5, Junjun He 5, Xizhou Zhu 7, Lewei Lu 6, Tong Lu 8, Yu Qiao 5, Jifeng Dai 9, Wenhai Wang 10

Author information

  • 1. Shanghai AI Laboratory, Shanghai, 200232, China; School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
  • 2. Shanghai AI Laboratory, Shanghai, 200232, China; School of Computer Science, Nanjing University, Nanjing, 210023, China
  • 3. Shanghai AI Laboratory, Shanghai, 200232, China; Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
  • 4. Shanghai AI Laboratory, Shanghai, 200232, China; School of Computer Science, Fudan University, Shanghai, 200433, China
  • 5. Shanghai AI Laboratory, Shanghai, 200232, China
  • 6. SenseTime Research, Shanghai, 200233, China
  • 7. Shanghai AI Laboratory, Shanghai, 200232, China; Department of Electronic Engineering, Tsinghua University, Beijing, 100084, China
  • 8. School of Computer Science, Nanjing University, Nanjing, 210023, China
  • 9. Shanghai AI Laboratory, Shanghai, 200232, China; Department of Electronic Engineering, Tsinghua University, Beijing, 100084, China; Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, Beijing, 100084, China
  • 10. Shanghai AI Laboratory, Shanghai, 200232, China; Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong, 999077, China

Abstract

Keywords

Lightweight multi-modal large language model / Vision-language model / Knowledge distillation / Visual instruction tuning

Classification

Information Technology and Safety Science

Cite this article

Zhangwei Gao, Zhe Chen, Erfei Cui, Yiming Ren, Weiyun Wang, Jinguo Zhu, Hao Tian, Shenglong Ye, Junjun He, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Jifeng Dai, Wenhai Wang. Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance [J]. Visual Intelligence, 2024, 2(1): P. 392-408, 17.

Funding

Supported by the National Key R&D Program of China (Nos. 2022ZD0160102 and 2022ZD0161300) and the National Natural Science Foundation of China (Nos. 62376134 and 62372223).

Visual Intelligence (ISSN 2097-3330)
