Chinese Journal of Intelligent Science and Technology, 2025, Vol. 7, Issue 3: 290-303. DOI: 10.11959/j.issn.2096-6652.202536
Vision-Language-Action Models under the Parallel Intelligence Paradigm: The State of the Art and Future Perspectives
Abstract
Vision-language-action (VLA) models are a comprehensive modeling approach for embodied intelligence that integrates visual perception, natural language understanding, and action execution within a unified framework, aiming to establish a continuous loop from environmental perception to task planning and action control. Their operational logic corresponds closely to the paradigm of parallel intelligence articulated in the early 21st century. That paradigm comprises artificial systems, computational experiments, and parallel execution, emphasizing virtual modeling, reproducible inference, and closed-loop interaction between the virtual and the real. The initial stage of VLA development, driven by multimodal deep learning, can be regarded as prototypical work within artificial systems; the subsequent stage, characterized by large-scale models and cross-domain training, expanded the scope of computational experiments; and the more recent focus on hierarchical control and virtual-real closed loops reflects the feedback correction and normative guidance emphasized in parallel execution. VLA models exhibit deep coupling between semantics and action, iterative cycles linking simulation with reality, and steadily improving verifiability. Nonetheless, challenges remain in generalization, semantic alignment, safety and interpretability, and deployment efficiency. Addressing these issues calls for contract-based task semantics, repairable long-horizon hierarchical planning, engineering-oriented use of world models, multi-level feedback and safety governance, and cross-platform transfer with human-machine collaboration. Examining VLA through the lens of parallel intelligence clarifies its developmental logic and provides methodological support for advancing toward trustworthy real-world applications.

Keywords
parallel intelligence / vision-language-action model / embodied intelligence / multimodal fusion / virtual-real interaction

Classification
Information Technology and Security Science
Citation: 李柏, 郝金第, 孙跃硕, 孟雨晴, 黄峻, 田永林, 贺正冰. Vision-Language-Action Models under the Parallel Intelligence Paradigm: The State of the Art and Future Perspectives [J]. Chinese Journal of Intelligent Science and Technology, 2025, 7(3): 290-303.

Funding
Science and Technology Development Fund, Macao Special Administrative Region (No. 0157/2024/RIA2, No. 0093/2023/RIA2, No. 0145/2023/RIA3)
National Natural Science Foundation of China (No. 62103139)
Hibiscus Mutabilis Youth Talent Program of Hunan Province (No. 2023RC3115)