Journal of Data Acquisition and Processing, 2026, Vol. 41, Issue (2): 332-346 (15 pages). DOI: 10.16337/j.1004-9037.2026.02.004
A Survey of Datasets Collection and Processing for Embodied Intelligence
Abstract
In recent years, vision-language-action (VLA) models have attracted significant attention in the field of embodied intelligence. As model scale continues to grow, their ability to generalize across complex tasks has steadily improved. However, these performance gains rely heavily on the availability of large-scale, high-quality training data. Unlike natural language processing and computer vision, which can directly leverage massive amounts of internet data, data collection in embodied intelligence typically involves physical interaction between real robots and their environments, which makes collection costly and the acquisition process complex. Efficiently obtaining, processing, and organizing such data has therefore become a critical challenge for advancing embodied intelligence. To address this issue, this paper provides a systematic review of data collection and processing methods in embodied intelligence. First, we summarize the major data acquisition paradigms from the perspective of data sources and collection strategies, and analyze their characteristics and limitations in terms of data quality, scalability, and collection cost. Second, we present a standardized processing pipeline for embodied intelligence datasets, focusing on key technical components such as action representation alignment, multimodal temporal synchronization, language semantic normalization, and data quality control. Finally, we discuss the evolving data ecosystem in embodied intelligence, highlighting current challenges and potential future directions. The analysis presented in this paper aims to provide insights for dataset construction and large-scale robot learning research in embodied intelligence.
Key words
embodied intelligence / vision-language-action model / robot learning / large-scale data collection / data processing
Classification
Information Technology and Security Science
Cite this article
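丁贵广, 朱晨, 王潇婉, 陈辉. A survey of datasets collection and processing for embodied intelligence[J]. Journal of Data Acquisition and Processing, 2026, 41(2): 332-346.
Funding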
National Natural Science Foundation of China (Nos. 62525103, 62271281).