A Video Commands Learning Framework Based on Multi-Stage Atrous Pyramid Network

Zhu Zhanmo, Chen Junhong, Yang Zhenguo, Liu Wenyin

Computer Applications and Software, 2024, Vol. 41, Issue 5: 118-125, 146 (9 pages). DOI: 10.3969/j.issn.1000-386x.2024.05.019


Author information

  • 1. School of Computer Science, Guangdong University of Technology, Guangzhou, Guangdong 510006

Abstract

We propose a video commands learning framework based on a multi-stage atrous pyramid network (MS-APN) for generating robot manipulation instructions from untrimmed videos. Specifically, we introduce an atrous convolution pyramid module to capture multi-scale action features and a multi-stage architecture to refine the segmentation results. The untrimmed video is divided into a series of video segments, from which action features are extracted. An object detection model is applied to extract object features, which are fused with the action features and fed into two classifiers that recognize the subject object and the patient object. A command quadruplet is defined to represent robot commands. Experiments on the MPII Cooking 2 dataset show that the accuracies of action segmentation, object classification, and robot command generation reach 84.1%, 76.5%, and 62.4%, respectively. We also deploy the system on a Baxter robot to further verify the effectiveness of the framework.
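The abstract describes the atrous convolution pyramid only at a high level. As a rough illustration, assuming the pyramid operates along the time axis of a per-frame feature sequence and runs parallel branches at several dilation rates, the sketch below shows how dilation widens the temporal receptive field without adding weights. The function names, the single-channel input, the fixed kernel weights, the dilation rates, and the averaging fusion are all hypothetical choices for illustration, not the paper's MS-APN implementation.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], weights: Sequence[float], dilation: int) -> List[float]:
    """Same-length 1-D dilated (atrous) convolution with implicit zero padding.

    Assumes an odd-length kernel centred on each time step; taps that fall
    outside the sequence contribute zero.
    """
    k = len(weights)
    centre = k // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t + (i - centre) * dilation  # taps are `dilation` frames apart
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def atrous_pyramid(x: Sequence[float], weights: Sequence[float],
                   dilations: Sequence[int]) -> List[float]:
    """Run one branch per dilation rate and fuse them.

    Fusion by element-wise averaging is an assumption made here; a real
    module might concatenate branches or learn the fusion weights.
    """
    branches = [dilated_conv1d(x, weights, d) for d in dilations]
    return [sum(vals) / len(branches) for vals in zip(*branches)]


def multi_stage(x: Sequence[float], weights: Sequence[float],
                dilations: Sequence[int], num_stages: int) -> List[float]:
    """Hypothetical multi-stage refinement: each stage re-processes the previous output."""
    out = list(x)
    for _ in range(num_stages):
        out = atrous_pyramid(out, weights, dilations)
    return out


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Number of frames spanned by one dilated kernel: (k - 1) * d + 1."""
    return (kernel_size - 1) * dilation + 1


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    print(dilated_conv1d(impulse, [1, 1, 1], 2))   # taps land 2 frames apart
    print(atrous_pyramid(impulse, [1, 1, 1], [1, 2]))
    print([receptive_field(3, d) for d in (1, 2, 4, 8)])
```

With a kernel of size 3, the branch at dilation 8 already spans 17 frames while still using only three weights, which is the usual motivation for mixing several dilation rates when actions vary in duration.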

Key words

Video commands learning/Robot commands generation/Action segmentation/Atrous convolution

Classification

Information Technology and Security Science

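Cite this article

Zhu Zhanmo, Chen Junhong, Yang Zhenguo, Liu Wenyin. A video commands learning framework based on multi-stage atrous pyramid network[J]. Computer Applications and Software, 2024, 41(5): 118-125, 146.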

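Funding

National Natural Science Foundation of China (91748107)

Guangdong Basic and Applied Basic Research Foundation (2020A1515010616)

Guangdong Program for Introducing Innovative Research Teams (2014ZT05G157)

Guangdong Special Fund for Science and Technology Innovation Strategy (pdjh2020a0173)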

Computer Applications and Software

Indexed in: OA, Peking University Core Journals, CSTPCD

ISSN 1000-386X
