A Review of Scene Dynamic Control in Text-Guided Video Prediction Large Models

Wu Fuxiang¹, Cheng Jun¹

Journal of Integration Technology (集成技术), 2025, Vol. 14, Issue 1: 9-24, 16. DOI: 10.12146/j.issn.2095-3135.20241201002

Author information

1. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055

Abstract

In recent years, the rapid development of generative AI has made large text-driven video prediction models a focus of research in both academia and industry. Video prediction and generation must handle temporal dynamics and consistency, which requires precise control of scene structures, subject behaviors, camera movements, and semantic expressions. A major challenge is controlling scene dynamics accurately enough to produce high-quality, semantically consistent outputs. Researchers have proposed four main families of control methods: camera control enhancement, reference video control, semantic consistency enhancement, and subject feature control improvement. These methods aim to improve generation quality, ensuring that outputs align with the historical context while meeting user needs. This paper systematically reviews the core concepts, advantages, limitations, and future directions of these four control approaches.

Keywords

text-driven video prediction / dynamic control / camera control / semantic enhancement / subject feature control

Classification

Information Technology and Security Science

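Cite this article

Wu Fuxiang, Cheng Jun. A Review of Scene Dynamic Control in Text-Guided Video Prediction Large Models[J]. Journal of Integration Technology, 2025, 14(1): 9-24, 16.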

Funding

This work is supported by the National Natural Science Foundation of China (U21A20487, 62372440).

Journal of Integration Technology

ISSN: 2095-3135
