Abstract
Pedestrian trajectory prediction is one of the core challenges in fields such as autonomous driving and robotic navigation. Its key difficulty lies in effectively modeling complex interactions among pedestrians and extracting multi-scale spatiotemporal features. This paper proposes a pedestrian trajectory prediction method based on graph convolution and an adaptive Transformer (GCAT), which achieves high-precision trajectory prediction through hierarchical feature extraction and adaptive interaction modeling. The model takes the position and velocity information of all pedestrians within a historical observation window as input. First, linear projection and sinusoidal positional encoding are applied to map the raw observations into a high-dimensional feature space, explicitly preserving temporal order information. Subsequently, a relational graph convolutional network is introduced to capture local topological structures and spatial interaction strengths among pedestrians. An adaptive adjacency matrix based on feature cosine similarity is constructed in real time to model pedestrian interactions, enabling the graph structure to adjust dynamically according to scene characteristics. In addition, an enhanced multi-layer convolutional structure is employed, in which learnable residual weights adaptively balance the contributions of features at different layers. This design effectively alleviates the vanishing-gradient problem in deep networks and strengthens the representation capability of local interaction features. Furthermore, the model incorporates a spatially adaptive Transformer to model global spatiotemporal dependencies. This module achieves continuous sampling over feature maps through learnable spatial offsets. Specifically, spatial offsets and attention weights are generated from the input features via linear layers. The offsets are added to reference point coordinates and normalized to obtain the actual sampling locations. Bilinear interpolation is then used to extract feature values at these locations from the feature maps, and these values are subsequently aggregated using the attention weights. This process yields enhanced representations that capture both local geometric variations and global temporal dependencies. The continuous sampling strategy enables the model to focus on the spatial regions most relevant to trajectory prediction and to adaptively handle geometric layout variations across different scenes. Meanwhile, the model further integrates multi-granularity temporal features, progressively extracting multi-level spatiotemporal representations ranging from local interactions to global dependencies. This design effectively addresses key limitations of existing methods in modeling long-range dependencies, environmental adaptability, and multi-scale feature fusion. For experimental validation, the proposed method is systematically evaluated on two widely used public pedestrian trajectory prediction datasets, ETH and UCY. Compared with existing baseline models, the proposed approach achieves improvements of 5.1% and 13.2% in terms of average displacement error (ADE) and final displacement error (FDE), respectively, demonstrating its effectiveness and superiority in complex interaction modeling and multi-scale spatiotemporal feature extraction.
Keywords
trajectory prediction/local topological structure/global temporal dependencies/multi-scale feature fusion/prediction performance
Classification
Information Technology and Security Science
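Two ideas named in the abstract have compact definitions that a short sketch may help make concrete: the adaptive adjacency matrix built from feature cosine similarity, and the ADE/FDE evaluation metrics. The following is a minimal illustration in plain Python, not the authors' implementation. The function names `cosine_adjacency` and `ade_fde` are hypothetical, and the clipping of negative similarities and the row normalization are assumptions of this sketch, since the abstract does not state how the similarity matrix is normalized.

```python
import math


def cosine_adjacency(features, eps=1e-8):
    """Build a row-normalized adjacency matrix from pairwise cosine similarity.

    features: list of N per-pedestrian feature vectors (lists of floats).
    Assumption of this sketch: negative similarities are clipped to zero
    and each row is scaled to sum to (approximately) one.
    """
    norms = [math.sqrt(sum(v * v for v in f)) + eps for f in features]
    n = len(features)
    adjacency = []
    for i in range(n):
        row = []
        for j in range(n):
            dot = sum(a * b for a, b in zip(features[i], features[j]))
            row.append(max(dot / (norms[i] * norms[j]), 0.0))
        total = sum(row) + eps
        adjacency.append([w / total for w in row])
    return adjacency


def ade_fde(pred, truth):
    """Average and final displacement error.

    pred, truth: one trajectory per pedestrian, each a list of (x, y)
    points over the same prediction horizon.
    ADE averages the Euclidean error over all pedestrians and time steps;
    FDE averages the error at the last time step only.
    """
    step_errors = []
    final_errors = []
    for p_traj, t_traj in zip(pred, truth):
        dists = [math.hypot(p[0] - t[0], p[1] - t[1])
                 for p, t in zip(p_traj, t_traj)]
        step_errors.extend(dists)
        final_errors.append(dists[-1])
    ade = sum(step_errors) / len(step_errors)
    fde = sum(final_errors) / len(final_errors)
    return ade, fde


# Toy example: pedestrians 0 and 1 share a feature direction, pedestrian 2
# is orthogonal to both, so it receives no weight in their rows.
adj = cosine_adjacency([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

# One pedestrian, two predicted steps; only the final step is off target.
ade, fde = ade_fde([[(0.0, 0.0), (3.0, 4.0)]], [[(0.0, 0.0), (0.0, 0.0)]])
```

Because the adjacency is recomputed from the current features, the graph changes whenever the features do, which is the sense in which the abstract calls it adaptive. In the toy example, pedestrians 0 and 1 split their weight roughly evenly between each other, and the single trajectory gives an ADE of 2.5 and an FDE of 5.0.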