实验技术与管理 (Experimental Technology and Management), 2025, Vol. 42, Issue 9: 27-33, 7. DOI: 10.16791/j.cnki.sjg.2025.09.005
基于多尺度掩码自编码的自监督点云场景流估计
Self-supervised scene flow estimation based on multiscale masked autoencoders
Abstract
[Objective] Point cloud scene flow plays an important role in autonomous driving. However, improving the accuracy of scene flow estimation is difficult because point clouds are unordered and unevenly dense. Because accurate scene flow labels for point clouds are difficult and costly to acquire, most previously reported methods train their models on synthetic datasets and ignore complex situations in real scenes, such as occlusion. To address these problems, a self-supervised scene flow estimation method based on multiscale masked autoencoders is proposed.

[Methods] The proposed model divides the input point cloud into irregular point patches, performs large-ratio random masking and token embedding, and then models the spatial geometry of the point cloud through an asymmetric encoder-decoder architecture. In the encoding stage, the mask tokens are moved to the input of the autoencoder's decoder to prevent position information from leaking to them; the encoder can therefore focus on learning high-level latent features from the unmasked point cloud. In the decoding stage, the learned latent features and the mask tokens are used to reconstruct the original point cloud. In addition, the model fuses fine details and global context through a pyramid architecture and adopts a multiscale masking strategy so that the visible areas stay consistent during feature extraction at different scales.

[Results] Experiments were conducted on the FlyingThings3D and KITTI datasets, with the model trained in a self-supervised manner. With training on FT3Do and testing on KITTIo, the proposed method outperforms all existing self-supervised and fully supervised methods despite using only one-tenth of the training instances used by other methods. With training on FT3Ds and testing on KITTIs, all indicators improve significantly over the baseline, especially the end-point error (EPE), which improves by 10.5%. In addition, ablation experiments were conducted by adding single-scale and multiscale masked autoencoders to the baseline network. The single-scale architecture improved the EPE by 5.3% over the baseline network, and the multiscale architecture improved it by 10.5%.

[Conclusions] The ablation experiments show that the pyramid architecture effectively integrates multiscale information and extracts rich geometric features. Comparative experiments with other methods demonstrate the superiority of the proposed method. The multiscale masked autoencoder extracts powerful features by randomly masking and reconstructing the original point cloud, thereby reducing the impact of point cloud disorder and occlusion on the accuracy of scene flow estimation.
Keywords
point cloud / scene flow / masked autoencoders / feature pyramid / self-supervised learning
Classification
Information technology and security science
Cite this article
项学智, 王茜, 王路, 贲晛烨, 乔玉龙. Self-supervised scene flow estimation based on multiscale masked autoencoders [J]. 实验技术与管理, 2025, 42(9): 27-33, 7.
Funding
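National Natural Science Foundation of China (62271160)
Key Project of Higher Education Teaching Reform Research of Heilongjiang Province (SJGZB2024054)
Undergraduate Teaching Reform Research Project of Harbin Engineering University (JG2023B0803)
Graduate Teaching Reform Research Project of Harbin Engineering University (JG2022Y037)
Stable Support Program for Basic Research in Characteristic Disciplines of Harbin Engineering University (KYWZ220240812)
Fundamental Research Funds for the Central Universities (3072024LJ0803)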
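
Illustrative note on the abstract. The patch grouping, large-ratio random masking, and EPE metric mentioned in the abstract can be sketched roughly as follows. This is a minimal standard-library sketch and not the authors' implementation. The function names, the random choice of patch centers, the patch size, and the 0.75 mask ratio are all assumptions made for illustration; the abstract only states that patches are irregular and that the masking ratio is large. EPE is computed here in its usual sense, as the mean Euclidean distance between predicted and ground-truth flow vectors.

```python
import math
import random


def group_into_patches(points, num_patches, patch_size, rng):
    """Group a point cloud into local patches.

    Assumption: centers are drawn at random and each patch is the
    center's nearest neighbours; the paper's sampling may differ.
    """
    centers = rng.sample(points, num_patches)
    patches = []
    for center in centers:
        nearest = sorted(points, key=lambda p: math.dist(p, center))[:patch_size]
        patches.append(nearest)
    return centers, patches


def random_mask(num_patches, mask_ratio, rng):
    """Split patch indices into (visible, masked) with the given mask ratio."""
    num_masked = int(round(num_patches * mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


def end_point_error(pred_flow, true_flow):
    """Mean Euclidean distance between predicted and ground-truth flow vectors."""
    total = sum(math.dist(p, t) for p, t in zip(pred_flow, true_flow))
    return total / len(true_flow)


# Usage on a toy cloud (all sizes below are hypothetical).
rng = random.Random(0)
cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(256)]
centers, patches = group_into_patches(cloud, num_patches=16, patch_size=8, rng=rng)
visible, masked = random_mask(len(patches), mask_ratio=0.75, rng=rng)

# Only the visible patches would be passed to an encoder; the masked
# ones would be reconstructed by a decoder (neither is modelled here).
visible_patches = [patches[i] for i in visible]

epe = end_point_error([(0, 0, 0), (1, 0, 0)], [(3, 4, 0), (1, 0, 0)])
```

In this sketch the encoder would only ever see `visible_patches`, which mirrors the abstract's point that mask tokens are introduced at the decoder rather than the encoder. The multiscale masking strategy and the pyramid architecture are not covered here.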