Acta Aeronautica et Astronautica Sinica (航空学报), 2026, Vol. 47, Issue 10: 139-158, 20. DOI: 10.7527/S1000-6893.2025.32630
Hyperspectral-LiDAR joint classification method based on vision-language pre-trained models
Abstract
To address inaccurate land-cover classification caused by differences in spatial resolution, data heterogeneity, and limited labeled samples in multimodal remote sensing data, we investigate the joint classification of Hyperspectral Imagery (HSI) and LiDAR data and propose a Semantic-aware Cross-modal Fusion Network (SCF-Net). First, a lightweight patch encoder transforms the input data into RGB-compatible feature maps, which are then fed into a Contrastive Language-Image Pre-training (CLIP)-based visual encoder enhanced with learnable prompts. To integrate multimodal information efficiently, an adaptive cross-modal fusion architecture is employed, featuring grouped linear projection and a relation-aware interaction module that enables dynamic spatial feature exchange at low computational cost. For semantic discrimination, attribute-category textual prompts are generated, and classification is performed by computing the cosine similarity between visual and textual embeddings, followed by a TopK attribute averaging strategy. Experiments on the Houston 2013, MUUFL, and Trento datasets demonstrate that SCF-Net outperforms eight state-of-the-art fusion methods, achieving improvements of over 2.88% in overall accuracy, 2.69% in average accuracy, and 3.02% in Kappa coefficient, while maintaining high parameter efficiency. Ablation studies further validate the effectiveness of each component. The network offers a novel paradigm for integrating multimodal remote sensing data with large-scale vision-language pre-trained models in complex classification tasks.
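The classification step named in the abstract (cosine similarity between visual and textual embeddings, followed by TopK attribute averaging) can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so this assumes one plausible reading: each class has several attribute prompts, the visual embedding is compared with every prompt embedding of a class, and the k highest similarities are averaged into the class score. The function names, the value of `k`, and the hand-written two-dimensional vectors are all hypothetical; real embeddings would come from the CLIP encoders, which are not reproduced here.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def topk_attribute_scores(
    visual: Sequence[float],
    prompts_by_class: Dict[str, List[Sequence[float]]],
    k: int,
) -> Dict[str, float]:
    """Score each class by the mean of its k highest prompt similarities.

    Assumed interpretation of "TopK attribute averaging"; the paper's
    actual definition may differ.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    scores: Dict[str, float] = {}
    for name, prompt_embeddings in prompts_by_class.items():
        if not prompt_embeddings:
            raise ValueError(f"class {name!r} has no attribute prompts")
        sims = sorted(
            (cosine_similarity(visual, p) for p in prompt_embeddings),
            reverse=True,
        )
        top = sims[:k]  # fewer than k prompts: average what is available
        scores[name] = sum(top) / len(top)
    return scores


def classify(
    visual: Sequence[float],
    prompts_by_class: Dict[str, List[Sequence[float]]],
    k: int = 2,
) -> str:
    """Return the class whose averaged top-k similarity is highest."""
    scores = topk_attribute_scores(visual, prompts_by_class, k)
    return max(scores, key=scores.get)


# Toy data (hypothetical): one visual embedding, three attribute prompts per class.
toy_visual = [1.0, 0.0]
toy_prompts = {
    "tree": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    "road": [[0.0, 1.0], [-1.0, 0.0], [0.0, 2.0]],
}
print(classify(toy_visual, toy_prompts, k=2))  # prints "tree"
```

Averaging only the best-matching attributes, rather than all of them, keeps a class from being penalized by attribute prompts that do not apply to a given pixel patch; whether that is the authors' motivation is not stated in the abstract.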
Keywords
multimodal remote sensing data / land-cover classification / semantic-aware cross-modal fusion network / adaptive cross-modal fusion / vision-language pre-trained model / hyperspectral-LiDAR fusion
Classification
Aerospace
Cite this article
TANG Xu, GU Feng, MA Jingjing, ZHANG Xiangrong. Hyperspectral-LiDAR joint classification method based on vision-language pre-trained models[J]. Acta Aeronautica et Astronautica Sinica, 2026, 47(10): 139-158, 20.
Funding
General Program of the National Natural Science Foundation of China (62571387)
Fundamental Research Funds for the Central Universities (YJSJ25014)