
Hyperspectral-LiDAR Joint Classification Method Based on Vision-Language Pre-trained Models

TANG Xu, GU Feng, MA Jingjing, ZHANG Xiangrong

Acta Aeronautica et Astronautica Sinica, 2026, Vol. 47, Issue 10: 139-158 (20 pages). DOI: 10.7527/S1000-6893.2025.32630




Author information

  • 1. School of Artificial Intelligence, Xidian University, Xi'an 710126

Abstract

To address inaccurate land-cover classification caused by differences in spatial resolution, data heterogeneity, and limited labeled samples in multimodal remote sensing data, we investigate the joint classification of Hyperspectral Imagery (HSI) and LiDAR data and propose a Semantic-aware Cross-modal Fusion Network (SCF-Net). First, a lightweight patch encoder transforms the input data into RGB-compatible feature maps, which are then fed into a visual encoder based on Contrastive Language-Image Pre-training (CLIP) and enhanced with learnable prompts. To integrate multimodal information efficiently, an adaptive cross-modal fusion architecture is employed, featuring grouped linear projection and a relation-aware interaction module that enables dynamic spatial feature exchange at low computational cost. For semantic discrimination, attribute-category textual prompts are generated, and classification is performed by computing the cosine similarity between visual and textual embeddings, followed by a TopK attribute averaging strategy. Experiments on the Houston 2013, MUUFL, and Trento datasets demonstrate that SCF-Net outperforms eight state-of-the-art fusion methods, with improvements of over 2.88% in overall accuracy (OA), 2.69% in average accuracy (AA), and 3.02% in the Kappa coefficient, while maintaining high parameter efficiency. Ablation studies further validate the effectiveness of each component. This network offers a new paradigm for integrating multimodal remote sensing data with large-scale vision-language pre-trained models in complex classification tasks.
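The classification head and the three reported metrics can be sketched in plain Python. The head computes the cosine similarity between visual and textual embeddings and then applies TopK attribute averaging. This is a minimal illustration under stated assumptions, not the authors' implementation. We assume the following:

- Each class owns a list of precomputed attribute-prompt text embeddings.
- A class score is the mean of that class's K highest cosine similarities with the visual embedding.
- The CLIP encoders, learnable prompts and fusion module are omitted.

The function names `topk_class_scores`, `predict` and `oa_aa_kappa` are hypothetical. The metrics follow their standard definitions:

- OA is the fraction of correctly classified samples.
- AA is the mean per-class recall.
- Kappa is computed from the confusion matrix.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def topk_class_scores(visual: Vector,
                      attr_text: Dict[str, List[Vector]],
                      k: int) -> Dict[str, float]:
    """Assumed TopK rule: per class, average the k largest
    visual-to-attribute-prompt cosine similarities."""
    if k < 1:
        raise ValueError("k must be at least 1")
    scores: Dict[str, float] = {}
    for cls, attrs in attr_text.items():
        sims = sorted((cosine(visual, t) for t in attrs), reverse=True)
        top = sims[:k]  # fewer than k if the class has fewer attributes
        scores[cls] = sum(top) / len(top)
    return scores


def predict(visual: Vector, attr_text: Dict[str, List[Vector]], k: int) -> str:
    """Return the class whose TopK-averaged similarity is highest."""
    scores = topk_class_scores(visual, attr_text, k)
    return max(scores, key=scores.get)


def oa_aa_kappa(y_true: List[int], y_pred: List[int],
                n_classes: int) -> Tuple[float, float, float]:
    """Overall accuracy, average (per-class) accuracy and Cohen's kappa."""
    cm = [[0] * n_classes for _ in range(n_classes)]  # rows: true, cols: predicted
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    n = len(y_true)
    oa = sum(cm[i][i] for i in range(n_classes)) / n
    # Per-class recall, skipping classes absent from y_true
    recalls = [cm[i][i] / sum(cm[i]) for i in range(n_classes) if sum(cm[i]) > 0]
    aa = sum(recalls) / len(recalls)
    row = [sum(cm[i]) for i in range(n_classes)]
    col = [sum(cm[r][i] for r in range(n_classes)) for i in range(n_classes)]
    pe = sum(row[i] * col[i] for i in range(n_classes)) / (n * n)  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```

Averaging only the top K similarities, rather than all of them, lets a class be recognized from its most salient attributes. Irrelevant attribute prompts then do not dilute its score. This reading of the TopK strategy is an assumption on our part.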

Key words

multimodal remote sensing data; land-cover classification; semantic-aware cross-modal fusion network; adaptive cross-modal fusion; vision-language pre-trained model; hyperspectral-LiDAR fusion

Subject category

Aerospace

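Cite this article

TANG Xu, GU Feng, MA Jingjing, ZHANG Xiangrong. Hyperspectral-LiDAR joint classification method based on vision-language pre-trained models[J]. Acta Aeronautica et Astronautica Sinica, 2026, 47(10): 139-158.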

Funding

  • General Program of the National Natural Science Foundation of China (62571387)
  • Fundamental Research Funds for the Central Universities (YJSJ25014)

Acta Aeronautica et Astronautica Sinica

ISSN 1000-6893
