Estimation of Local Illumination Consistency Based on Improved Vision Transformer

Wang Yang (王杨) ¹, Song Shijia (宋世佳) ¹, Wang Heqin (王鹤琴) ¹, Yuan Zhenyu (袁振羽) ¹, Zhao Lijun (赵立军) ², Wu Qilin (吴其林) ¹

Computer Engineering (计算机工程), 2025, Vol. 51, Issue (2): 312-321, 10. DOI: 10.19678/j.issn.1000-3428.0068905

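Author information

1. School of Computer and Information, Anhui Normal University, Wuhu, Anhui 241000
2. Yangtze River Delta HIT Robot Technology Research Institute, Wuhu, Anhui 241000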

Abstract

Illumination consistency is a key factor in achieving the seamless fusion of virtual and real elements in Augmented Reality (AR) systems. Owing to the constraints of capture perspectives and the complexity of scene illumination, developers often overlook local illumination consistency when estimating panoramic lighting information, which degrades the final rendering quality. To address this issue, this study proposes ViTLight, a local illumination consistency estimation framework based on an improved Vision Transformer (ViT) structure. First, the framework uses a ViT encoder to extract feature vectors and regress Spherical Harmonic (SH) coefficients, from which the illumination information is recovered. Second, the ViT encoder structure is enhanced by introducing a multi-head self-attention interaction mechanism, in which a convolution operation guides the interplay between attention heads. Additionally, a local perception module is integrated to scan each image block and perform weighted summation on local pixels to capture specific features within regions. This approach balances global contextual features and local illumination information, ultimately improving the precision of illumination estimation. ViTLight is compared with mainstream feature extraction networks and four classical illumination estimation frameworks on public datasets. The experimental results and analysis indicate that ViTLight is superior to existing frameworks in terms of image rendering accuracy: its Root Mean Square Error (RMSE) and Structural Dissimilarity (DSSIM) index reach 0.1296 and 0.0426, respectively, which verifies its effectiveness and correctness.
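The abstract reports RMSE and DSSIM but does not define them. As a rough illustration only, the sketch below computes both on flat lists of pixel intensities, assuming the common definition DSSIM = (1 - SSIM) / 2 and a single global SSIM window rather than the sliding window used in most image-quality toolkits. The function names, the `data_range` parameter, and the constants `k1` and `k2` are assumptions made for this example and are not taken from the paper.

```python
import math


def rmse(pred, target):
    """Root mean square error between two equally sized flat pixel lists."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - t) * (p - t) for p, t in zip(pred, target))
    return math.sqrt(squared / len(pred))


def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM evaluated once over the whole image (assumption: no sliding window)."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Small stabilising constants, following the usual SSIM formulation.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


def dssim(x, y, data_range=1.0):
    """Structural dissimilarity, assuming DSSIM = (1 - SSIM) / 2."""
    return (1.0 - global_ssim(x, y, data_range)) / 2.0


if __name__ == "__main__":
    rendered = [0.10, 0.40, 0.80, 0.95]   # hypothetical rendered pixels
    reference = [0.12, 0.38, 0.83, 0.90]  # hypothetical ground-truth pixels
    print("RMSE :", rmse(rendered, reference))
    print("DSSIM:", dssim(rendered, reference))
```

Under these definitions both metrics are 0 for identical images and grow as the rendering departs from the reference, which is why lower values indicate better illumination estimates.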

Key words

Augmented Reality (AR) / illumination estimation / Spherical Harmonic (SH) coefficients / Vision Transformer (ViT) / multi-head self-attention

Classification

Computer and Automation

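Cite this article

Wang Yang, Song Shijia, Wang Heqin, Yuan Zhenyu, Zhao Lijun, Wu Qilin. Estimation of Local Illumination Consistency Based on Improved Vision Transformer[J]. Computer Engineering, 2025, 51(2): 312-321, 10.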

Funding

National Natural Science Foundation of China (61871412)

Key Project of the Natural Science Foundation of Anhui Province (KJ2019A0938, KJ2021A1314, KJ2019A0979)

Key Natural Science Project of Anhui Universities (2022AH052899, KJ2019A0979, KJ2019A0511, 2023AH052757)

Open Project of the Anhui Provincial Key Laboratory of Machine Vision Inspection (KLMVI-2023-HIT-11)

Academic Project for Top Talents in Disciplines (Majors) of Anhui Universities (gxbjZD2022147)

Computer Engineering (计算机工程)

Open access; Peking University Core Journal

ISSN: 1000-3428
