Decoding Environmental Characteristics of Old Urban Residential Compounds via Multi-modal Vision-Language Models

中国城市林业 (Journal of Chinese Urban Forestry), 2026, Vol. 24, Issue 1: 9-19, 11. DOI: 10.12169/zgcsly.2026.01.01.0001

Decoding Environmental Characteristics of Old Urban Residential Compounds via Multi-modal Vision-Language Models: A Case Study of the City Core of Chongqing

李彦锦¹, 罗丹¹, 肖竞¹, 李玮²

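Author information

1. School of Architecture and Urban Planning, Chongqing University, Chongqing 400045, China
2. Changchun Zhonghai Real Estate Co., Ltd., Changchun 130000, China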

Abstract

[Objective] Traditional methods for identifying the environmental characteristics of old urban residential compounds are inefficient and susceptible to subjective bias. To address these limitations, this study proposes a methodological framework that integrates multi-modal vision-language models (VLMs) with geotagged crowdsourced imagery to decode, interpret, quantify, and classify those characteristics. [Method] Taking the city core of Chongqing as the empirical study area, a dataset of crowdsourced images is constructed. VLMs transcode the visual semantics of the images into textual descriptions, which are then clustered using BERTopic topic modeling. The resulting clusters are mapped onto geographic space to analyze the spatial distribution and co-occurrence patterns of environmental characteristics. [Result] The 50 extracted clusters yield 17 thematic groups of environmental perception features, covering spatial skeletons, micro-scale details, greenery forms, and place ambiance. In addition, 7 typical spatial co-occurrence patterns of these characteristics are identified, providing suggestions for spatially targeted renovation. [Conclusion] The environments of old urban residential compounds exhibit complex heterogeneity and are profoundly influenced by historical features and public life. The multi-modal analysis framework enables semantic understanding of environmental images and achieves low-cost, high-throughput, fine-grained feature mining. Future research could integrate sociological data to build a complete renewal loop, serving the systematic construction of "good houses, good neighborhoods, good communities, and good urban districts".
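
The co-occurrence analysis mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each site (for example, one residential compound) has already been assigned a list of theme labels, and it simply counts how many sites contain each pair of themes. The function name `cooccurrence_counts`, the site identifiers, and the toy labels are all hypothetical placeholders.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(themes_by_site):
    """Count how many sites contain each unordered pair of themes."""
    counts = Counter()
    for themes in themes_by_site.values():
        # set() drops repeated labels within a site; sorted() gives each
        # pair a single canonical order so (a, b) and (b, a) are merged.
        for pair in combinations(sorted(set(themes)), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy input: site id -> theme labels assigned to its images.
sites = {
    "site_a": ["greenery forms", "place ambiance", "greenery forms"],
    "site_b": ["greenery forms", "place ambiance", "micro-scale details"],
    "site_c": ["spatial skeletons", "micro-scale details"],
}
pair_counts = cooccurrence_counts(sites)
```

Pairs that appear together in many sites would be candidates for a shared co-occurrence pattern; how the study itself defines and groups its 7 patterns is not stated in the abstract.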

Keywords

multi-modal vision-language model / old urban residential compound / crowdsourced imagery / spatial feature / city core of Chongqing

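Cite this article

李彦锦, 罗丹, 肖竞, 李玮. Decoding Environmental Characteristics of Old Urban Residential Compounds via Multi-modal Vision-Language Models [J]. 中国城市林业 (Journal of Chinese Urban Forestry), 2026, 24(1): 9-19, 11.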

Funding

Key Program of the National Natural Science Foundation of China (52238003)

Chongqing Social Science Youth Project (2021NDQN64)

中国城市林业 (Journal of Chinese Urban Forestry)

ISSN: 1672-4925
