China Urban Forestry, 2026, Vol. 24, Issue (1): 9-19, 11. DOI: 10.12169/zgcsly.2026.01.01.0001
Decoding Environmental Characteristics of Old Urban Residential Compounds via Multi-modal Vision-Language Models: A Case Study of the City Core of Chongqing
Abstract
[Objective] Traditional methods for identifying the environmental characteristics of old urban residential compounds are inefficient and susceptible to subjective bias. To address these limitations, this study proposes a methodological framework that integrates multi-modal vision-language models (VLMs) with geotagged crowdsourced imagery to decode, interpret, quantify, and classify the environmental characteristics of old urban residential compounds. [Method] With the city core of Chongqing as the empirical study area, a dataset of crowdsourced images is constructed. VLMs are used to transcode the visual semantics of the images into textual descriptions, which are then clustered with BERTopic topic modeling. The resulting clusters are mapped onto geographic space to analyze the spatial distribution and co-occurrence patterns of the environmental characteristics. [Result] The 50 extracted clusters yield 17 thematic groups of environmental perception features, covering spatial skeletons, micro-scale details, greenery forms, and place ambiance. Furthermore, 7 typical spatial co-occurrence patterns of these characteristics are identified, which provide guidance for spatially targeted renovation. [Conclusion] The environments of old urban residential compounds exhibit complex heterogeneity, profoundly shaped by historical features and public life. The multi-modal analysis framework enables semantic understanding of environmental images, achieving low-cost, high-throughput, and fine-grained feature mining. Future research could further integrate sociological data to build a closed-loop renewal process that truly serves the systematic construction of "good houses, good neighborhoods, good communities, and good urban districts".
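The spatial step described in the abstract (mapping topic clusters onto geographic space and examining their co-occurrence) can be illustrated with a minimal, dependency-free sketch. It assumes each geotagged image has already been captioned by a VLM and assigned a topic label (for example by BERTopic), uses a hypothetical square grid measured in degrees as the spatial unit, and scores topic pairs by lift. The grid size, the topic labels, the helper names (`grid_cell`, `topics_per_cell`, `cooccurrence`, `lift`) and the lift measure are all illustrative assumptions; the abstract does not state how the authors define spatial units or how they identify the seven co-occurrence patterns.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import floor

# Hypothetical record layout: (longitude, latitude, topic_label) for one geotagged
# crowdsourced image whose VLM-generated caption has already been assigned a topic.
Record = tuple[float, float, str]
Cell = tuple[int, int]


def grid_cell(lon: float, lat: float, size_deg: float) -> Cell:
    """Snap a coordinate to a square grid cell whose side is size_deg degrees."""
    return (floor(lon / size_deg), floor(lat / size_deg))


def topics_per_cell(records: list[Record], size_deg: float) -> dict[Cell, set[str]]:
    """Collect the distinct topic labels observed in each grid cell."""
    cells: dict[Cell, set[str]] = defaultdict(set)
    for lon, lat, topic in records:
        cells[grid_cell(lon, lat, size_deg)].add(topic)
    return dict(cells)


def cooccurrence(cells: dict[Cell, set[str]]) -> Counter:
    """Count, for each unordered topic pair, how many cells contain both topics."""
    pairs: Counter = Counter()
    for topics in cells.values():
        for a, b in combinations(sorted(topics), 2):
            pairs[(a, b)] += 1
    return pairs


def lift(cells: dict[Cell, set[str]], pairs: Counter) -> dict[tuple[str, str], float]:
    """Lift = P(a and b) / (P(a) * P(b)) over cells; values above 1 suggest association."""
    n = len(cells)
    freq = Counter(topic for topics in cells.values() for topic in topics)
    return {
        (a, b): (count / n) / ((freq[a] / n) * (freq[b] / n))
        for (a, b), count in pairs.items()
    }


# Hypothetical toy data: coordinates near Chongqing, invented topic labels.
example_records: list[Record] = [
    (106.5705, 29.5605, "stepped alley"),
    (106.5712, 29.5611, "vertical greenery"),
    (106.5805, 29.5705, "stepped alley"),
    (106.5812, 29.5711, "vertical greenery"),
    (106.5905, 29.5805, "parking clutter"),
]
cells = topics_per_cell(example_records, size_deg=0.01)
scores = lift(cells, cooccurrence(cells))
```

With a 0.01° grid, the toy records fall into three cells, and the "stepped alley" / "vertical greenery" pair co-occurs in two of them, so it receives a lift above 1. A real study would more plausibly use compound boundaries or a projected metric grid; degree-based cells are used here only to keep the sketch self-contained.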
Key words
multimodal vision-language model/old urban residential compound/crowd-sourced imagery/spatial feature/city core of Chongqing
Cite this article
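LI Yanjin, LUO Dan, XIAO Jing, LI Wei. Decoding environmental characteristics of old urban residential compounds via multi-modal vision-language models[J]. China Urban Forestry, 2026, 24(1): 9-19, 11.
Funding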
Key Program of the National Natural Science Foundation of China (52238003)
Chongqing Social Science Youth Project (2021NDQN64)