Journal of Tsinghua University (Science and Technology), 2025, Vol. 65, Issue 7: 1197-1208 (12 pages). DOI: 10.16511/j.cnki.qhdxxb.2025.26.033
基于多源异构数据的CIM分级分类语义网构建
Construction of the CIM hierarchical classification semantic network based on multi-source heterogeneous data
Abstract
[Objective] The city information model (CIM) is a new form of integrated city information that combines large amounts of data to guide urban construction through the digital representation of urban objects. However, because CIM application requirements are complex and its theoretical system is incomplete, CIM lacks both a semantic specification and a unified framework, which makes its development difficult to advance effectively. To form a universal semantic standard, this study proposes a CIM classification semantic web to guide CIM construction and govern CIM data.

[Methods] This study designs a CIM classification semantic tree using the line classification method based on various criteria, uses the robustly optimized bidirectional encoder representations from transformers pretraining approach (RoBERTa) model and a clustering algorithm to merge synonyms within the same cluster, and adds new semantic knowledge to optimize the semantic tree. To further expand the application of CIM semantics, the city information model ontology (CIMO) is proposed based on the completed semantic tree and the Stanford seven-step method; it has six main classes and nine main property attributes. CIMO enables computers to process semantic information effectively. Moreover, given the multisource nature of CIM data, this study aims to fully leverage the semantic information derived from building information modeling and geographic information systems. It designs a mapping between CIMO and multisource heterogeneous data, namely Industry Foundation Classes (IFC) and City Geography Markup Language (CityGML). CIMO serves as a foundation for semantic analysis and the construction of knowledge graphs. This study also proposes coding attributes as unique identifiers for management and analysis, which improves the efficiency and accuracy of the CIM classification semantic web. Models of one primary school building and the surrounding municipal facilities are selected as a case study to further evaluate the semantic-tree-based CIMO and knowledge graph data governance.

[Results] Based on the mapping rules, the case-study model formed a knowledge graph expressed as triples, and the resulting Web Ontology Language (OWL) output could be understood and processed by computers. Semantic analysis was completed using the Semantic Web Rule Language (SWRL), and logic testing was completed by an inference engine. The CIM classification semantic web that passed the test could identify, by level and classification, the relationships between instances and the instance categories contained in the queried class; could operate on and update instance data to complete logical and point-to-point queries and governance; and showed excellent data governance performance and clear semantic logic. The triple file of the knowledge graph was imported into a graph database for graph visualization and graph data storage, which makes the graph data easier to understand and process intuitively.

[Conclusions] The CIM classification semantic web proposed in this study can support the construction of multiscenario, multiprecision, and multilevel CIM systems; can standardize the semantic expression of CIM; has good hierarchical logic and data processing functions; serves as a semantic standard; and integrates city-level and component-level data. The semantic web can provide guidance and a framework for developing CIM data governance platforms and for modeling at the city level, and can promote the construction and development of a comprehensive CIM system that integrates geometry and semantics.

Keywords
city information model / hierarchical classification / multi-source heterogeneous data / ontology / knowledge graph
Classification
Information Technology and Security Science
Cite this article
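徐照, 官文鑫, 张赣, 方卓祯, 蔡伟浪. Construction of the CIM hierarchical classification semantic network based on multi-source heterogeneous data [J]. Journal of Tsinghua University (Science and Technology), 2025, 65(7): 1197-1208, 12.
Funding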
National Key Research and Development Program of China (2022YFC3803600)
National Natural Science Foundation of China, General Program (72071043)