Geospatial Information (地理空间信息), 2024, Vol. 22, Issue 8: 59-63, 88, 6. DOI: 10.3969/j.issn.1672-4623.2024.08.013
基于XLNet的多数据源中文地名匹配方法
Chinese Geographical Name Matching Method with Multiple Data Sources Based on XLNet
Abstract
Addresses are an important foundational data resource for social development and have become an essential component of urban geospatial data construction. Geographical name matching aims to determine whether a pair of strings refers to the same real-world location. Current geographical name matching methods rely either on a single string similarity metric or on a combination of several similarity metrics, and they fail to capture effectively the character substitutions that arise when geographical names change because of linguistic and cultural variation. We propose a geographical name matching method based on XLNet, which uses a deep neural network to classify a pair of geographical names as a match or a non-match. Building on long-term memory, the method uses attention masks with bidirectional information flow to reconstruct event sequences and builds representations from the bidirectional information in the sequence. The experimental results demonstrate that the method is effective for matching lengthy addresses. The model captures the semantic information conveyed by the context more comprehensively and outperforms earlier approaches based on single similarity metrics and on supervised machine learning.
Keywords
geographical name matching / geographical name entity / XLNet / Softmax regression model
Classification
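Astronomy and Earth Sciences
Cite this article
Zheng Shiyu (郑诗语), Qiu Qinjun (邱芹军), Xie Zhong (谢忠), Tao Liufeng (陶留锋), Li Weijie (李伟杰). Chinese Geographical Name Matching Method with Multiple Data Sources Based on XLNet [J]. Geospatial Information, 2024, 22(8): 59-63, 88, 6.
Funding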
National Key Research and Development Program of China (2022YFB3904200, 2022YFF0711601)
Natural Science Foundation of Hubei Province (2022CFB640)
Director's Fund of the Key Laboratory of Geological Survey and Evaluation of Ministry of Education (GLAB2023ZR01)
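Illustrative note on the abstract: the single-similarity baselines it contrasts against can be sketched with a normalized edit-distance score. This is only a minimal illustration, not the authors' baseline or evaluation setup. The function names, the 0.8 threshold, and the example name pairs are assumptions chosen for the sketch.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def is_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical decision rule: the threshold is an assumed value."""
    return similarity(a, b) >= threshold


# 北平 is a former name of 北京 (same place); 海南 and 河南 are different provinces.
same_place = similarity("北京", "北平")
different_places = similarity("海南", "河南")
print(same_place, different_places)
```

Both pairs differ by one substituted character, so the score is identical for a true match and a true non-match. A character-level metric alone cannot separate them, which is the limitation the abstract attributes to similarity-based methods and the motivation for a context-aware classifier.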