Geospatial Information (地理空间信息), 2025, Vol. 23, Issue 2: 1-6, 6. DOI: 10.3969/j.issn.1672-4623.2025.02.001
利用扩展词嵌入BERT的地表水系地理命名实体抽取模型
Geographical Named Entity Extraction Model of Surface Water System Based on Expanded Word Embedding BERT
Abstract
Recognizing geographical named entities is one of the key tasks in constructing a geographical knowledge graph. Chinese text has flexible vocabulary structures and unclear word boundaries, which makes geographical named entity recognition in Chinese text a challenging research area, especially given the scarcity of annotated datasets in the geographical domain. To address geographical named entity recognition in massive network texts containing geographical information, we established a surface water system dataset based on Wikipedia data and a domain dictionary, and proposed a vocabulary enhancement method based on expanded word embedding to enhance the vocabulary of the BERT pre-trained model. We constructed the EXPBERT-BiGRU-CRF named entity recognition model by combining BiGRU and CRF networks for context feature recognition and learning. Experimental results show that this model achieves an F1 score of 95.94% on the surface water system dataset, a 4.94% improvement over the BERT model without vocabulary enhancement, along with significant accuracy improvements over other models, and can accurately identify geographical named entities.
Keywords
geographical knowledge graph / BERT / named entity recognition / vocabulary enhancement
Classification
Astronomy and Earth Sciences
Cite this article
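Zheng Xuye, Chen Tao, Zhou Jingjuan. Geographical Named Entity Extraction Model of Surface Water System Based on Expanded Word Embedding BERT [J]. Geospatial Information (地理空间信息), 2025, 23(2): 1-6, 6.
Funding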
Supported by the Natural Science Foundation of Hubei Province (2022CFB194).
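
Illustrative note on the vocabulary enhancement mentioned in the abstract: the abstract does not say how the expanded word embeddings are built, so the following is only a minimal sketch of one common way to add domain-dictionary words to a character-level embedding table. The function name `expand_vocab`, the toy vocabulary, and the choice to initialize each new word vector as the mean of its character vectors are all assumptions made for illustration, not the authors' method.

```python
# Hypothetical sketch: append domain-dictionary words to a character-level
# vocabulary and give each new word an initial embedding.
# ASSUMPTION: a new word vector is the mean of its character vectors.
# The paper's actual initialization is not described in the abstract.

def expand_vocab(vocab, embeddings, domain_words):
    """Return a new (vocab, embeddings) pair with domain words appended."""
    new_vocab = dict(vocab)                          # copy, keep input intact
    new_embeddings = [list(row) for row in embeddings]
    unk_id = vocab["[UNK]"]
    dim = len(embeddings[0])
    for word in domain_words:
        if not word or word in new_vocab:
            continue                                 # skip empty or known words
        # Map each character to its id, falling back to [UNK].
        char_ids = [vocab.get(ch, unk_id) for ch in word]
        # Average the character vectors dimension by dimension.
        mean_vec = [
            sum(embeddings[i][d] for i in char_ids) / len(char_ids)
            for d in range(dim)
        ]
        new_vocab[word] = len(new_embeddings)        # next free row index
        new_embeddings.append(mean_vec)
    return new_vocab, new_embeddings


# Toy data: three characters plus [UNK], two-dimensional vectors.
vocab = {"[UNK]": 0, "长": 1, "江": 2, "湖": 3}
embeddings = [[0.0, 0.0], [1.0, 3.0], [3.0, 1.0], [2.0, 2.0]]

new_vocab, new_embeddings = expand_vocab(vocab, embeddings, ["长江"])
```

In this toy run the river name "长江" receives a new row in the table, so a downstream tagger could treat it as a single token instead of two separate characters. In a real BERT pipeline the new rows would be fine-tuned together with the rest of the model.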