计算机工程与应用 (Computer Engineering and Applications), 2023, Vol. 59, Issue 24: 46-69 (24 pages). DOI: 10.3778/j.issn.1002-8331.2302-0361
深度学习中文命名实体识别研究进展
Research Progress on Named Entity Recognition in Chinese Deep Learning
Abstract
Chinese named entity recognition (CNER) is the task of identifying and categorizing entities with specific meanings in Chinese text. It is a crucial component of many downstream tasks in natural language processing. In recent years, deep learning has increasingly relied on end-to-end methods to automatically learn more complex and abstract data features, thereby reducing the need for manual annotation and addressing the issue of data sparsity in high-dimensional feature spaces. As a result, deep learning has emerged as the dominant approach to CNER. This article first gives an overview of the historical development of named entity recognition and outlines the specific challenges and difficulties of CNER. It then examines the distinct processing characteristics of CNER and groups deep learning-based CNER methods into three key areas: the flat entity boundary problem, Chinese nested named entity recognition, and the small-sample problem in CNER. The paper describes in detail the models, subdivisions, and recent research progress in each of these areas, and presents experimental results of several noteworthy deep learning methods on relevant datasets. Finally, the article identifies the open challenges and future research directions for CNER, and concludes with a summary of commonly used datasets and evaluation methods for CNER.
Key words
Chinese named entity recognition / deep learning / entity boundary / Chinese nested named entity recognition / low-resource Chinese named entity recognition
Classification
Information technology and security science
Cite this article
李莉, 奚雪峰, 盛胜利, 崔志明, 徐家保. 深度学习中文命名实体识别研究进展[J]. 计算机工程与应用, 2023, 59(24): 46-69.
Funding
National Natural Science Foundation of China (61876217, 62176175)
Jiangsu Province "Six Talent Peaks" High-Level Talent Project (XYDXX-086)
Suzhou Science and Technology Plan Project (SGC2021078)