
Combination of Statistical and Rule-based Approaches for Uyghur Person Name Recognition

塔什甫拉提·尼扎木丁¹, 汪昆², 艾斯卡尔·艾木都拉¹, 帕力旦·吐尔逊³

Acta Automatica Sinica, 2017, Vol. 43, Issue 4: 653-664 (12 pages). DOI: 10.16383/j.aas.2017.c150769


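Author information

1. School of Information Science and Engineering, Xinjiang University, Urumqi 830046
2. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190
3. School of Software, Xinjiang University, Urumqi 830046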

Abstract

Named entity recognition (NER) is an important subtask of natural language processing, and person names are one of its major targets. Exploiting the agglutinative nature of Uyghur, we split each Uyghur word into units at different levels, such as syllables, suffixes, and stems, which significantly reduces data sparsity. Because Han Chinese person names account for most of the errors that remain after the conditional random field (CRF)-based approach, we further propose a rule-based post-processing step for recognizing Han Chinese names in Uyghur text. Experimental results show that this cascade approach achieves satisfactory performance, with a precision of 87.47%, a recall of 89.12%, and an F1 score of 88.29%.
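The reported F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick arithmetic check (not the authors' evaluation code), the Python sketch below recomputes it from the two other reported figures, reading the first figure, 87.47%, as precision.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given in percent."""
    if precision + recall == 0:
        # Avoid division by zero when both scores are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract, in percent.
p, r = 87.47, 89.12
print(f"F1 = {f1_score(p, r):.2f}%")  # prints "F1 = 88.29%"
```

Plugging in 87.47 and 89.12 gives about 88.287, which rounds to the reported 88.29%, so the three figures are mutually consistent.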

Keywords

Uyghur language processing; person name recognition; conditional random field (CRF); syllable bank

Cite this article

塔什甫拉提·尼扎木丁, 汪昆, 艾斯卡尔·艾木都拉, 帕力旦·吐尔逊. Combination of statistical and rule-based approaches for Uyghur person name recognition[J]. Acta Automatica Sinica, 2017, 43(4): 653-664.

Funding

Supported by the National Natural Science Foundation of China (61562081) and the Xinjiang High Technology Research and Development Program of China (201312103)

Acta Automatica Sinica

Open access · Peking University Core Journal · CSCD · CSTPCD

ISSN: 0254-4156
