Acta Automatica Sinica (自动化学报), 2017, Vol. 43, Issue 4: 653-664 (12 pages). DOI: 10.16383/j.aas.2017.c150769
统计与规则相结合的维吾尔语人名识别方法
Combination of Statistical and Rule-based Approaches for Uyghur Person Name Recognition
Abstract
Named entity recognition (NER) is an important subtask of natural language processing, and person names are among its major targets. Exploiting the agglutinative character of the Uyghur language, we split each Uyghur word into units at different levels, such as syllables, suffixes, and stems, which significantly reduces the data sparsity problem. Since Han Chinese person names account for most of the errors remaining after the conditional random field (CRF)-based approach, we also propose a rule-based post-processing step for recognizing Han Chinese person names in Uyghur text. Experimental results show that this cascaded approach achieves satisfactory performance, with precision, recall, and F1 score of 87.47%, 89.12%, and 88.29%, respectively.
Key words
Uyghur language processing/person name recognition/conditional random field (CRF)/syllable bank
Cite this article
塔什甫拉提·尼扎木丁, 汪昆, 艾斯卡尔·艾木都拉, 帕力旦·吐尔逊. Combination of Statistical and Rule-based Approaches for Uyghur Person Name Recognition [J]. Acta Automatica Sinica, 2017, 43(4): 653-664.
Funding
Supported by the National Natural Science Foundation of China (61562081) and the Xinjiang High Technology Research and Development Program of China (201312103)