Research on Tibetan Name Recognition Technology Based on Conditional Random Fields

Zhu Jie (珠杰), Li Tianrui (李天瑞), Liu Shengjiu (刘胜久)

南京大学学报(自然科学版) (Journal of Nanjing University, Natural Science Edition), 2016, Vol. 52, Issue 2: 289-299, 11. DOI: 10.13232/j.cnki.jnju.2016.02.010

Research on Tibetan name recognition technology under CRF

Zhu Jie ¹, Li Tianrui ², Liu Shengjiu ¹

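Author information

  • 1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu, 610031
  • 2. Department of Computer Science, Tibet University, Lhasa, 850000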

Abstract

Named entity recognition is an important research topic in text mining, and statistical methods achieve high recognition rates. This paper studies Tibetan name recognition using conditional random fields (CRF). It focuses on the internal structure of Tibetan names, contextual features, feature selection, and data preprocessing, and evaluates the effectiveness of different features through experiments. The contributions are as follows: a name recognition method based on word (syllable) information and word position is presented for the first time; trigger words, function words, a dictionary of names, and personal noun suffixes are studied as features, together with their combinations and optimization; and the role of different function words in name recognition is refined. Experiments on different feature combinations show that: 1) trigger-word and ergative-particle features have a positive effect on Tibetan name recognition; 2) the size of the feature window affects name recognition; 3) CRF-based recognition of Tibetan names reaches an F1 value of 80%. However, this falls short of the results reported for other languages, because words consisting of two Tibetan syllables are highly ambiguous.
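To make the feature-window idea concrete, the following is a minimal sketch of how per-syllable features for a CRF tagger might be assembled from a symmetric context window, with optional trigger-word and function-word flags. It is not the authors' implementation: the function names `syllable_features` and `f1_score`, the feature keys, the `<BOS>`/`<EOS>` padding tokens, and the placeholder tokens in the usage note are all assumptions made for illustration. The dictionary, suffix, and word-position features mentioned in the abstract are omitted here. The F1 helper uses the standard definition, the harmonic mean of precision and recall.

```python
from typing import AbstractSet, Dict, List


def syllable_features(
    syllables: List[str],
    i: int,
    window: int = 2,
    trigger_words: AbstractSet[str] = frozenset(),
    function_words: AbstractSet[str] = frozenset(),
) -> Dict[str, str]:
    """Build a feature dict for position i from a symmetric context window.

    All feature names are hypothetical; a real feature template may differ.
    """
    feats = {"bias": "1"}
    for offset in range(-window, window + 1):
        j = i + offset
        # Pad positions that fall outside the sentence.
        if j < 0:
            token = "<BOS>"
        elif j >= len(syllables):
            token = "<EOS>"
        else:
            token = syllables[j]
        feats[f"syl[{offset}]"] = token
        # Flag context tokens that appear in the (assumed) lexicons.
        if token in trigger_words:
            feats[f"trigger[{offset}]"] = "1"
        if token in function_words:
            feats[f"func[{offset}]"] = "1"
    return feats


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, calling `syllable_features(["TITLE", "a", "b", "ERG"], 1, window=1, trigger_words={"TITLE"})` yields features for the placeholder syllable `"a"` with the preceding trigger word flagged; widening `window` to 2 also brings the placeholder particle `"ERG"` into view, which is one way the window size can change what the model sees.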

Key words

Tibetan names / conditional random field (CRF) / feature selection

Classification

Information Technology and Security Science

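Cite this article

Zhu Jie, Li Tianrui, Liu Shengjiu. Research on Tibetan name recognition technology based on conditional random fields [J]. 南京大学学报(自然科学版) (Journal of Nanjing University, Natural Science Edition), 2016, 52(2): 289-299, 11.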

Funding

National Natural Science Foundation of China (61262058)

南京大学学报(自然科学版) (Journal of Nanjing University, Natural Science Edition)

OA, CSCD, CSTPCD

ISSN 0469-5097
