Experimental study of N-gram based Uyghur part-of-speech tagging

Computer Engineering and Applications, 2012, Vol. 48, Issue 25: 137-140, 173, 5. DOI: 10.3778/j.issn.1002-8331.2012.25.029

尼加提·纳吉米¹, 买合木提·买买提², 吐尔根·依布拉音³

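Author information

1. North China Electric Power University, Beijing 102206
2. Xinjiang Electric Power Information and Communication Co., Ltd., Urumqi 830026
3. Xinjiang Information Industry Co., Ltd., Urumqi 830026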

Abstract

There are many approaches to part-of-speech tagging, but current Uyghur part-of-speech tagging relies mainly on rule-based methods and does not achieve state-of-the-art accuracy. Using a large manually annotated Uyghur corpus and a series of controlled experiments, this paper evaluates the effectiveness of an N-gram based part-of-speech tagging scheme for Uyghur texts. The N-gram language model parameters and data smoothing are analyzed, and the performance of the Bigram and Trigram models is compared. The impact of the tag set and of the training data size on tagging accuracy is also studied. The experiments show that N-gram based part-of-speech tagging achieves good results on Uyghur texts.
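The abstract does not give the model details, so the following is only a minimal sketch of what a Bigram tagger with data smoothing can look like, not the authors' implementation. It assumes a hidden Markov formulation with add-one (Laplace) smoothing and Viterbi decoding; the function names `train_bigram_tagger` and `viterbi_tag` are hypothetical, and the training sentences are toy English examples rather than data from the paper's Uyghur corpus.

```python
import math
from collections import Counter


def train_bigram_tagger(tagged_sentences):
    """Collect tag-bigram and word-emission counts from [(word, tag), ...] sentences."""
    transitions = Counter()  # (previous_tag, tag) -> count
    emissions = Counter()    # (tag, word) -> count
    tag_counts = Counter()   # tag -> count, including the start marker "<s>"
    vocabulary = set()
    for sentence in tagged_sentences:
        previous = "<s>"
        tag_counts[previous] += 1
        for word, tag in sentence:
            transitions[(previous, tag)] += 1
            emissions[(tag, word)] += 1
            tag_counts[tag] += 1
            vocabulary.add(word)
            previous = tag
    tags = sorted(t for t in tag_counts if t != "<s>")
    return transitions, emissions, tag_counts, tags, vocabulary


def viterbi_tag(words, model):
    """Return the most probable tag sequence under the smoothed bigram model."""
    transitions, emissions, tag_counts, tags, vocabulary = model
    if not words:
        return []
    n_tags = len(tags)
    n_words = len(vocabulary) + 1  # +1 reserves probability mass for unseen words

    def log_transition(previous, tag):
        # Add-one smoothing (an assumption; the paper's smoothing method is not stated here).
        return math.log((transitions[(previous, tag)] + 1) / (tag_counts[previous] + n_tags))

    def log_emission(tag, word):
        return math.log((emissions[(tag, word)] + 1) / (tag_counts[tag] + n_words))

    # Each column maps tag -> (best log probability of a path ending in tag, backpointer).
    best = [{t: (log_transition("<s>", t) + log_emission(t, words[0]), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for tag in tags:
            score, previous = max((best[-1][p][0] + log_transition(p, tag), p) for p in tags)
            column[tag] = (score + log_emission(tag, word), previous)
        best.append(column)

    # Follow backpointers from the best final tag to recover the path.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


# Toy training data (illustrative only).
toy_corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
model = train_bigram_tagger(toy_corpus)
print(viterbi_tag(["a", "cat", "barks"], model))
```

A Trigram variant would condition each tag on the two preceding tags instead of one, which enlarges the transition table and makes smoothing matter more when training data is limited.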

Key words

part-of-speech tagging / N-gram model / Uyghur part-of-speech tagging

Classification

Information technology and security science

Cite this article

尼加提·纳吉米, 买合木提·买买提, 吐尔根·依布拉音. Experimental study of N-gram based Uyghur part-of-speech tagging [J]. Computer Engineering and Applications, 2012, 48(25): 137-140, 173, 5.

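Funding

National Electronic Information Industry Development Fund (Document Nos.: 财建[2009]537, 工信部财[2009]453)

National Natural Science Foundation of China (No. 60963018, No. 61063026)

Ministry of Education of China project (No. MZ115-75)

Xinjiang Uyghur Autonomous Region High-Tech Project (No. 200712109)

Xinjiang Uyghur Autonomous Region Higher Education Project (No. XJEDU2008I08)

Open Project of the Xinjiang Key Laboratory of Multilingual Information Technology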

Computer Engineering and Applications

OA / CSCD / CSTPCD

ISSN: 1002-8331
