
Design of a Tibetan Word Segmentation System (藏文自动分词系统的设计)

Cai Zhijie (才智杰), Cairang Zhuoma (才让卓玛)

Computer Engineering & Science (计算机工程与科学), 2011, Vol. 33, Issue 5: 151-154, 4.

DOI: 10.3969/j.issn.1007-130X.2011.05.030


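Author information

Both authors: Key Laboratory of Tibetan Information Processing (jointly established by the province and the Ministry of Education), Qinghai Normal University, Xining, Qinghai 810008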

Abstract

As fundamental linguistic knowledge bases, human-annotated corpora underpin many statistical natural language processing tasks. With the wide use of statistical methods in natural language processing, corpus construction has become an important research area. Word segmentation is a necessary prerequisite for syntactic parsing, and its performance largely determines parsing accuracy. Based on a statistical analysis of an 850,000-byte Tibetan corpus, we first investigate the distribution and syntactic functions of Tibetan words, then introduce a dictionary-based Tibetan word segmentation model, and finally present the dictionary structure and the case-auxiliary blocking and restoring algorithms that Tibetan word segmentation requires. The Tibetan word segmentation system also supports related work: Tibetan input methods, Tibetan electronic dictionary construction, Tibetan word frequency statistics, search engine design and implementation, machine translation system development, network information security, Tibetan corpus construction, and Tibetan semantic analysis.
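The abstract does not say which matching strategy the dictionary-based model uses. As a minimal sketch only, the following assumes forward maximum matching over syllables, a common baseline for dictionary-based segmentation. The names `split_syllables`, `segment`, and `max_word_len`, and the set-of-tuples dictionary layout, are hypothetical and are not taken from the paper. The case-auxiliary blocking and restoring steps are not modeled here.

```python
TSHEG = "\u0f0b"  # Tibetan syllable delimiter (tsheg)


def split_syllables(text):
    """Split raw Tibetan text into syllables at the tsheg mark."""
    return [s for s in text.split(TSHEG) if s]


def segment(syllables, dictionary, max_word_len=4):
    """Forward maximum matching over a syllable sequence.

    dictionary: a set of syllable tuples (hypothetical layout, not the
    dictionary structure described in the paper).
    Syllables that start no dictionary word are emitted on their own.
    """
    words = []
    i = 0
    while i < len(syllables):
        longest = min(max_word_len, len(syllables) - i)
        # Try the longest candidate first, shrinking one syllable at a time.
        for n in range(longest, 0, -1):
            candidate = tuple(syllables[i:i + n])
            if n == 1 or candidate in dictionary:
                words.append(candidate)
                i += n
                break
    return words


# Toy dictionary in Wylie transliteration, for illustration only.
toy_dictionary = {("bod", "yig"), ("slob", "grwa")}
result = segment(["bod", "yig", "slob", "grwa", "la"], toy_dictionary)
```

With the toy dictionary above, the two-syllable entries are matched as whole words and the trailing syllable is emitted by itself. A real system would load its dictionary from a lexicon and handle punctuation and case-auxiliary words separately.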

Key words

Chinese information processing / corpus / Tibetan word segmentation

Classification

Information technology and security science

Cite this article

Cai Zhijie, Cairang Zhuoma. Design of a Tibetan Word Segmentation System [J]. Computer Engineering & Science, 2011, 33(5): 151-154, 4.

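Funding

Ministry of Science and Technology 973 Program preliminary research project (2010CB334708)

National Social Science Foundation of China (09XYY024, 07BYY035)

State Language Commission project (MZ05·118)

Qinghai Normal University Research Innovation Program

Qinghai Normal University Research Fund for Young and Middle-aged Researchers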

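Computer Engineering & Science (计算机工程与科学)

Open access; indexed in the Peking University Core Journals list, CSCD, and CSTPCD

ISSN: 1007-130X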
