A context character feature based neural network model for Thai word segmentation

Tao Guangfeng, Xian Yantuan, Wang Hongbin, Wang Shujuan

Computer Engineering & Science, 2018, Vol. 40, Issue 5: 943-949 (7 pages). DOI: 10.3969/j.issn.1007-130X.2018.05.025

A context character feature based neural network model for Thai word segmentation

Tao Guangfeng¹, Xian Yantuan¹, Wang Hongbin¹, Wang Shujuan¹

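Author information

  • 1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China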

Abstract

Automatic word segmentation is a fundamental technology in natural language processing. To address the complex feature templates and large search space of traditional Thai word segmentation methods, this paper proposes a neural network model for Thai word segmentation based on context character features. The proposed model uses a word distribution table to train word representation vectors and applies a multi-layer neural network classifier to Thai word segmentation. Experimental results on the InterBEST 2009 Thai word evaluation corpus show that the proposed model outperforms the conditional random field model, the Character-Cluster Hybrid segmentation model, and the GLR and N-gram segmentation model. Its word segmentation accuracy, recall and F value reach 97.27%, 99.26% and 98.26%, respectively. Compared with the conditional random field model, it improves segmentation speed by 112.78%.
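The abstract does not give implementation details, so the following is only a minimal sketch of how context character features for a boundary classifier are commonly prepared, not the authors' code. The names `context_windows`, `labels_to_words`, `f_measure`, the `<PAD>` symbol and the window size are all hypothetical. The sketch also assumes that the reported "accuracy" plays the role of precision in the F value, which the abstract does not state.

```python
from typing import List, Tuple

PAD = "<PAD>"  # hypothetical padding symbol for positions outside the text


def context_windows(text: str, size: int = 2) -> List[Tuple[str, ...]]:
    """Return, for each character, the window of `size` characters on each side.

    Each window would be the input to a per-character classifier. Python
    iterates over code points, so Thai vowel and tone marks count as
    separate characters here.
    """
    padded = [PAD] * size + list(text) + [PAD] * size
    return [tuple(padded[i:i + 2 * size + 1]) for i in range(len(text))]


def labels_to_words(text: str, labels: List[int]) -> List[str]:
    """Rebuild words from per-character labels (1 = character starts a word)."""
    words: List[str] = []
    for ch, label in zip(text, labels):
        if label == 1 or not words:
            words.append(ch)
        else:
            words[-1] += ch
    return words


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the usual F1 definition)."""
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, `f_measure(97.27, 99.26)` lands within 0.01 of the reported 98.26%; the small gap is what one would expect if the paper computed F from unrounded values.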

Keywords

Thai word segmentation/neural network model/context character feature/character vector

Classification

Information Technology and Security Science

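Cite this article

Tao Guangfeng, Xian Yantuan, Wang Hongbin, Wang Shujuan. A context character feature based neural network model for Thai word segmentation[J]. Computer Engineering & Science, 2018, 40(5): 943-949.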

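Funding

National Natural Science Foundation of China (61363044, 61462054)

General Program of the Yunnan Provincial Department of Science and Technology (2015FB135)

Scientific Research Fund of the Yunnan Provincial Department of Education (2014Z021)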

Computer Engineering & Science

Open access; indexed in PKU Core, CSCD, CSTPCD

ISSN 1007-130X
