Computer Engineering & Science (计算机工程与科学), 2018, Vol. 40, Issue 5: 943-949 (7 pages). DOI: 10.3969/j.issn.1007-130X.2018.05.025
A context character feature based neural network model for Thai word segmentation
Abstract
Automatic word segmentation is a fundamental technology in natural language processing. To address the complex feature templates and large search space of traditional Thai word segmentation methods, this paper proposes a neural network model for Thai word segmentation based on context character features. The model trains word representation vectors from a word distribution table and uses a multi-layer neural network classifier to segment Thai text. Experiments on the InterBEST 2009 Thai word segmentation evaluation corpus show that the proposed model outperforms the conditional random field (CRF) model, the Character-Cluster Hybrid segmentation model, and the GLR and N-gram segmentation model: its precision, recall, and F-measure reach 97.27%, 99.26%, and 98.26%, respectively. Compared with the CRF model, it also increases segmentation speed by 112.78%.
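The abstract does not spell out the architecture, so the following is only a minimal sketch of the general idea of classifying each character from a window of its neighbouring character vectors, not the authors' model. Everything in it is an assumption for illustration: the hypothetical names `context_windows`, `WindowSegmenter`, `tags_to_words` and `f_measure`, the two-tag scheme ("B" begins a word, "I" continues it), the window size, the single tanh hidden layer, and the random, untrained weights. It also includes the standard balanced F-measure, F = 2PR / (P + R), assuming that is the metric reported.

```python
import math
import random

PAD = "<PAD>"  # padding symbol for positions outside the string


def context_windows(chars, k):
    """For each character, return the 2k+1 characters centred on it, padded with PAD."""
    padded = [PAD] * k + list(chars) + [PAD] * k
    return [padded[i:i + 2 * k + 1] for i in range(len(chars))]


class WindowSegmenter:
    """Hypothetical window classifier: character vectors -> tanh hidden layer -> softmax.

    Assumed tags: "B" = character begins a word, "I" = character continues it.
    Weights are random here; a real model would train them (and pre-train the vectors).
    """

    TAGS = ("B", "I")

    def __init__(self, vocab, dim=8, window=2, hidden=16, seed=0):
        rng = random.Random(seed)
        self.k = window
        self.index = {c: i for i, c in enumerate([PAD] + sorted(vocab))}
        self.emb = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in self.index]
        n_in = dim * (2 * window + 1)  # size of the concatenated window of character vectors
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in self.TAGS]
        self.b2 = [0.0] * len(self.TAGS)

    def features(self, window_chars):
        # Concatenate the vectors of the window; unknown characters use the PAD vector (row 0).
        vec = []
        for c in window_chars:
            vec.extend(self.emb[self.index.get(c, 0)])
        return vec

    def probs(self, window_chars):
        x = self.features(window_chars)
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]
        m = max(z)  # subtract the max for a numerically stable softmax
        e = [math.exp(v - m) for v in z]
        s = sum(e)
        return [v / s for v in e]

    def tag(self, text):
        # Greedy decoding: pick the most probable tag independently at each position.
        out = []
        for win in context_windows(text, self.k):
            p = self.probs(win)
            out.append(self.TAGS[p.index(max(p))])
        return out


def tags_to_words(text, tags):
    """Join characters into words: "B" (or a leading "I") opens a new word."""
    words = []
    for ch, t in zip(text, tags):
        if t == "B" or not words:
            words.append(ch)
        else:
            words[-1] += ch
    return words


def f_measure(p, r):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)
```

Because the weights are random, `WindowSegmenter(...).tag(text)` returns arbitrary tags until the model is trained, for example with a cross-entropy loss on labelled B/I tags. `tags_to_words("ฉันรักคุณ", ["B", "I", "I", "B", "I", "I", "B", "I", "I"])` returns `['ฉัน', 'รัก', 'คุณ']`. The window design is one way to avoid hand-written feature templates: the neighbouring characters enter the classifier directly as concatenated vectors. Plugging the rounded figures into `f_measure(0.9727, 0.9926)` gives about 0.98255, which agrees with the reported 98.26% up to rounding of P and R.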
Keywords
Thai word segmentation / neural network model / context character feature / character vector
Classification
Information Technology and Security Science
Cite this article
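TAO Guangfeng, XIAN Yantuan, WANG Hongbin, WANG Shujuan. A context character feature based neural network model for Thai word segmentation[J]. Computer Engineering & Science, 2018, 40(5): 943-949.
Funding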
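National Natural Science Foundation of China (61363044, 61462054)
General Program of the Yunnan Provincial Department of Science and Technology (2015FB135)
Scientific Research Fund of the Yunnan Provincial Department of Education (2014Z021)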