An Improved Chinese Word Segmentation Method Based on Chain Conditional Random Fields

Xu Haoyu, Ren Zhihui, Shi Jun, Zhou Han

Computer Applications and Software, 2016, Vol. 33, Issue 12: 211-213, 233, 4. DOI: 10.3969/j.issn.1000-386x.2016.12.050

AN IMPROVED CHINESE WORD SEGMENTATION METHOD BASED ON CHAIN CONDITIONAL RANDOM FIELDS

Xu Haoyu¹, Ren Zhihui², Shi Jun³, Zhou Han¹

Author information

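  • 1. Joint Laboratory of Aviation Communication Technology, Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210
  • 2. University of Chinese Academy of Sciences, Beijing 100049
  • 3. School of Communication and Information Engineering, Shanghai University, Shanghai 200444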

Abstract

With the development of the Bakeoff evaluations for Chinese word segmentation, word-position tagging approaches based on chain conditional random fields (CRFs) have been widely used. When training a CRF model, the choice of tag set and feature template is essential. However, studies in the literature have generally used a single tag set or feature template and have not combined the frequently used tag sets and feature templates, which keeps out-of-vocabulary (OOV) word recall low and hurts segmentation performance on Internet corpora. The proposed method first combines the six-tag set with the feature templates TMPT-10 and TMPT-10', then runs comparative experiments against frequently used tag sets and feature templates on the Bakeoff corpora. The results show that the improved method, 6tag-tmpt10, achieves higher OOV word recall than the other methods, which can improve Chinese word segmentation in the Internet domain, while attaining a comparable F1-score.
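To make the word-position tagging idea concrete, here is a minimal sketch of how a segmented sentence might be converted into per-character tags and back. The abstract does not spell out the six-tag set, so the labels used here (B, B2, B3, M, E, S, as commonly found in the CRF segmentation literature) are an assumption, and the function names `words_to_tags` and `tags_to_words` are hypothetical. The sketch covers only the labelling step, not CRF training or the TMPT-10 feature templates.

```python
# Assumed six-tag scheme (not defined in the abstract):
#   S  = single-character word
#   B  = first character of a multi-character word
#   B2 = second character, B3 = third character (when not the last one)
#   M  = any further non-final character
#   E  = last character of a multi-character word

def words_to_tags(words):
    """Turn a list of segmented words into one tag per character."""
    head = ["B", "B2", "B3"]
    tags = []
    for word in words:
        n = len(word)
        if n == 1:
            tags.append("S")
            continue
        # Non-final characters: B, B2, B3, then M for the rest.
        for i in range(n - 1):
            tags.append(head[i] if i < 3 else "M")
        tags.append("E")
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and their predicted tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        # S and E both close the current word.
        if tag in ("S", "E"):
            words.append(current)
            current = ""
    if current:  # keep a trailing fragment if the tag sequence is malformed
        words.append(current)
    return words


if __name__ == "__main__":
    segmented = ["我", "喜欢", "自然语言处理"]
    tags = words_to_tags(segmented)
    print(list(zip("".join(segmented), tags)))
    print(tags_to_words("".join(segmented), tags))
```

In a full pipeline, a chain CRF would be trained to predict these tags from character-level features, and `tags_to_words` would be applied to its output to recover the segmentation.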

Key words

Chinese word segmentation/Word-position tagging/Conditional random field/Feature template

Classification

Information Technology and Security Science

Cite this article

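Xu Haoyu, Ren Zhihui, Shi Jun, Zhou Han. An Improved Chinese Word Segmentation Method Based on Chain Conditional Random Fields[J]. Computer Applications and Software, 2016, 33(12): 211-213, 233, 4.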

Funding

National Natural Science Foundation of China (61471231).

Computer Applications and Software

OA, CSTPCD

ISSN 1000-386X
