Computer Applications and Software (计算机应用与软件), 2016, Vol. 33, Issue 12: 211-213, 233, 4. DOI: 10.3969/j.issn.1000-386x.2016.12.050
AN IMPROVED CHINESE WORD SEGMENTATION METHOD BASED ON CHAIN CONDITIONAL RANDOM FIELDS
Abstract
With the development of the Bakeoff Chinese word segmentation evaluations, word-position tagging approaches to Chinese word segmentation based on chain conditional random fields (CRFs) have been widely used. Selecting the tag set and the feature template is essential for training a CRF model. However, studies in the literature generally use a single tag set or feature template and rarely combine the frequently used tag sets and feature templates, which keeps the out-of-vocabulary (OOV) word recall rate low and degrades segmentation performance on Internet corpora. The proposed method first combines the six-tag set with the feature templates TMPT-10 and TMPT-10`, and then runs comparative experiments against frequently used tag sets and feature templates on the Bakeoff corpora. The results show that the improved method, 6tag-tmpt10, achieves a higher OOV word recall rate than the other methods, which can improve Chinese word segmentation performance in the Internet domain, while obtaining a comparable F1-score.
Keywords
Chinese word segmentation/Word-position tagging/Conditional random field/Feature template
Classification
Information Technology and Security Science
Cite this article
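Xu Haoyu (徐浩煜), Ren Zhihui (任智慧), Shi Jun (施俊), Zhou Han (周晗). An improved Chinese word segmentation method based on chain conditional random fields [J]. Computer Applications and Software, 2016, 33(12): 211-213, 233, 4.
Funding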
National Natural Science Foundation of China (61471231).
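
Illustrative note: the word-position tagging mentioned in the abstract turns segmentation into per-character labeling, so a short sketch may help. The record above does not list the labels of the six-tag set, so the code assumes the scheme commonly used in the literature (B, B2, B3, M, E, S). The function names are hypothetical, and the TMPT-10 feature templates are not reproduced because their contents are not given here.

```python
# Assumed six-tag scheme (not specified in this record):
#   S  = single-character word
#   B  = first character, B2 = second, B3 = third (when not the last character)
#   M  = any further middle character, E = last character
HEAD_TAGS = ["B", "B2", "B3"]


def words_to_six_tags(words):
    """Convert a list of segmented words into one tag per character."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
            continue
        # Every character except the last gets B, B2, B3, then M.
        for i in range(len(word) - 1):
            tags.append(HEAD_TAGS[i] if i < len(HEAD_TAGS) else "M")
        tags.append("E")
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and their tags by cutting after E or S."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # keep a trailing fragment if the tag sequence ends mid-word
        words.append(current)
    return words


if __name__ == "__main__":
    segmented = ["中文", "分词", "很", "有意思"]
    tags = words_to_six_tags(segmented)
    print(tags)
    print(tags_to_words("".join(segmented), tags))
```

In a CRF-based segmenter, a trained model would predict such a tag sequence for raw text, and a decoding step like `tags_to_words` would recover the word boundaries.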