Journal of Zhengzhou University (Natural Science Edition), 2011, Vol. 43, Issue (1): 70-74, 5.
Chinese Word Segmentation via Word-position Tagging Based on Maximum Entropy Model
Abstract
In recent years, word-position-based approaches have greatly improved the performance of Chinese word segmentation. These approaches treat Chinese word segmentation as a word-position tagging problem. With the help of powerful sequence tagging models, the word-position-based method quickly became a mainstream technique in this field. Feature template selection and tag set selection are crucial to this method. The technique was studied using different word-position tag sets and a maximum entropy model. Closed evaluations were performed on corpora from Bakeoff-2005, the second International Chinese Word Segmentation Bakeoff, and comparative experiments were performed on different tag sets and feature templates. Experimental results showed that the combination of the feature template set TMPT-6 and the six-tag word-position tag set performed much better than the other combinations.
Keywords
Chinese word segmentation / word-position tagging / maximum entropy model / word-position tag sets / feature template
Classification
Information Technology and Security Science
Cite this article
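Yu Jiangde (于江德), Wang Xijie (王希杰), Fan Xiaozhong (樊孝忠). Chinese Word Segmentation via Word-position Tagging Based on Maximum Entropy Model [J]. Journal of Zhengzhou University (Natural Science Edition), 2011, 43(1): 70-74, 5.
Funding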
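Doctoral Program Project of Institutions of Higher Education, No. 20050007023
Young Backbone Teachers Program of Higher Education Institutions of Henan Province, No. 2009GGJS-108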
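Illustrative sketch

The abstract describes segmentation as word-position tagging: each character receives a tag for its position inside a word, and word boundaries are read back from the tag sequence. The Python sketch below shows that conversion in both directions. The abstract does not list the tag inventories, so the tags here are assumptions based on common practice: a four-tag set (B, M, E, S) and a six-tag set (B, B2, B3, M, E, S). The function names are hypothetical, and the maximum entropy tagger itself is not modeled.

```python
from typing import List

# Assumed tag inventories (not specified in the abstract):
#   four-tag set: B (begin), M (middle), E (end), S (single-character word)
#   six-tag set:  B, B2, B3 (first, second, third character), M, E, S


def word_to_tags(word: str, six_tag: bool = False) -> List[str]:
    """Return one word-position tag per character of a single word."""
    n = len(word)
    if n == 0:
        return []
    if n == 1:
        return ["S"]
    tags = ["B"]
    for i in range(1, n - 1):
        if six_tag and i == 1:
            tags.append("B2")
        elif six_tag and i == 2:
            tags.append("B3")
        else:
            tags.append("M")
    tags.append("E")
    return tags


def sentence_to_tags(words: List[str], six_tag: bool = False) -> List[str]:
    """Flatten a segmented sentence into a character-level tag sequence."""
    tags: List[str] = []
    for word in words:
        tags.extend(word_to_tags(word, six_tag))
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Recover words from characters and tags: a word closes at E or S."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # keep a trailing fragment if the tag sequence ends mid-word
        words.append(current)
    return words


if __name__ == "__main__":
    segmented = ["中华人民共和国", "很", "大"]
    tags = sentence_to_tags(segmented, six_tag=True)
    print(tags)
    print(tags_to_words("".join(segmented), tags))
```

In a full system, a sequence tagger such as a maximum entropy model would predict the tag for each character from features of the surrounding characters, and `tags_to_words` would then turn the predicted tags into a segmentation.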