Active Learning Word Segmentation Method Based on a Stratified Selection Strategy

Liang Xitao, Gu Lei

Application Research of Computers, Issue (5): 1353-1356, 4. DOI: 10.3969/j.issn.1001-3695.2015.05.018

Active learning in Chinese word segmentation based on stratified sampling strategy

Liang Xitao (1), Gu Lei (1)

Author information

  • 1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210003

Abstract

To address the shortage of training samples and the laborious effort needed to obtain large numbers of labeled samples, this paper proposes a new active learning word segmentation method based on a stratified sampling strategy. The method uses the stratified sampling strategy to select the most useful instances from the unlabeled samples for annotation. It then adds the annotated instances to the labeled set and trains the segmenter on that set. Finally, the method is tested on the PKU, MSR, and Shanxi University corpora and compared with the uncertainty sampling strategy. The experimental results show that, for a training corpus of the same size, the stratified selection strategy improves segmentation accuracy while effectively reducing the cost of manual annotation.
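The select, annotate, and retrain loop described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the strata are formed, so the sketch assumes the unlabeled samples are ranked by an uncertainty score and cut into equal-size strata, with one sample drawn from each stratum in turn. The names `stratified_select` and `active_learning_loop`, and the `train`, `score`, and `annotate` callables, are hypothetical placeholders.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple


def stratified_select(scores: Sequence[float], n_select: int,
                      n_strata: int = 4, seed: int = 0) -> List[int]:
    """Return indices of up to n_select samples, drawn evenly across strata.

    Assumption: strata are contiguous, near-equal slices of the samples
    ranked by score (highest first). The paper may define them differently.
    """
    rng = random.Random(seed)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_strata = max(1, min(n_strata, len(ranked)))
    size, extra = divmod(len(ranked), n_strata)
    strata, start = [], 0
    for s in range(n_strata):
        end = start + size + (1 if s < extra else 0)
        strata.append(ranked[start:end])
        start = end
    for stratum in strata:
        rng.shuffle(stratum)  # random draw order inside each stratum
    target = min(n_select, len(ranked))
    chosen: List[int] = []
    while len(chosen) < target:
        # Round-robin: take one sample from each non-empty stratum in turn.
        for stratum in strata:
            if stratum and len(chosen) < target:
                chosen.append(stratum.pop())
    return chosen


def active_learning_loop(labeled: List[Tuple[Any, Any]], unlabeled: List[Any],
                         train: Callable, score: Callable, annotate: Callable,
                         rounds: int, batch_size: int):
    """Repeat: score unlabeled samples, select, annotate, retrain."""
    model = train(labeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        scores = [score(model, x) for x in unlabeled]
        picked = set(stratified_select(scores, batch_size))
        # Move the newly annotated samples into the labeled set.
        labeled = labeled + [(unlabeled[i], annotate(unlabeled[i]))
                             for i in sorted(picked)]
        unlabeled = [x for i, x in enumerate(unlabeled) if i not in picked]
        model = train(labeled)
    return model, labeled, unlabeled
```

In a real setting, `train` would fit a segmenter on the labeled sentences, `score` would return the segmenter's uncertainty for a sentence, and `annotate` would stand for the human annotator. Plain uncertainty sampling, the baseline mentioned in the abstract, corresponds to taking only the top-ranked samples instead of drawing across strata.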

Key words

Chinese word segmentation / active learning / uncertainty sampling / stratified sampling strategy

Classification

Information technology and security science

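Cite this article

Liang Xitao, Gu Lei. Active learning in Chinese word segmentation based on stratified sampling strategy [J]. Application Research of Computers, 2015, (5): 1353-1356, 4.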

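Funding

National Natural Science Foundation of China (61302157); Humanities and Social Sciences Youth Fund of the Ministry of Education of China (12YJC870008); Philosophy and Social Sciences Fund for Universities of the Jiangsu Provincial Department of Education (2013SJB870004); Jiangsu Province Social Science Research Cultural Excellence Project ()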

Application Research of Computers

OA / PKU Core / CSCD / CSTPCD

ISSN 1001-3695
