Application Research of Computers (计算机应用研究), Issue (5): 1353-1356, 4. DOI: 10.3969/j.issn.1001-3695.2015.05.018
基于分层选择策略的主动学习分词方法
Active learning in Chinese word segmentation based on stratified sampling strategy
Abstract
To address the scarcity of training samples and the high cost of obtaining large numbers of labeled samples, this paper proposes a new active learning method for word segmentation based on a stratified sampling strategy. The method uses stratified sampling to select the most useful instances from the unlabeled samples for annotation. The annotated instances are then added to the labeled set, which is used to train the segmenter. Finally, the method is tested on the PKU, MSR, and Shanxi University corpora and compared with the uncertainty sampling strategy. The experimental results show that, for a training corpus of the same size, the stratified selection strategy improves segmentation accuracy while effectively reducing the cost of manual annotation.
Key words
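Chinese word segmentation/active learning/uncertainty sampling/stratified sampling strategy
Classification
Information technology and security science
Cite this article
梁喜涛 (Liang Xitao), 顾磊 (Gu Lei). Active learning in Chinese word segmentation based on stratified sampling strategy [J]. Application Research of Computers, 2015, (5): 1353-1356, 4.
Funding
National Natural Science Foundation of China (61302157); Ministry of Education Humanities and Social Sciences Research Youth Fund (12YJC870008); Jiangsu Provincial Department of Education Fund for Philosophy and Social Sciences Research in Universities (2013SJB870004); Jiangsu Province Social Science Research Cultural Excellence Project ()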
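Illustrative sketch
The select, annotate, retrain loop summarized in the abstract can be sketched roughly as follows. This is a minimal sketch and not the authors' implementation: the abstract does not say how the strata are formed, so the code assumes the unlabeled pool is ranked by an uncertainty score and cut into contiguous strata, with a few instances drawn at random from each. The names `stratified_select`, `active_learning`, `train`, `score_with`, and `annotate` are hypothetical placeholders for the segmenter, its uncertainty estimate, and the human annotator.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple


def stratified_select(scores: Sequence[float], n_strata: int,
                      per_stratum: int, rng: random.Random) -> List[int]:
    """Return pool indices chosen by sampling from score-ranked strata.

    Assumption: strata are contiguous slices of the pool after sorting by
    uncertainty score (highest first). At most n_strata strata are formed.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if not order:
        return []
    size = -(-len(order) // n_strata)  # ceiling division
    chosen: List[int] = []
    for start in range(0, len(order), size):
        stratum = order[start:start + size]
        chosen.extend(rng.sample(stratum, min(per_stratum, len(stratum))))
    return chosen


def active_learning(unlabeled: Sequence[str],
                    train: Callable[[List[Tuple[str, Any]]], Any],
                    score_with: Callable[[Any, str], float],
                    annotate: Callable[[str], Any],
                    rounds: int, n_strata: int, per_stratum: int,
                    seed: int = 0) -> Tuple[Any, List[Tuple[str, Any]]]:
    """Pool-based loop: select instances, annotate them, retrain."""
    rng = random.Random(seed)
    pool = list(unlabeled)
    labeled: List[Tuple[str, Any]] = []
    model = train(labeled)
    for _ in range(rounds):
        if not pool:
            break
        scores = [score_with(model, x) for x in pool]
        picked = set(stratified_select(scores, n_strata, per_stratum, rng))
        # Move the picked instances from the pool into the labeled set.
        labeled.extend((pool[i], annotate(pool[i])) for i in sorted(picked))
        pool = [x for i, x in enumerate(pool) if i not in picked]
        model = train(labeled)
    return model, labeled
```

In a real setting `train` would fit a segmenter on the labeled sentences and `score_with` would return something like the segmenter's uncertainty for a sentence; the uncertainty sampling baseline mentioned in the abstract would correspond to simply taking the top-ranked instances instead of sampling across strata.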