Computer Engineering & Science (计算机工程与科学), 2012, Vol. 34, Issue 4: 162-166 (5 pages). DOI: 10.3969/j.issn.1007-130X.2012.04.031
基于非平衡数据分类的单文档自动文摘方法
Imbalanced Classification Approaches to Automatic Single-Document Summarization
Abstract
Machine-learning-based approaches to automatic document summarization have drawn increasing attention in the natural language processing literature. However, none of them takes into account the imbalanced class distribution inherent in the task, i.e., the number of sentences in a summary is far smaller than the number of sentences in the whole document. Such a highly imbalanced data distribution degrades the effectiveness of conventional machine learning algorithms. This paper treats automatic document summarization as an imbalanced classification problem and proposes two learning strategies for handling the highly imbalanced data in automatic single-document summarization. Experimental results on the DUC 2001 data set show significant performance improvements for the proposed approaches in terms of F1 and ROUGE-2.
Keywords
imbalanced data classification / automatic summarization / support vector machine / classification margin / classifier ensemble
Key words
imbalanced classification / automatic document summarization / SVM / margin / bagging
Classification
Information technology and security science
Cite this article
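Ni Weijian, Liu Tong, Zeng Qingtian, Zhao Hua, Tang Jianyu (倪维健, 刘彤, 曾庆田, 赵华, 汤建渝). Imbalanced Classification Approaches to Automatic Single-Document Summarization [J]. Computer Engineering & Science, 2012, 34(4): 162-166.
Funding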
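National Natural Science Foundation of China (61170079)
Key Statistical Research Project of Shandong Province (KT11017)
Chunlei Program of Shandong University of Science and Technology (2010AZZ179)
Shandong Province Outstanding Young and Middle-aged Scientists Award Fund (BS2009DX004)
Qingdao Public Domain Science and Technology Support Program (10-3-3-32-nsh)
China Postdoctoral Science Foundation (2011M501155)
Distinguished Young Scholars Fund of Shandong University of Science and Technology (2010KYJQ101)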