Imbalanced Classification Approaches to Automatic Single-Document Summarization

基于非平衡数据分类的单文档自动文摘方法

NI Weijian (倪维健) 1, LIU Tong (刘彤) 1, ZENG Qingtian (曾庆田) 1, ZHAO Hua (赵华) 1, TANG Jianyu (汤建渝) 1

Computer Engineering & Science (计算机工程与科学), 2012, Vol. 34, Issue 4: 162-166 (5 pages). DOI: 10.3969/j.issn.1007-130X.2012.04.031

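Author information

  • 1. College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266510, China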

Abstract

Machine-learning-based approaches to automatic document summarization have drawn increasing attention in the natural language processing literature. However, none of them takes into account the imbalanced class distribution inherent in automatic document summarization, i.e., the number of sentences in the summary is far smaller than the number of sentences in the whole document. Such a highly imbalanced data distribution degrades the effectiveness of conventional machine learning algorithms. This paper addresses automatic document summarization from the perspective of imbalanced classification and proposes two learning strategies for dealing effectively with the highly imbalanced data in automatic single-document summarization. Experimental results on the DUC 2001 data set show significant performance improvements for our approaches in terms of F1 and ROUGE-2.
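To make the imbalance problem concrete, the following minimal sketch (not taken from the paper) compares accuracy and F1 for a trivial classifier that labels every sentence as "not in summary". The 20-sentence document, its labels, and the helper names `accuracy` and `f1_score` are hypothetical and chosen only for illustration.

```python
def accuracy(gold, pred):
    """Fraction of sentences whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1_score(gold, pred, positive=1):
    """F1 of the positive (summary-sentence) class; 0.0 if nothing is hit."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical document: 20 sentences, only 2 belong to the summary (label 1).
gold = [1, 1] + [0] * 18

# A degenerate classifier that always predicts the majority class.
all_negative = [0] * 20

print(accuracy(gold, all_negative))  # 0.9
print(f1_score(gold, all_negative))  # 0.0
```

The degenerate classifier looks strong by accuracy yet selects no summary sentences at all, which is why a measure such as F1 on the summary class is more informative on data this skewed.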

Key words

imbalanced classification / automatic document summarization / SVM / margin / bagging

Classification

Information Technology and Security Science

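Cite this article

NI Weijian, LIU Tong, ZENG Qingtian, ZHAO Hua, TANG Jianyu. Imbalanced Classification Approaches to Automatic Single-Document Summarization [J]. Computer Engineering & Science, 2012, 34(4): 162-166.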

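Funding

National Natural Science Foundation of China (61170079)

Shandong Province Key Statistical Research Project (KT11017)

Chunlei Program of Shandong University of Science and Technology (2010AZZ179)

Shandong Province Outstanding Young and Middle-Aged Scientists Research Award Fund (BS2009DX004)

Qingdao Science and Technology Support Program for the Public Domain (10-3-3-32-nsh)

China Postdoctoral Science Foundation (2011M501155)

Distinguished Young Scholars Fund of Shandong University of Science and Technology (2010KYJQ101)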

Computer Engineering & Science

OA / Peking University Core / CSCD / CSTPCD

ISSN 1007-130X
