Attribute Extraction of Chinese Online Encyclopedia Based on Weakly Supervised Learning

Jia Zhen (贾真)¹, Yang Yan (杨燕)¹, He Dake (何大可)¹

Journal of University of Electronic Science and Technology of China, Issue (5): 758-763, 6. DOI: 10.3969/j.issn.1001-0548.2014.05.022

Author information

1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031

Abstract

This paper proposes an attribute extraction method based on weakly supervised learning. The training corpus is acquired automatically from natural language text by using structured attribute information from a knowledge base. To reduce the noise in the training corpus, an optimization method based on keyword filtering is proposed. An n-pattern feature extraction method is also proposed, which relieves to some extent the data sparsity problem of traditional n-gram features. The experimental data are downloaded from Hudong Baike: structured attribute information is extracted from Hudong Baike infoboxes and used to construct the knowledge base, and the training and test data are acquired from encyclopedia entry texts. Experimental results show that keyword filtering effectively improves the quality of the training corpus, and that n-pattern features achieve better attribute extraction performance than traditional n-gram features.
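The corpus construction step summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how sentences are matched or how keywords are chosen, so the substring matching, the function names `label_sentences` and `filter_by_keywords`, the per-attribute keyword lists, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# entity -> {attribute: value}; stands in for infobox-derived structured data.
KnowledgeBase = Dict[str, Dict[str, str]]
Labeled = Tuple[str, str, str]  # (sentence, attribute, value)


def label_sentences(entity: str, sentences: List[str], kb: KnowledgeBase) -> List[Labeled]:
    """Weak labeling (assumed rule): tag a sentence with an attribute
    whenever the sentence contains that attribute's known value."""
    labeled = []
    for sentence in sentences:
        for attribute, value in kb.get(entity, {}).items():
            if value in sentence:
                labeled.append((sentence, attribute, value))
    return labeled


def filter_by_keywords(labeled: List[Labeled], keywords: Dict[str, List[str]]) -> List[Labeled]:
    """Noise filter (assumed rule): keep a labeled sentence only if it also
    mentions at least one keyword associated with its attribute."""
    kept = []
    for sentence, attribute, value in labeled:
        if any(k in sentence for k in keywords.get(attribute, [])):
            kept.append((sentence, attribute, value))
    return kept


# Hypothetical toy data, not taken from Hudong Baike.
kb = {"Li Bai": {"birth_year": "701", "occupation": "poet"}}
sentences = [
    "Li Bai was born in 701.",
    "Room 701 hosts an exhibition on Li Bai.",  # value matches by accident: noise
    "Li Bai was a poet of the Tang dynasty.",
]
keywords = {"birth_year": ["born"], "occupation": ["was a", "worked as"]}

labeled = label_sentences("Li Bai", sentences, kb)
filtered = filter_by_keywords(labeled, keywords)
```

In this toy run the second sentence is labeled `birth_year` only because it happens to contain "701"; the keyword filter drops it because it lacks a birth-related keyword, which is the kind of noise the abstract refers to.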

Key words

attribute extraction/feature extraction/relation extraction/weakly supervised learning

Classification

Information Technology and Security Science

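Cite this article

Jia Zhen, Yang Yan, He Dake. Attribute Extraction of Chinese Online Encyclopedia Based on Weakly Supervised Learning [J]. Journal of University of Electronic Science and Technology of China, 2014, (5): 758-763, 6.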

Funding

National Natural Science Foundation of China (61170111, 61202043, 61262058)

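Journal of University of Electronic Science and Technology of China

Indexing: OA, Peking University Core (北大核心), CSCD, CSTPCD

ISSN: 1001-0548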
