Journal of University of Electronic Science and Technology of China, Issue (5): 758-763, 6. DOI: 10.3969/j.issn.1001-0548.2014.05.022
基于弱监督学习的中文百科数据属性抽取
Attribute Extraction of Chinese Online Encyclopedia Based on Weakly Supervised Learning
Abstract
This paper proposes an attribute extraction method based on weakly supervised learning. The training corpus is acquired automatically from natural language text by using structured attribute information from a knowledge base. To address the noise present in the training corpus, an optimization method based on keyword filtering is proposed. An n-pattern feature extraction method is also proposed, which relieves to some extent the data sparsity problem of traditional n-gram features. The experimental data are downloaded from Hudong Baike: structured attribute information is extracted from Hudong Baike infoboxes and used to construct the knowledge base, and the training and test data are acquired from encyclopedia entry texts. Experimental results show that keyword filtering effectively improves the quality of the training corpus, and that n-pattern features achieve better attribute extraction performance than traditional n-gram features.
Keywords
attribute extraction / feature extraction / relation extraction / weakly supervised learning
Classification
Information Technology and Security Science
Cite this article
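JIA Zhen, YANG Yan, HE Dake. Attribute Extraction of Chinese Online Encyclopedia Based on Weakly Supervised Learning[J]. Journal of University of Electronic Science and Technology of China, 2014, (5): 758-763, 6.
Funding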
National Natural Science Foundation of China (61170111, 61202043, 61262058)
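Illustrative sketch
The abstract describes two steps that a short example may make more concrete: labeling sentences automatically from knowledge-base attribute values, and filtering the resulting noisy corpus by keywords. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The names `build_corpus`, `filter_by_keywords` and `LabeledSentence`, the toy English data, and the hand-picked keyword list are all hypothetical; the abstract does not say how keywords are selected, and the paper itself works on Chinese text from Hudong Baike.

```python
from dataclasses import dataclass


@dataclass
class LabeledSentence:
    entity: str
    attribute: str
    value: str
    sentence: str


def build_corpus(knowledge_base, entry_texts):
    """Weak labeling: tag a sentence with an attribute whenever it
    contains that attribute's known value for the entity.
    (Hypothetical helper; plain substring matching is an assumption.)"""
    corpus = []
    for entity, attributes in knowledge_base.items():
        for sentence in entry_texts.get(entity, []):
            for attribute, value in attributes.items():
                if value in sentence:
                    corpus.append(
                        LabeledSentence(entity, attribute, value, sentence)
                    )
    return corpus


def filter_by_keywords(corpus, keywords):
    """Keep only examples whose sentence contains at least one keyword
    associated with the labeled attribute. (Hypothetical helper.)"""
    kept = []
    for example in corpus:
        triggers = keywords.get(example.attribute, [])
        if any(trigger in example.sentence for trigger in triggers):
            kept.append(example)
    return kept


# Toy data standing in for infobox attributes and entry text.
knowledge_base = {"Ada Lovelace": {"birthplace": "London"}}
entry_texts = {
    "Ada Lovelace": [
        "Ada Lovelace was born in London in 1815.",
        "She died in London in 1852.",
    ]
}
# Assumed, hand-written keyword list for the attribute.
keywords = {"birthplace": ["born"]}

noisy = build_corpus(knowledge_base, entry_texts)
clean = filter_by_keywords(noisy, keywords)

for example in clean:
    print(example.attribute, "|", example.sentence)
```

In this toy case both sentences mention "London", so weak labeling alone tags the sentence about her death as a birthplace example. The keyword filter drops it because it lacks the assumed trigger word "born", which is the kind of noise the abstract says keyword filtering is meant to reduce.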