Computer Technology and Development, 2016, Vol. 26, Issue (9): 183-186, 191, 5. DOI: 10.3969/j.issn.1673-629X.2016.09.041
A DeepWeb Information Extraction Method Based on Knowledge Engineering
Abstract
The volume of information contained in the DeepWeb is growing rapidly, and this information has great value. However, DeepWeb information is highly heterogeneous, autonomous, dynamic and incomplete; topic-specific DeepWeb websites differ in design style, page structure and displayed content; and JavaScript is widely used. As a result, traditional extraction techniques cannot effectively automate the integration of the high-quality information contained in DeepWeb resources. This paper presents a DeepWeb information extraction method based on knowledge engineering. The page patterns, HTML structure and visual features of DeepWeb pages are analyzed and integrated. An HTML DOM tree parsing algorithm is applied to match, automatically or semi-automatically, templates that correspond to the page pattern, the HTML structure and the target information source, thereby locating information in the DeepWeb and extracting free-text, structured and semi-structured data. The effectiveness of the extraction method is verified using a large amount of site data with nested structure as the data source.
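The abstract describes parsing pages into an HTML DOM tree and matching templates against it to pull out records. The Python sketch below illustrates only that general idea; it is not the authors' algorithm. It assumes a hypothetical template format in which a record is identified by a (tag, class) pair and each field by another (tag, class) pair searched within the record's subtree; the names `DomBuilder`, `extract`, `PAGE` and `TEMPLATE` are illustrative. It uses only the standard-library `html.parser` module and does not execute JavaScript, so client-side generated content would have to be rendered before being fed in.

```python
from html.parser import HTMLParser

# HTML elements that never take a closing tag.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

class Node:
    """A DOM element; children are Node objects or text strings, in document order."""
    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []

class DomBuilder(HTMLParser):
    """Builds a simple DOM tree, tolerating unclosed tags."""
    def __init__(self):
        super().__init__()
        self.root = Node("#root", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data.strip())

def matches(node, tag, cls):
    classes = (node.attrs.get("class") or "").split()
    return node.tag == tag and (cls is None or cls in classes)

def find_all(node, tag, cls):
    """Depth-first search for all descendants matching (tag, class)."""
    found = []
    for child in node.children:
        if isinstance(child, Node):
            if matches(child, tag, cls):
                found.append(child)
            found.extend(find_all(child, tag, cls))
    return found

def text_of(node):
    """Concatenate all text in a subtree, flattening nested markup."""
    parts = [c if isinstance(c, str) else text_of(c) for c in node.children]
    return " ".join(p for p in parts if p)

def extract(html, template):
    """Apply a (hypothetical) record/field template to an HTML page."""
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    records = []
    for rec in find_all(builder.root, *template["record"]):
        row = {}
        for field, selector in template["fields"].items():
            hits = find_all(rec, *selector)
            row[field] = text_of(hits[0]) if hits else None  # missing field -> None
        records.append(row)
    return records

# Illustrative result page with nested markup (hypothetical data).
PAGE = """
<div class="result"><h3 class="title">Book A</h3>
  <div class="meta"><span class="price">12.50</span><span class="author">Li</span></div></div>
<div class="result"><h3 class="title">Book B</h3>
  <div class="meta"><span class="author">Wang</span></div></div>
"""
TEMPLATE = {
    "record": ("div", "result"),
    "fields": {"title": ("h3", "title"), "price": ("span", "price"), "author": ("span", "author")},
}

if __name__ == "__main__":
    print(extract(PAGE, TEMPLATE))
```

Run as a script, this prints `[{'title': 'Book A', 'price': '12.50', 'author': 'Li'}, {'title': 'Book B', 'price': None, 'author': 'Wang'}]`. Fields are searched anywhere inside the record subtree rather than at a fixed path, so extra wrapper elements (such as the nested `div.meta`) do not break the match; a missing field yields `None` instead of shifting the remaining values.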
Key words
DeepWeb/JavaScript technology/nested structure/DOM tree/extraction model
Classification
Information Technology and Security Science
Cite this article
乌尔柯西, 杨抒, 王业, 游香薷. A DeepWeb Information Extraction Method Based on Knowledge Engineering[J]. Computer Technology and Development, 2016, 26(9): 183-186, 191, 5.
Funding
Natural Science Foundation of Xinjiang Uygur Autonomous Region (2014211B023)