
Web Information Extraction from List Pages with Nested Data Records

李贵, 张琪, 郑新录, 韩子扬, 李征宇

郑州大学学报（理学版）(Journal of Zhengzhou University, Natural Science Edition), 2011, Vol. 43, Issue 2: 20-23, 4.

Web Information Extraction Based on List Pages of Nested Data

李贵¹, 张琪¹, 郑新录¹, 韩子扬¹, 李征宇¹

Author information

1. Department of Computer Application Technology, Shenyang Jianzhu University, Shenyang, Liaoning 110168, China

Abstract

Building on existing algorithms for extracting nested data, this work incorporates a data mining algorithm. A tag tree is constructed for each nested list page, and all data regions in the tree are identified and handled in a uniform way. The subtrees are then matched with a partial tree alignment algorithm to produce a global pattern, from which all data records are extracted. Compared with the original algorithm, the new method improves efficiency while preserving accuracy.
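The pipeline summarized in the abstract (tag tree, data regions, pattern) can be illustrated with a small sketch. This is not the authors' algorithm: it assumes well-formed HTML, and it replaces partial tree alignment with a much simpler stand-in, a structural signature that collapses repeated adjacent children so that records with different numbers of nested items still compare equal. The names `Node`, `TagTreeBuilder`, `build_tag_tree`, `signature`, `find_data_regions`, and `SAMPLE` are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser
from itertools import groupby

# Elements that never have a closing tag (a partial list, for illustration).
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """One element of the tag tree; text content is ignored."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TagTreeBuilder(HTMLParser):
    """Builds a tag tree from HTML. Assumption: the input is well formed."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Only pop when the end tag matches the element we are inside.
        if self.current.tag == tag and self.current.parent is not None:
            self.current = self.current.parent


def build_tag_tree(html):
    builder = TagTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def signature(node):
    """Structural signature of a subtree.

    Runs of adjacent children with the same signature are collapsed to one,
    which is a simplified stand-in for aligning nested records.
    """
    child_sigs = []
    for child in node.children:
        sig = signature(child)
        if not child_sigs or child_sigs[-1] != sig:
            child_sigs.append(sig)
    if not child_sigs:
        return node.tag
    return node.tag + "(" + ",".join(child_sigs) + ")"


def find_data_regions(node, min_records=2):
    """Return (parent_tag, record_signature, count) for every run of
    adjacent sibling subtrees that share a signature."""
    regions = []
    sigs = [signature(child) for child in node.children]
    for sig, group in groupby(sigs):
        count = len(list(group))
        if count >= min_records:
            regions.append((node.tag, sig, count))
    for child in node.children:
        regions.extend(find_data_regions(child, min_records))
    return regions


SAMPLE = """
<ul>
  <li><h3>Shop A</h3><ul><li>a1</li><li>a2</li></ul></li>
  <li><h3>Shop B</h3><ul><li>b1</li><li>b2</li><li>b3</li></ul></li>
</ul>
"""

regions = find_data_regions(build_tag_tree(SAMPLE))
```

In this sample, the outer list is reported as one data region with two records even though the records contain two and three nested items respectively, and each inner list is reported as a nested region of its own. A real implementation would tolerate missing or optional fields, which exact signature matching does not.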

Keywords

nested data / list page / tag tree / data region / global pattern

Classification

Information technology and security science

Cite this article

李贵, 张琪, 郑新录, 韩子扬, 李征宇. 嵌套数据记录列表页的Web信息抽取[J]. 郑州大学学报（理学版）, 2011, 43(2): 20-23, 4.

Funding

Supported by the Natural Science Foundation of Liaoning Province, grant no. 20071004.

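Journal: 郑州大学学报（理学版）(Journal of Zhengzhou University, Natural Science Edition)

Open access; indexed in the Peking University Core Journals list (北大核心) and CSTPCD

ISSN: 1671-6841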
