Journal of Zhengzhou University (Natural Science Edition), 2011, Vol. 43, Issue 2: 20-23, 4.
Web Information Extraction Based on List Pages of Nested Data
Abstract
Building on existing algorithms for extracting nested data, this paper incorporates a data mining algorithm. Tag trees are constructed for nested list pages, and all data regions in them are located and processed uniformly. The subtrees are then matched with a partial tree alignment algorithm to produce a global pattern, from which all data records are extracted. Compared with the original algorithm, the new method improves efficiency while preserving accuracy.
Keywords
nested data / list page / tag tree / data region / global pattern
Classification
Information Technology and Security Science
Cite this article
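Li Gui, Zhang Qi, Zheng Xinlu, Han Ziyang, Li Zhengyu. Web Information Extraction Based on List Pages of Nested Data [J]. Journal of Zhengzhou University (Natural Science Edition), 2011, 43(2): 20-23, 4.
Funding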
Supported by the Natural Science Foundation of Liaoning Province (Grant No. 20071004).
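The abstract above gives no pseudocode. As a minimal, hedged sketch only, the Python below illustrates two ideas the abstract relies on, under stated assumptions. The first is a simple tree matching score between tag subtrees, the kind of pairwise match that partial tree alignment is commonly built on. The second is a deliberately simplified grouping of adjacent, similar sibling subtrees into candidate data regions. The names `Node`, `simple_tree_match`, `similarity`, `find_regions` and the 0.8 threshold are hypothetical. This is not the authors' algorithm, and it omits multi-node record boundaries and the global-pattern construction the paper describes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical minimal tag-tree node: a tag name plus ordered children."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def simple_tree_match(a: Node, b: Node) -> int:
    """Size of the maximum matching between trees a and b.

    Nodes can match only if their tags are equal and their parents matched;
    the two child lists are aligned with an LCS-style dynamic program.
    """
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    # M[i][j]: best matching using the first i children of a and first j of b
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = simple_tree_match(a.children[i - 1], b.children[j - 1])
            M[i][j] = max(M[i][j - 1], M[i - 1][j], M[i - 1][j - 1] + w)
    return M[m][n] + 1  # +1 for the matched roots


def size(t: Node) -> int:
    """Number of nodes in the tree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def similarity(a: Node, b: Node) -> float:
    """Normalized score in [0, 1]; 1.0 only when every node of both trees matches."""
    return 2 * simple_tree_match(a, b) / (size(a) + size(b))


def find_regions(parent: Node, threshold: float = 0.8) -> List[Tuple[int, int]]:
    """Group runs of adjacent similar children into (start, end) index pairs.

    Simplification (assumption): each record is a single child subtree and a
    run needs at least two similar siblings to count as a data region.
    """
    kids = parent.children
    regions: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(kids) + 1):
        # Close the current run at the end or when the next sibling differs.
        if i == len(kids) or similarity(kids[i - 1], kids[i]) < threshold:
            if i - start >= 2:
                regions.append((start, i - 1))
            start = i
    return regions


if __name__ == "__main__":
    r1 = Node("tr", [Node("td", [Node("a")]), Node("td")])
    r2 = Node("tr", [Node("td", [Node("a")]), Node("td"), Node("td")])
    table = Node("table", [r1, r2, r1, Node("p")])
    print(simple_tree_match(r1, r2))  # 4
    print(find_regions(table))        # [(0, 2)]
```

In this toy example the two table rows differ only by one extra `td`, so they still score above the assumed threshold and are grouped into one region, while the trailing `p` is left out. Each tree level runs an m-by-n dynamic program over the two child lists; a real extractor would still need to merge the aligned records into a global pattern before reading out field values.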