Digital Technology and Application, Issue (3): 171-173, 3.
Information Extraction from Web Pages Based on Improved DSE Algorithm
Abstract
With the rapid development of Internet technology, more and more people have come to recognize the importance of the Internet as a vast source of information. The central problem in web information extraction is how to extract and organize information from the Internet automatically and effectively. Building on the DSE algorithm and the RoadRunner system, we explore and improve the algorithm and propose a new automated information extraction method that uses URLs to generate the template and the template pages. For threshold determination, we introduce the FDR (false discovery rate) approach from bioinformatics, which provides a theoretical basis for choosing the threshold. Experimental results show that the improved method significantly increases the accuracy of the extraction results.
Keywords
information extraction/template/DSE/RoadRunner/document object model
Classification
Information Technology and Security Science
Cite this article
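Zhang Dongmei (张冬梅), Chen Zhao (陈钊), Chen Jian (陈剑). Information Extraction from Web Pages Based on Improved DSE Algorithm [J]. Digital Technology and Application, 2012, (3): 171-173, 3.
Funding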
Supported by the Fundamental Research Funds for the Central Universities