Method of Web Content Mining Based on XML

Zheng Xia, Chen Jianguo

Journal of Shenyang University, 2012, Vol. 24, Issue 3: 52-55, 4.

Method of Web Content Mining Based on XML

Zheng Xia¹, Chen Jianguo²

Author information

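1. Department of Computer Science, Minjiang University, Fuzhou, Fujian 350001
2. School of Software, Fujian University of Technology, Fuzhou, Fujian 350003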

Abstract

The characteristics of Web content mining are analyzed, and an XML-based model of Web content mining is proposed. The HITS algorithm is used to determine the authority of Web pages; the HTML Tidy tool is used to clean non-XML documents and convert them into well-formed XML documents; and text clustering techniques are used to classify the XML document data during mining. Applied to an example system that automatically extracts traditional scientific papers from the Internet, the model is shown to work well and to extract Web page content automatically and effectively.
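The abstract names HITS as the step that ranks pages by authority but does not show how it is computed. The following is a minimal, generic sketch of the standard HITS power iteration, not the authors' implementation. The function name `hits`, the `graph` dictionary format, and the three-page link graph in the usage note are all illustrative assumptions.

```python
from math import sqrt


def hits(graph, iterations=50):
    """Generic HITS sketch (assumed interface, not the paper's code).

    graph: dict mapping each page to a list of pages it links to.
    Returns (hubs, authorities), each a dict of L2-normalized scores.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(graph)
    for targets in graph.values():
        pages.update(targets)

    hubs = {p: 1.0 for p in pages}
    auths = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking to each page.
        new_auths = {p: 0.0 for p in pages}
        for src, targets in graph.items():
            for dst in targets:
                new_auths[dst] += hubs[src]
        norm = sqrt(sum(v * v for v in new_auths.values())) or 1.0
        auths = {p: v / norm for p, v in new_auths.items()}

        # Hub update: sum the authority scores of the pages each page links to.
        new_hubs = {p: sum(auths[d] for d in graph.get(p, ())) for p in pages}
        norm = sqrt(sum(v * v for v in new_hubs.values())) or 1.0
        hubs = {p: v / norm for p, v in new_hubs.items()}

    return hubs, auths


# Hypothetical toy graph: pages "a" and "b" both link to page "c".
graph = {"a": ["c"], "b": ["c"], "c": []}
hubs, auths = hits(graph)
best_authority = max(auths, key=auths.get)
```

In this toy graph, `best_authority` is `"c"`, the only page with incoming links, while `"a"` and `"b"` receive equal hub scores. A real crawler would build `graph` from extracted hyperlinks before ranking pages for content extraction.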

Key words

Web mining/data mining/text clustering/non-XML documents

Classification

Computer Science and Automation

Cite this article

Zheng Xia, Chen Jianguo. Method of Web Content Mining Based on XML [J]. Journal of Shenyang University, 2012, 24(3): 52-55, 4.

Journal of Shenyang University

OA | CHSSCD

ISSN: 2095-5456
