Computer Engineering & Science (计算机工程与科学), 2024, Vol. 46, Issue 4: 635-646, 12. DOI: 10.3969/j.issn.1007-130X.2024.04.008
Semi-supervised website topic classification based on heterogeneous graph neural network
(Original title: 基于异构图神经网络的半监督网站主题分类)
Abstract
The rapid growth in the number of Internet websites has made it challenging for existing methods to accurately classify specific website topics. URL-based methods, for example, struggle to handle topic information not reflected in the URL, while content-based methods are limited by data sparsity and have difficulty capturing semantic relationships. To address this, a semi-supervised website topic classification method based on a heterogeneous graph neural network, HGNN-SWT, is proposed. The method not only uses website text features to compensate for the limitations of using URL features alone, but also models the sparse relationships between website texts and words with a heterogeneous graph, improving classification performance by exploiting the node and edge relationships within the graph. It introduces a neighbor node sampling method based on random walks that considers both the local features of nodes and the global graph structure. In addition, a feature fusion strategy is proposed to capture contextual relationships and feature interactions within website text data. Experimental results on a self-built Chinaz Website dataset show that HGNN-SWT achieves higher accuracy in website topic classification than existing methods.
Keywords
website topic / heterogeneous graph neural network / semi-supervised / feature fusion
Classification
Information Technology and Security Science
Cite this article
Wang Xiezhong (王谢中), Chen Xu (陈旭), Jing Yongjun (景永俊), Wang Shuyang (王叔洋). Semi-supervised website topic classification based on heterogeneous graph neural network [J]. Computer Engineering & Science, 2024, 46(4): 635-646, 12.
Funding
Key Research and Development Program of Ningxia Hui Autonomous Region (2023BDE02017)
Fundamental Research Funds for the Central Universities, North Minzu University (2022PT_S04)