Abstract
Text-attributed graphs have increasingly become a hotspot in graph research. In traditional graph neural network (GNN) research, the node features used are typically shallow features derived from text information or manually designed features, such as those produced by the skip-gram and continuous bag-of-words (CBOW) models. In recent years, the advent of large language models (LLMs) has brought profound changes to natural language processing (NLP). These changes have not only impacted NLP tasks but have also begun to permeate GNNs. Consequently, recent graph-related work has started to introduce language representation models and large language models to generate new node representations, aiming to mine richer semantic information. In existing work, most models still adopt traditional GNN architectures or contrastive learning approaches. Among the contrastive learning methods, since traditional node features and the node representations generated by language models are not produced by a unified model, they face the challenge of dealing with two vectors that lie in different vector spaces. Motivated by these challenges and considerations, a model named GRASS is proposed. Specifically, in the pre-training task, the model introduces text information expanded by large language models, which is used for contrastive learning against textual information processed by graph convolution. In downstream tasks, to reduce the cost of fine-tuning, GRASS aligns the format of downstream tasks with that of the pre-training task. As a result, GRASS performs well on node classification tasks without any fine-tuning, especially in low-shot scenarios. For example, in the 1-shot setting, GRASS improves over the best baseline by 6.10%, 6.22%, and 5.21% on the Cora, PubMed, and ogbn-arxiv datasets, respectively.

Key words
graph neural network/text-attributed graph/large language model/contrastive learning/pre-training/prompt learning

Classification
Information Technology and Security Science