A Graph Convolutional Text Classification Method with Semantic Features

一种融合语义特征的图卷积文本分类方法

Open access (OA); indexed in 北大核心 (Peking University Core Journals) and CSTPCD


Deep learning has become one of the main research directions in text classification, and deep learning models perform well on the task thanks to their strong feature extraction capabilities. However, because text data is high-dimensional and natural language is semantically complex, existing deep learning models still have room for improvement in extracting composite semantic information, and this shortfall has a non-negligible impact on classification quality. To address this, this paper proposes LGCN, a text classification model based on LDA (latent Dirichlet allocation) and GCN (graph convolutional network). The model uses LDA to learn the associations among documents, words, and topics; uses sliding windows and PMI values to capture the relationships between characters; and uses TF-IDF to obtain the relationships between words and documents. Fusing this semantic information yields a graph built from nodes, and a GCN learns the semantic information in the graph and classifies its document nodes, which completes the text classification task. Experimental results show that, on the same datasets, LGCN classifies text better than reference models such as LSTM.
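The graph construction described in the abstract can be illustrated with a minimal sketch. It covers only the two edge types whose weighting the abstract names directly: sliding-window PMI edges between tokens and TF-IDF edges between documents and tokens. The abstract does not give the window size, the PMI threshold, or the TF-IDF variant, so the values below (a window of 2, keeping only positive PMI, and plain `tf * log(N / df)`) are assumptions, and the function names `pmi_edges` and `tfidf_edges` are hypothetical. The LDA topic edges and the GCN itself are omitted.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_edges(docs, window=2):
    """Token-token edges weighted by PMI over sliding windows (assumed variant)."""
    windows = []
    for tokens in docs:
        if len(tokens) <= window:
            windows.append(tokens)
        else:
            windows.extend(
                tokens[i:i + window] for i in range(len(tokens) - window + 1)
            )
    total = len(windows)
    single = Counter()  # number of windows containing each token
    pair = Counter()    # number of windows containing each token pair
    for w in windows:
        uniq = sorted(set(w))
        single.update(uniq)
        pair.update(combinations(uniq, 2))
    edges = {}
    for (a, b), count in pair.items():
        # PMI = log( p(a, b) / (p(a) * p(b)) ), with window counts as estimates
        pmi = math.log(count * total / (single[a] * single[b]))
        if pmi > 0:  # assumption: keep only positively associated pairs
            edges[(a, b)] = pmi
    return edges


def tfidf_edges(docs):
    """Document-token edges weighted by TF-IDF (plain tf * log(N / df) variant)."""
    n_docs = len(docs)
    doc_freq = Counter(t for tokens in docs for t in set(tokens))
    edges = {}
    for doc_id, tokens in enumerate(docs):
        for token, count in Counter(tokens).items():
            tf = count / len(tokens)
            edges[(doc_id, token)] = tf * math.log(n_docs / doc_freq[token])
    return edges


# Toy corpus, invented purely for illustration.
docs = [
    ["graph", "convolution", "text"],
    ["text", "classification", "topic"],
    ["graph", "topic", "model"],
]
word_edges = pmi_edges(docs, window=2)
doc_edges = tfidf_edges(docs)
```

In a full pipeline, these weights would fill an adjacency matrix together with the LDA-derived topic associations, and a GCN would then classify the document nodes; neither step is shown here.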

Li Wenjie (黎文杰); Hong Jiawei (洪嘉伟); Wei Yanhui (魏艳辉); Zuo Yayao (左亚尧)

School of Computers, Guangdong University of Technology, Guangzhou 510006, Guangdong, China

Computer and Automation

Graph convolutional network; Latent Dirichlet allocation; Text classification

Computer Applications and Software (《计算机应用与软件》), 2024, No. 5

pp. 247-253, 285 (8 pages)

Natural Science Foundation of Guangdong Province (2018A030313934).

10.3969/j.issn.1000-386x.2024.05.038
