Journal of Shenyang University of Technology, 2025, Vol. 47, Issue 5: 594-601, 8. DOI: 10.7688/j.issn.1000-1646.2025.05.06
Text data mining algorithm for multi-label implicit knowledge (面向多标签隐性知识的文本数据挖掘算法)
Abstract
[Objective] As the user base of social software expands, multi-label annotation is increasingly applied to text information, and analyzing the behavior and psychology of user groups by mining multi-label text has become a research hotspot. A data mining algorithm for multi-label implicit knowledge, based on a deep topic feature extraction model, is proposed to improve text classification accuracy and data mining efficiency.

[Methods] To capture the implicit knowledge in text information, the socialization, externalization, combination, and internalization (SECI) theory was employed to convert implicit knowledge into explicit knowledge, and the short-term memory capability of recurrent neural networks was used to improve the conversion efficiency. Given the complexity of text information, local and global features were analyzed separately and then fused to improve data mining efficiency. Because the context of text information is strongly correlated, the gate mechanism of the long short-term memory (LSTM) model was applied to extract contextual dependencies, while the unsupervised latent Dirichlet allocation (LDA) topic model was selected to model the topic structure of the text and to mitigate the inconsistent standards of manual labeling. The LDA-derived global features and the LSTM-derived local features were combined by feature stitching (concatenation) to reduce information loss during feature extraction. A theme controller was introduced to narrow the inference scope and obtain more effective text features. In addition, a contextual topic layer based on a Gaussian decoder was constructed to calculate the conditional probability matrix of each vocabulary item under a given topic, and a Gaussian mixture decoder was used to obtain the conditional probability of the vocabulary, which also provided topic modeling optimization and content expansion. Finally, multi-label classification was implemented using the Softmax function to calculate label probabilities.

[Results] During model training, perplexity was used as the evaluation criterion. The proposed model achieved better perplexity than the control groups (the LDA topic model and the LSTM model), demonstrating the effectiveness of concatenating the features of the two models. Compared with the NVDM, LSTM, LDA, and VAETM models, with precision and recall as evaluation metrics, the proposed model improved precision and recall by 5.05% and 2.75%, respectively.

[Conclusion] The comparative experiments show that the proposed model significantly improves text classification performance and outperforms the LDA topic model and the LSTM model in processing multi-label texts. It can efficiently mine the implicit knowledge in multi-label text data, providing an efficient and accurate solution for tasks such as text classification, semantic analysis, and information retrieval.
Keywords
multi-label text / deep topic feature extraction model / implicit knowledge / recurrent neural network / long short-term memory (LSTM) neural network / latent Dirichlet allocation (LDA) topic model / feature stitching / Gaussian decoder
Classification
Information technology and security science
Cite this article
邓乔夫, 李骁娅, 郭校君. Text data mining algorithm for multi-label implicit knowledge [J]. Journal of Shenyang University of Technology, 2025, 47(5): 594-601, 8.
Funding
Sichuan Province Special Technological Innovation Project for the Development of Small and Medium-sized Enterprises (24KCZJ0092).
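Illustrative sketch
Three steps named in the abstract (feature stitching of global and local features, Softmax label probabilities, and perplexity as a training criterion) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy vectors standing in for LDA and LSTM outputs, and the single linear layer before the Softmax are all assumptions made for illustration, and the theme controller and Gaussian mixture decoder are not modeled.

```python
import math


def stitch_features(global_topic_features, local_context_features):
    """Concatenate global (topic) and local (contextual) feature vectors."""
    return list(global_topic_features) + list(local_context_features)


def softmax(scores):
    """Numerically stable softmax over a list of real-valued scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_probabilities(features, weights, biases):
    """Hypothetical linear layer (one weight row and bias per label) + softmax."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


def perplexity(token_probabilities):
    """exp of the mean negative log-likelihood per token; lower is better."""
    n = len(token_probabilities)
    return math.exp(-sum(math.log(p) for p in token_probabilities) / n)


# Toy values, invented for illustration only.
topic_vec = [0.7, 0.2, 0.1]   # stands in for an LDA document-topic distribution
context_vec = [0.5, -0.3]     # stands in for a final LSTM hidden state
fused = stitch_features(topic_vec, context_vec)

# Assumed classifier parameters for three labels over the 5-dim fused vector.
weights = [
    [1.0, 0.0, 0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0, 0.0],
]
biases = [0.0, 0.0, 0.0]
probs = label_probabilities(fused, weights, biases)

# A model that assigns probability 0.25 to every token is as uncertain as a
# uniform choice among four words.
uniform_ppl = perplexity([0.25, 0.25, 0.25, 0.25])
```

In practice the topic vector would come from a trained LDA model and the context vector from a trained LSTM; how labels are selected from the Softmax probabilities (for example by a threshold or top-k rule) is not stated in the abstract and is left out here.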