Journal of Nanjing Normal University (Natural Science Edition), 2025, Vol. 48, Issue 3: 84-92 (9 pages). DOI: 10.3969/j.issn.1001-4616.2025.03.010
基于NLP和图像分类模型的中文科技文献双模态分类方法
A Bimodal Classification Method for Chinese Scientific and Technological Literature Based on NLP and Image Classification Models
Abstract
Demand for more scalable, accurate, and automated document classification is growing as the volume of technical literature that must be managed and organized increases sharply. To analyze massive scientific literature data effectively, a multi-modal literature analysis engine is proposed that combines the YOLOv7 image classification model with a natural language processing model. The architecture uses three types of information: the natural language text in the document, its descriptive images, and the relationship between the two. By integrating and training deep learning networks for the different modalities, the multi-modal approach achieves better classification accuracy than unimodal methods. The proposed method is applied to a Chinese scientific literature dataset, and the model is trained to classify documents according to the Chinese Library Classification system. The results show that the proposed method attains higher classification accuracy than unimodal methods, which helps promote data and knowledge management for enterprises, institutions, and research organizations.
Keywords
classification of scientific and technological literature / document image classification / multi-modal features / natural language processing / deep learning / YOLOv7
Classification
Computers and Automation
Cite this article
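Wang Zheng (王峥), Ding Yi (丁熠), Chen Haiming (陈海明), Chen Ying (陈盈). A Bimodal Classification Method for Chinese Scientific and Technological Literature Based on NLP and Image Classification Models [J]. Journal of Nanjing Normal University (Natural Science Edition), 2025, 48(3): 84-92.
Funding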
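National Natural Science Foundation of China, General Program (61976149); Zhejiang Provincial Natural Science Foundation, Key Program (Z20F020008); Zhejiang Provincial "14th Five-Year Plan" Teaching Reform Project for Regular Undergraduate Universities (jg20220563); 2025 Zhejiang Provincial Natural Science Foundation Project (LMS25A010011); Soft Science Research Program of the Zhejiang Provincial Department of Science and Technology (2025C35030).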