Chinese Journal of Analytical Chemistry (分析化学), 2024, Vol. 52, Issue 9: 1266-1276, 11. DOI: 10.19756/j.issn.0253-3820.241155
Near Infrared Spectral Analysis Based on Data Augmentation Strategy and Convolutional Neural Network (基于数据增强策略和卷积神经网络的近红外光谱分析研究)
Abstract
Near infrared spectroscopy (NIRS) combined with chemometric algorithms has been widely used in the quantitative and qualitative analysis of food and medicine. However, traditional chemometric methods, especially linear classification methods, often yield unsatisfactory results on multi-class classification problems. Convolutional neural networks (CNNs) are adept at extracting deep-level features from data and are well suited to non-linear relationships. The modeling performance of a CNN depends on the size and diversity of the sample set, while collecting and preprocessing NIRS sample data is often time-consuming and labor-intensive. This study proposed a NIRS qualitative analysis method based on data augmentation strategies and CNN. The data augmentation strategy comprised two steps. First, Bootstrap resampling and generative adversarial network (GAN) methods were applied to augment three NIRS datasets (medicine, coffee and grape). Second, the original samples (Y) were combined with the Bootstrap-augmented samples (B) and the GAN-augmented samples (G) to obtain three augmented datasets (Y-B, Y-G and Y-B-G). On this basis, a CNN model structure suitable for these datasets was designed, consisting of two one-dimensional convolutional layers, one max-pooling layer, and one fully connected layer. The results showed that, compared with the optimal models of partial least squares discriminant analysis (PLS-DA), support vector machine (SVM), and back propagation neural network (BP), the CNN model based on the Y-B dataset achieved average accuracy improvements of 3.998%, 9.364%, and 4.689% for medicine (binary classification); the CNN model based on the Y-B-G dataset achieved average accuracy improvements of 6.001%, 2.004%, and 7.523% for coffee (7-class classification); and the CNN model based on the Y-B dataset achieved average accuracy improvements of 33.408%, 51.994%, and 34.378% for grapes (20-class classification). These results indicate that models built with data augmentation strategies and CNN achieved better classification accuracy and generalization performance across different datasets and numbers of classes.
Key words
Data augmentation / Near infrared spectroscopy / Convolutional neural network / Chemometrics
Cite this article
郑运, 杨思雨, 王涛, 邓焯文, 兰维杰, 云永欢, 潘磊庆. Near Infrared Spectral Analysis Based on Data Augmentation Strategy and Convolutional Neural Network [J]. Chinese Journal of Analytical Chemistry (分析化学), 2024, 52(9): 1266-1276, 11.
Funding
Supported by the Key Research and Development Project of Hainan Province (No. ZDYF2024XDNY197), the Natural Science Foundation of Hainan Province (Nos. 323QN202, 322CXTD523), the National Natural Science Foundation of China (No. 22164008) and the Innovation Center Platform for Academicians of Hainan Province.
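Illustrative sketch
The abstract describes augmenting the original samples (Y) by Bootstrap resampling to obtain samples (B) and then merging them into a Y-B dataset. The abstract does not give the exact resampling procedure, so the Python sketch below is only one plausible reading and not the authors' implementation. It assumes that each synthetic spectrum is the mean of k spectra drawn with replacement from the same class. The function names (`bootstrap_augment`, `combine`), the parameters `k` and `n_new_per_class`, and the random toy data are all hypothetical.

```python
import numpy as np

def bootstrap_augment(X, y, n_new_per_class, k=5, seed=0):
    """Build synthetic spectra class by class.

    Assumed variant: each new spectrum is the mean of k spectra
    sampled WITH replacement from the same class.
    """
    rng = np.random.default_rng(seed)
    X_new, y_new = [], []
    for label in np.unique(y):
        X_cls = X[y == label]                      # spectra of one class
        for _ in range(n_new_per_class):
            idx = rng.integers(0, len(X_cls), size=k)  # draw with replacement
            X_new.append(X_cls[idx].mean(axis=0))
            y_new.append(label)
    return np.asarray(X_new), np.asarray(y_new)

def combine(*datasets):
    """Stack several (X, y) pairs into one dataset, e.g. Y and B into Y-B."""
    Xs, ys = zip(*datasets)
    return np.vstack(Xs), np.concatenate(ys)

# Toy stand-in for an NIRS dataset: 20 spectra, 100 wavelengths, 2 classes.
toy_rng = np.random.default_rng(42)
X_Y = toy_rng.normal(size=(20, 100))
y_Y = np.repeat([0, 1], 10)

X_B, y_B = bootstrap_augment(X_Y, y_Y, n_new_per_class=15)
X_YB, y_YB = combine((X_Y, y_Y), (X_B, y_B))
print(X_YB.shape, y_YB.shape)
```

Under the same assumptions, GAN-generated samples (G) could be passed to `combine` as a third pair to form a Y-B-G dataset.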