Journal of Henan Agricultural University, 2025, Vol. 59, Issue 3: 516-527 (12 pages). DOI: 10.16445/j.cnki.1000-2340.20241009.001
Influence of dataset partitioning and spectral pre-processing methods on the near-infrared quantitative model of chemical ingredients in tobacco leaves
Abstract
[Objective] This study aims to identify appropriate dataset partitioning methods, partitioning proportions, and data preprocessing methods for model construction, laying a foundation for an accurate and stable analysis model of chemical ingredients in tobacco leaves. [Method] A total of 210 tobacco leaf samples were analyzed for total sugar, reducing sugar, nitrogen, nicotine, potassium, and chlorine content, and spectral data were collected from the same samples. The effects of different partitioning methods, namely random selection (RS), uniformly-level stone (LS), sample set partitioning based on joint x-y distances (SPXY), and Kennard-Stone (KS), and of spectral preprocessing methods and their combinations on the prediction accuracy of partial least squares (PLS) quantitative models for conventional chemical components in tobacco leaves were studied. [Result] The calibration and prediction sets were more evenly distributed when the dataset was partitioned by SPXY. When the proportion of the prediction set was 24%, the constructed model had stronger prediction ability. The optimal preprocessing combination for total sugar and chlorine was multiplicative scatter correction (MSC) + moving average smoothing (MA) + wavelet transform (WAVE), with rp values of 0.9840 and 0.9860, respectively. For reducing sugar and nicotine, the optimal combination was max-min scaling (MAXMIN) + MSC + WAVE, with rp values of 0.9900 and 0.9852, respectively. For potassium, the optimal combination was MSC + WAVE (rp = 0.9694). For nitrogen, however, the model based on raw spectral data had the strongest prediction ability (rp = 0.9709). [Conclusion] The accuracy of the near-infrared quantitative models for conventional chemical components in tobacco leaves was significantly improved after optimizing dataset partitioning and preprocessing. These results provide a reference for constructing near-infrared quantitative models for other chemical ingredients in tobacco leaves.
Keywords
tobacco/near infrared spectroscopy/dataset partitioning/data pre-processing/quantitative model
Classification
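Light industry
Cite this article
Fu Bo (付博), Yang Yongfeng (杨永锋), Liu Xiangzhen (刘向真), Niu Yangyang (牛洋洋), Liu Maolin (刘茂林), Zhao Sensen (赵森森), Yu Jianjun (于建军), Peng Guixin (彭桂新), Ji Xiaoming (姬小明). Influence of dataset partitioning and spectral pre-processing methods on the near-infrared quantitative model of chemical ingredients in tobacco leaves [J]. Journal of Henan Agricultural University, 2025, 59(3): 516-527 (12 pages).
Funding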
Henan Province Science and Technology Research Project (232102110168)
Science and Technology Project of China Tobacco Henan Industrial Co., Ltd. (C202023)
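Illustrative sketch
The abstract reports that SPXY partitioning with a 24% prediction set gave the most evenly distributed calibration and prediction sets. The record does not include the authors' implementation, so the following is only a minimal sketch of SPXY as it is commonly described: a Kennard-Stone selection run on a joint distance that adds the max-normalized spectral (x) distance to the max-normalized reference-value (y) distance. The function name `spxy_split`, the helper `_pairwise_dist`, the rounding rule for the set sizes, and the synthetic data in the usage lines are all assumptions made for illustration.

```python
import numpy as np


def _pairwise_dist(A):
    """Euclidean distance matrix between the rows of A (hypothetical helper)."""
    sq = (A ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * A @ A.T
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from rounding


def spxy_split(X, y, test_fraction=0.24):
    """Return (calibration_idx, prediction_idx) selected by an SPXY-style rule.

    Assumes the samples are not all identical in X or in y, otherwise the
    normalization below would divide by zero.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(len(X), -1)
    n = X.shape[0]
    n_cal = n - int(round(n * test_fraction))  # assumed rounding rule
    if not 2 <= n_cal <= n:
        raise ValueError("calibration set must contain between 2 and n samples")

    # Joint x-y distance: each part is scaled by its own maximum so that
    # spectra and reference values contribute with equal weight.
    dx = _pairwise_dist(X)
    dy = _pairwise_dist(y)
    d = dx / dx.max() + dy / dy.max()

    # Kennard-Stone selection on the joint distance:
    # start from the two most distant samples.
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]

    while len(selected) < n_cal:
        # Distance from each remaining sample to its nearest selected sample;
        # pick the remaining sample for which that distance is largest.
        min_d = d[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_d))]
        selected.append(pick)
        remaining.remove(pick)

    return np.array(selected), np.array(remaining)


# Usage with synthetic data only (210 samples, 100 hypothetical wavelengths).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(210, 100))
y_demo = rng.normal(size=210)
cal_idx, pred_idx = spxy_split(X_demo, y_demo, test_fraction=0.24)
print(len(cal_idx), len(pred_idx))
```

The selected samples form the calibration set and the leftover samples form the prediction set, so the extreme samples in both spectral and concentration space end up in calibration. Replacing the joint distance `d` with `dx` alone would turn the same loop into a plain Kennard-Stone split.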