Journal of Instrumental Analysis (分析测试学报), 2025, Vol. 44, Issue 6: 1139-1146, 8. DOI: 10.12452/j.fxcsxb.241028489
基于挥发性有机物检验的血液种属鉴别方法与模型评估
Blood Species Identification Method and Model Evaluation Based on Volatile Organic Compounds Testing
Abstract
Blood species identification (BSI) plays a significant role in criminal investigation, import and export inspection, animal protection, and other fields. Volatilomics analysis of blood volatile organic compounds (VOCs) is a novel approach to BSI. To screen potential biomarkers of blood from different species, this study built multiple machine learning (ML) classification models and compared their predictive value for BSI. Headspace solid-phase microextraction (HS-SPME) coupled with gas chromatography-mass spectrometry (GC-MS) was used to analyze VOCs in the blood of eight common species. Partial least squares discriminant analysis (PLS-DA) and orthogonal partial least squares discriminant analysis (OPLS-DA) were employed to screen potential biomarkers. Samples were randomly divided into training and testing sets at a ratio of 7:3. Nine common classification models were established, and the best algorithm was selected and optimized by comparing all models. A model using all VOCs as variables was constructed to verify the reliability of the potential biomarkers, and different resampling methods were used to assess how the division into training and testing sets affected the model. A total of 17 VOCs related to the species characteristics of blood from humans and seven animal species were screened. The accuracies of the nine models were: multi-layer perceptron, 0.8597; naive Bayes, 0.5751; multinomial logistic regression, 0.8597; K-nearest neighbor (KNN), 0.9421; support vector machine with a Gaussian kernel, 0.8150; support vector machine with a polynomial kernel, 0.7342; decision tree, 0.8429; random forest, 0.9231; and extreme gradient boosting tree, 0.8729. On the testing set, the KNN model achieved an accuracy of 0.9184, an area under the receiver operating characteristic curve (AUC) of 0.9990, and a Brier score of 0.0376. KNN was selected as the optimal algorithm, and the best model's hyperparameter combination was a K value of 5, the triweight function as the distance-weighting kernel, and a Minkowski distance parameter p of 0.3240. The best-performing model on the validation set achieved an accuracy of 0.9284, with an AUC of 0.9970 and a Brier score of 0.0576. There was no significant difference between the results of the model using all component variables and the model using the potential biomarker variables (t-test, p > 0.05), nor between the results of models built with different resampling methods (t-test, p > 0.05). Volatilomics analysis shows great potential for blood species identification, with reliable potential biomarkers, high model accuracy, and strong anti-interference ability.
Keywords
headspace solid-phase microextraction (HS-SPME) / gas chromatography-mass spectrometry (GC-MS) / volatile organic compounds (VOCs) / blood species identification (BSI) / volatomics / machine learning (ML)
Classification
Chemistry and Chemical Engineering
Cite this article
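张文骥, 李昊洋, 丁海媛, 韩祺瑞, 宋辉, 罗颖超. Blood Species Identification Method and Model Evaluation Based on Volatile Organic Compounds Testing [J]. Journal of Instrumental Analysis (分析测试学报), 2025, 44(6): 1139-1146, 8.
Funding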
Natural Science Foundation of Liaoning Province (General Program, 2024-MS-208)
Basic Scientific Research Project of the Education Department of Liaoning Province (Key Research Project, JYTZD2023147)
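The classification workflow summarized in the abstract (a random 7:3 split, a kernel-weighted KNN classifier, and evaluation by accuracy and Brier score) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the data are synthetic, the class names and feature count are made up, the neighbor-weighting scheme (distances scaled by the (K+1)-th neighbor before applying the triweight kernel) is an assumption, and the Brier score uses the multi-class form that sums squared errors over classes, which other tools may rescale. Only K = 5 and p = 0.324 are taken from the abstract.

```python
import random
from collections import defaultdict


def minkowski(a, b, p):
    """Minkowski distance of order p (p > 0; for p < 1 it is not a true metric)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def triweight(u):
    """Triweight kernel for u in [0, 1); zero from u = 1 onwards."""
    return 35.0 / 32.0 * (1.0 - u * u) ** 3 if u < 1.0 else 0.0


def knn_proba(train_X, train_y, x, classes, k=5, p=2.0):
    """Kernel-weighted KNN class probabilities for one query point x."""
    dists = sorted((minkowski(row, x, p), label) for row, label in zip(train_X, train_y))
    # Assumed scheme: scale by the (k+1)-th neighbour distance so the k nearest
    # neighbours fall inside the kernel's support.
    scale = dists[k][0] or 1.0
    votes = defaultdict(float)
    for d, label in dists[:k]:
        votes[label] += triweight(d / scale)
    total = sum(votes.values())
    if total == 0.0:  # every neighbour ties with the (k+1)-th: use equal votes
        for _, label in dists[:k]:
            votes[label] += 1.0
        total = float(k)
    return [votes[c] / total for c in classes]


def accuracy(probs, y_true, classes):
    """Fraction of samples whose most probable class matches the true label."""
    pred = [classes[max(range(len(classes)), key=row.__getitem__)] for row in probs]
    return sum(a == b for a, b in zip(pred, y_true)) / len(y_true)


def brier_score(probs, y_true, classes):
    """Multi-class Brier score: mean over samples of summed squared errors."""
    total = 0.0
    for row, t in zip(probs, y_true):
        total += sum((q - (1.0 if c == t else 0.0)) ** 2 for q, c in zip(row, classes))
    return total / len(y_true)


# Synthetic stand-in data: 3 hypothetical classes, 4 made-up "VOC" features each.
rng = random.Random(42)
classes = ["species_a", "species_b", "species_c"]
X, y = [], []
for ci, name in enumerate(classes):
    for _ in range(20):
        X.append([rng.gauss(10.0 * ci, 1.0) for _ in range(4)])
        y.append(name)

# Random 7:3 split into training and testing sets.
order = list(range(len(X)))
rng.shuffle(order)
n_test = round(0.3 * len(order))
test_idx, train_idx = order[:n_test], order[n_test:]
train_X = [X[i] for i in train_idx]
train_y = [y[i] for i in train_idx]
test_y = [y[i] for i in test_idx]

# K and p follow the values reported in the abstract; the rest is illustrative.
probs = [knn_proba(train_X, train_y, X[i], classes, k=5, p=0.324) for i in test_idx]
acc = accuracy(probs, test_y, classes)
bs = brier_score(probs, test_y, classes)
print(f"accuracy={acc:.4f}  Brier={bs:.4f}")
```

A lower Brier score indicates better-calibrated class probabilities, which is why it complements accuracy when comparing models. The numbers printed here describe only the synthetic data and say nothing about the reported results.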