Tobacco Science & Technology (烟草科技), 2026, Vol. 59, Issue 1: 1-9, 9. DOI: 10.16135/j.issn1002-0861.2025.0408
Homogeneity analysis and random forest modeling of cigarette smoke data across blend formulas and specifications
Abstract
Traditional approaches to relating cigarette design materials to mainstream smoke components require separate material parameter combination experiments and separate models for each cigarette formula and specification, which leads to a heavy workload, long analysis time, narrow model applicability and low data utilization efficiency. To investigate these relationships and achieve rapid prediction of tar release, this study proposed a data homogeneity evaluation method for material parameters and smoke chemistry data across different cigarette formulas and specifications. The method combined distribution analysis, statistical testing (the K-S test) and cluster analysis to comprehensively evaluate data characteristics and select appropriate datasets for integration. A nonlinear machine learning algorithm, random forest (RF), was then used to establish prediction models for routine cigarette smoke components and puff counts, with cigarette blend formula and material design parameters as independent variables. The results showed that: 1) The models established for different cigarette formulas and specifications on the integrated datasets exhibited enhanced predictive performance, achieving a mean absolute percentage error of 2.7% on the test sets of five-fold cross-validation. 2) For a set of cigarette samples with new formulas or specifications, only three or more sets of actual measurement data were required to adjust and optimize the model, enabling rapid prediction of smoke component releases under different formulas and cigarette specifications with a mean absolute percentage error of around 10%.
Keywords
Cigarette / Smoke component / Data homogeneity / K-S test / Hierarchical clustering / Machine learning / Random forest
Classification
Light industry and textiles
Cite this article
宛然, 王洪波, 崔华鹏, 闫新可, 陈海兵, 罗建钦, 阚宏伟, 聂聪, 谢复炜, 孙学辉, 韩云龙, 黄劭理, 王聪, 杨松, 赵乐, 叶远青, 郭军伟. Homogeneity analysis and random forest modeling of cigarette smoke data across blend formulas and specifications [J]. Tobacco Science & Technology, 2026, 59(1): 1-9, 9.
Funding
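China National Tobacco Corporation science and technology projects "Research on design and prediction technology for cigarette tar and harmful components based on digital twins of raw and auxiliary materials" (110202402003) and "Construction and application of a design technology system for ultra-low-tar Chinese-style flue-cured cigarettes" (110202402004)
China Tobacco Jiangsu Industrial Co., Ltd. project "Research on digital design technology for low-tar cigarette materials" (H202308)
China Tobacco Guangxi Industrial Co., Ltd. project "Construction of a digital model for Yunnan tobacco leaf module blends" (2022450000340079)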
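Illustrative sketch
The abstract names a K-S test as part of the homogeneity evaluation and reports model quality as a mean absolute percentage error (MAPE), but it does not give the implementation. The Python sketch below shows one way these two pieces could look. It rests on several assumptions: a two-sample K-S test applied to a single variable, a significance level of 0.05 with the usual asymptotic critical value, made-up tar values, and hypothetical function names (`ks_statistic`, `is_homogeneous`, `mape`). The random forest fit itself is omitted.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every value equal to x in both samples (handles ties).
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def is_homogeneous(a, b, c_alpha=1.358):
    """Asymptotic decision rule; c_alpha = 1.358 corresponds to alpha = 0.05 (assumed)."""
    n, m = len(a), len(b)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(a, b) <= critical


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    total = sum(abs((y - p) / y) for y, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Synthetic tar values (mg per cigarette) for two hypothetical datasets.
rng = random.Random(0)
tar_a = [rng.gauss(10.0, 0.5) for _ in range(40)]
tar_c = [rng.gauss(6.0, 0.5) for _ in range(40)]
print("pool A with C:", is_homogeneous(tar_a, tar_c))
```

Under this reading, only datasets that pass the screening would be pooled before model fitting, and `mape` would be evaluated on the held-out fold of each cross-validation split.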