Machine learning prediction model for emerging pollutants-induced activities of 12 nuclear receptors

李建青 王天勤 滕跃发 郭磊 黄杨 李斐

色谱 (Chinese Journal of Chromatography), 2025, Vol. 43, Issue 8: 959-970 (12 pages). DOI: 10.3724/SP.J.1123.2024.12008


李建青 1, 王天勤 1, 滕跃发 2, 郭磊 1, 黄杨 1, 李斐 2

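Author information

  • 1. School of Chemistry and Materials Science, Ludong University, Yantai 264025, Shandong, China
  • 2. CAS Key Laboratory of Coastal Environmental Processes and Ecological Remediation (Yantai Institute of Coastal Zone Research), Shandong Key Laboratory of Coastal Environmental Processes, Yantai Institute of Coastal Zone Research, Chinese Academy of Sciences, Yantai 264003, Shandong, China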

Abstract

Emerging pollutants are substances that have recently been discovered or brought into focus, that pose ecological or human-health risks, and that either have not yet been included in regulatory frameworks or are inadequately controlled by existing management measures. Synthetic chemicals play key roles in the progress of human society and in improving quality of life. However, these chemicals may leak into the environment through unintentional or organized emissions during the life cycles of chemical-containing products, thereby becoming potential emerging pollutants and posing ecological and human-health threats. Many new chemicals are used without sufficient toxicity assessment; consequently, their potential threats are difficult to predict. Effective toxicity assessments of existing and emerging chemicals are therefore required. Testing the toxicity of every chemical experimentally would be very time-consuming and expensive. In addition, discrepancies between experimental results from different laboratories lead to inconsistent toxicity-screening standards for emerging pollutants, which hinders both the prevention and control of these pollutants and the explanation of their toxicity mechanisms. Addressing these issues requires standardized alternative toxicity-testing strategies that screen emerging pollutants in a high-throughput manner. In this study, machine-learning methods were used to predict the toxicities of compounds in the Tox21 database. The RDKit and Mordred libraries were used to process compound structures (presented in SMILES format) and to generate molecular descriptors of their physicochemical properties. A refined feature set was selected through information-gain calculations and variable selection, and the data were fitted using Python's Sklearn (scikit-learn) and XGBoost libraries. Prediction models were constructed from the selected features using seven machine-learning algorithms in order to evaluate 12 bioactivity endpoints, including datasets related to endocrine disruption, DNA damage, and oxidative stress response, among others. Model performance was evaluated by calculating the accuracy on the test set, and data availability was characterized in terms of the application domain. All training and test data were found to lie within the application domain. The models predicted the 12 endpoints with high accuracy. This study clarified the relationship between the physicochemical properties of chemicals and nuclear receptor activity, and developed corresponding software tools. Across the 12 Tox21 datasets, the models achieved an average area under the curve (AUC) of 0.84 and delivered better prediction performance than other participating models. Further insight into toxicological mechanisms was obtained through feature-importance analysis using SHapley Additive exPlanations (SHAP). The octanol-water partition coefficient (log P), molecular topology, and the ZMIC and piPC descriptors were identified as key parameters for predicting toxicity; these descriptors elucidate the relationship between chemical structure and biological interaction, thereby providing mechanistic explanations for compound toxicities. For example, high log P values are associated with high cell membrane permeability, which facilitates interactions between intracellular targets and endocrine receptors. The study also developed user-friendly quantitative structure-activity relationship (QSAR) prediction software. Designed for accessibility, this software enables researchers and policymakers to input compound structures in SMILES format and predict their toxicities without the need for specialized machine-learning expertise. The software automatically generates descriptors and predicts whether or not the input compounds are toxic. By integrating advanced machine-learning and interpretation methods, this study contributes to in silico methods that can replace animal testing in future toxicity studies. The predictive model and accompanying software enable the rapid screening of emerging pollutants and provide guidance for designing safer chemicals. These contributions are critical for advancing environmental safety and public health in the face of expanding chemical inventories.
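Two of the quantities named in the abstract, information gain for feature screening and the AUC for model evaluation, can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names, the single-threshold split, and the toy descriptor values below are all assumptions made for illustration, and the study itself reports using RDKit, Mordred, Sklearn, and XGBoost for the actual workflow.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels, threshold):
    """Entropy reduction obtained by splitting the labels at feature <= threshold."""
    left = [y for x, y in zip(feature, labels) if x <= threshold]
    right = [y for x, y in zip(feature, labels) if x > threshold]
    weighted = sum(len(part) / len(labels) * entropy(part) for part in (left, right) if part)
    return entropy(labels) - weighted


def roc_auc(labels, scores):
    """AUC as the probability that a random active scores above a random inactive."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from Tox21): 1 = active, 0 = inactive in one assay.
labels = [0, 0, 0, 1, 1, 1]
log_p = [0.5, 1.2, 2.8, 3.1, 4.0, 4.6]   # made-up values of an informative descriptor
noise = [1.0, 5.0, 2.0, 1.5, 4.5, 2.5]   # made-up values of an uninformative descriptor

ig_log_p = information_gain(log_p, labels, threshold=3.0)
ig_noise = information_gain(noise, labels, threshold=3.0)
auc_log_p = roc_auc(labels, log_p)
auc_noise = roc_auc(labels, noise)
```

In this toy setting the informative descriptor separates the two classes perfectly, so it receives a high information gain and would be retained by a gain-based screen, while the uninformative one receives a gain of zero. The same pairwise definition of the AUC underlies the averaged figure reported in the abstract, although production code would normally rely on a library implementation.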

Key words

emerging pollutants / quantitative structure-activity relationship (QSAR) / machine learning / biological effects

Classification

Chemistry and chemical engineering

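Cite this article

李建青, 王天勤, 滕跃发, 郭磊, 黄杨, 李斐. Machine learning prediction model for emerging pollutants-induced activities of 12 nuclear receptors [J]. 色谱 (Chinese Journal of Chromatography), 2025, 43(8): 959-970.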

Funding

National Natural Science Foundation of China (Nos. 22406080, 22376215)

Taishan Scholars Program (No. tsqn202312275)

Natural Science Foundation of Shandong Province (No. ZR2024QB094)

Shandong Province Science and Technology Small and Medium-sized Enterprise Innovation Capability Enhancement Project (No. 2024TSGC0504)

Shandong Provincial College Student Innovation and Entrepreneurship Training Program (No. S202410451034)

色谱 (Chinese Journal of Chromatography)

Open access; Peking University Core Journal

ISSN 1000-8713
