Journal of Xi'an Jiaotong University (Medical Sciences) 2026, Vol. 47, Issue (2): 276-282, 7. DOI: 10.7652/jdyxb202602011
Prediction of non-alcoholic fatty liver disease based on LASSO regression algorithm
He Juan (何娟)¹, Gao Shuya (高姝雅)², Yan Shigui (颜世贵)¹, Li Guirong (李桂荣)³, Sun Hongbo (孙宏波)⁴
Author information
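- 1. School of Computer and Control Engineering, Yantai University, Yantai, Shandong 264005
- 2. Institute of Bio-evidence, Xi'an Jiaotong University / National Biosafety Evidence Base, Xi'an, Shaanxi 710049; National Health Commission Key Laboratory of Forensic Science, Xi'an, Shaanxi 710049
- 3. First Hospital of Northwest University, Xi'an, Shaanxi 710043; Shaanxi Tongchuang Saier Health Management Co., Ltd., Weinan, Shaanxi 714000
- 4. School of Computer and Control Engineering, Yantai University, Yantai, Shandong 264005; Institute of Bio-evidence, Xi'an Jiaotong University / National Biosafety Evidence Base, Xi'an, Shaanxi 710049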
Abstract
Objective: To develop a LASSO regression-based prediction model for non-alcoholic fatty liver disease (NAFLD) and to identify potential disease-related genes that distinguish patients from healthy individuals.

Methods: The GSE135251 dataset was obtained from the Gene Expression Omnibus (GEO) database. It comprises 206 samples from NAFLD patients at different stages of fibrosis and 10 samples from healthy individuals, all of them transcriptome data generated by high-throughput RNA sequencing. The Apriori algorithm was applied to analyze associations among gene loci and to eliminate redundant or highly correlated features, thereby reducing data dimensionality. The LASSO regression algorithm was then used to build a NAFLD prediction model and to identify potential disease-associated genes. Finally, the samples were grouped on the selected genes with the k-means unsupervised clustering method, and the agreement between the clustering results and the dataset's own grouping into healthy individuals and NAFLD patients was analyzed.

Results: The NAFLD prediction model identified PCBP2, CEBPD, GC, DNAJC12, and PTN as potential disease-associated genes. Based on these five genes, the samples were clustered into healthy controls and NAFLD patients with a silhouette coefficient of approximately 0.63 and a Davies-Bouldin index of about 0.5, indicating good cohesion within clusters and good separation between them.

Conclusion: Combining association-based dimensionality reduction with machine learning enables effective identification of NAFLD-related genes and accurate sample classification, providing insights for elucidating disease mechanisms and supporting potential clinical applications.

Keywords
non-alcoholic fatty liver disease (NAFLD) / LASSO regression algorithm / disease prediction model / potential disease gene / transcriptional profiling data

Classification
Medicine and health

Cite this article
He Juan, Gao Shuya, Yan Shigui, Li Guirong, Sun Hongbo. Prediction of non-alcoholic fatty liver disease based on LASSO regression algorithm [J]. Journal of Xi'an Jiaotong University (Medical Sciences), 2026, 47(2): 276-282, 7.
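Illustrative sketch
The Methods in the abstract chain LASSO feature selection with k-means clustering and a silhouette check. The following is a minimal pure-Python sketch of that idea, not the authors' implementation: the data are synthetic, all function names (`standardize`, `lasso_fit`, `kmeans2`, `silhouette`) and the penalty value `lam=0.05` are assumptions chosen for illustration, and the Apriori pre-filtering step and the Davies-Bouldin index are left out.

```python
import random

def standardize(X):
    """Centre each column and scale it to unit variance (constant columns become zeros)."""
    n, p = len(X), len(X[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        for i in range(n):
            out[i][j] = (col[i] - mean) / sd if sd > 0 else 0.0
    return out

def soft_threshold(z, lam):
    """Soft-thresholding operator: closed-form solution of the one-dimensional LASSO problem."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_fit(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)*||y - Xw||^2 + lam*||w||_1 (X and y assumed centred)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw, starting from w = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of feature j with the residual that excludes feature j
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_w
    return w

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def kmeans2(points, n_iter=50):
    """Two-cluster k-means; initial centres are the first point and the point farthest from it."""
    centres = [points[0], max(points, key=lambda q: sq_dist(q, points[0]))]
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [0 if sq_dist(q, centres[0]) <= sq_dist(q, centres[1]) else 1 for q in points]
        for c in (0, 1):
            members = [q for q, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient for a two-cluster labelling (Euclidean); singletons score 0."""
    scores = []
    for i, q in enumerate(points):
        own = [sq_dist(q, r) ** 0.5 for k, r in enumerate(points) if labels[k] == labels[i] and k != i]
        other = [sq_dist(q, r) ** 0.5 for k, r in enumerate(points) if labels[k] != labels[i]]
        if not own or not other:
            scores.append(0.0)
            continue
        a, b = sum(own) / len(own), sum(other) / len(other)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Synthetic stand-in for expression data (hypothetical, not GSE135251):
# 40 "patients", 10 "controls", 6 "genes"; only genes 0 and 1 differ between groups.
rng = random.Random(0)
n_genes = 6
y_raw = [1.0] * 40 + [0.0] * 10
X_raw = [[rng.gauss(3.0 * label if g < 2 else 0.0, 1.0) for g in range(n_genes)] for label in y_raw]

X = standardize(X_raw)
y_mean = sum(y_raw) / len(y_raw)
y = [v - y_mean for v in y_raw]

w = lasso_fit(X, y, lam=0.05)
selected = [j for j, wj in enumerate(w) if wj != 0.0]  # genes kept by the L1 penalty
sub = [[row[j] for j in selected] for row in X]
labels = kmeans2(sub)
print("selected genes:", selected)
print("silhouette:", round(silhouette(sub, labels), 3))
```

The L1 penalty drives the coefficients of uninformative features to exactly zero, which is why LASSO doubles as a gene selector here. A larger `lam` keeps fewer genes, so in practice the penalty would be tuned, for example by cross-validation, rather than fixed as in this sketch.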