The auxiliary value of generative artificial intelligence in the differential diagnosis of patients with emaciation

刘颖 张云红 蔡东平 任菁菁

Journal of Zhejiang University (Medical Sciences), 2026, Vol. 55, Issue 1: 65-71 (7 pages). DOI: 10.3724/zdxbyxb-2025-0463

Evaluating the performance of generative AI in assisting the differential diagnosis of weight loss

刘颖 (1), 张云红 (2), 蔡东平 (3), 任菁菁 (1)

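Author information

  • 1. Department of General Practice, the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, Zhejiang, China
  • 2. Department of General Practice, Dali Bai Autonomous Prefecture People's Hospital, Dali 671000, Yunnan, China
  • 3. Shishan Subdistrict Community Health Service Center, Suzhou High-tech Zone, Suzhou 215011, Jiangsu, China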

Abstract

Objective: To systematically evaluate the performance of the generative artificial intelligence (GenAI) models DeepSeek-V3 and the Qwen3 series in the differential diagnosis of weight loss.

Methods: PubMed was searched for all case reports published in the American Journal of Case Reports between January 1, 2012 and June 2, 2025 that contained the term "weight loss" in the title or abstract. Two senior general practitioners independently reviewed each case to determine whether it met predefined diagnostic criteria for weight loss (emaciation). Cases that did not meet these criteria, had incomplete information, or involved clearly defined specialty-specific diagnoses and treatments were excluded. The remaining cases were compiled into standardized clinical case summaries, which were presented to DeepSeek-V3 and the Qwen3 series models (Qwen3-235B-A22B, Qwen3-30B-A3B, and Qwen3-32B) to generate ranked lists of the top 10 differential diagnoses. The models were not specifically fine-tuned for this task. Sensitivity, precision, and F1-score were used to evaluate performance. Intergroup comparisons were performed using McNemar's test and Cochran's Q test.

Results: A total of 87 cases were analyzed. DeepSeek-V3 performed better than Qwen3-235B-A22B in sensitivity, precision, and F1-score, especially at the Top-5 level (P=0.043). Among the Qwen3 series models, Qwen3-235B-A22B showed the best sensitivity, precision, and F1-score for the Top-1 diagnosis, but the differences among the three Qwen3 models were not statistically significant at any diagnostic level (all P>0.05).

Conclusions: Domestic (Chinese-developed) GenAI models exhibit a characteristic of "breadth over precision" in the differential diagnosis of weight loss, with DeepSeek-V3 performing better at key diagnostic levels. Although the sensitivity and precision for the top-ranked diagnosis require improvement, these models have the potential to serve as effective clinical decision support tools, broadening the diagnostic perspectives of general practitioners.
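The evaluation described in the Methods can be illustrated with a small sketch. The abstract does not state how sensitivity, precision, and F1-score were defined at each Top-k level, so the definitions below are assumptions made only for illustration: a case counts as a hit if the reference diagnosis appears among the first k ranked diagnoses, sensitivity is the share of cases with a hit, and precision is the number of hits divided by the number of diagnoses listed within the top k. The function names, the exact string matching, and the toy cases are all hypothetical; the study's actual matching was presumably done by clinician review. The sketch also covers only the paired two-model comparison (an exact McNemar test), not Cochran's Q.

```python
from math import comb


def topk_hit(ranked, reference, k):
    """True if the reference diagnosis appears among the first k ranked items.

    Matching is a case-insensitive exact string comparison, which is an
    illustrative simplification, not the study's procedure.
    """
    ref = reference.strip().lower()
    return any(d.strip().lower() == ref for d in ranked[:k])


def topk_metrics(cases, k):
    """Return (sensitivity, precision, f1) at Top-k.

    ASSUMED definitions (not taken from the paper):
      sensitivity = cases with a hit / all cases
      precision   = hits / diagnoses listed within the top k
      f1          = harmonic mean of the two
    """
    hits = sum(topk_hit(ranked, ref, k) for ranked, ref in cases)
    listed = sum(len(ranked[:k]) for ranked, _ in cases)
    sensitivity = hits / len(cases)
    precision = hits / listed if listed else 0.0
    denom = sensitivity + precision
    f1 = 2 * sensitivity * precision / denom if denom else 0.0
    return sensitivity, precision, f1


def mcnemar_exact(hits_a, hits_b):
    """Two-sided exact McNemar p-value for paired hit/miss outcomes.

    Only discordant pairs matter: b = A hit and B missed, c = B hit and A missed.
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    b = sum(1 for x, y in zip(hits_a, hits_b) if x and not y)
    c = sum(1 for x, y in zip(hits_a, hits_b) if y and not x)
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical toy cases: (ranked differential list, reference diagnosis).
toy_cases = [
    (["tuberculosis", "lymphoma", "hyperthyroidism"], "lymphoma"),
    (["celiac disease", "crohn disease"], "Crohn disease"),
    (["depression", "diabetes mellitus"], "adrenal insufficiency"),
]

sens, prec, f1 = topk_metrics(toy_cases, k=2)
p_value = mcnemar_exact([1, 1, 1, 1, 0], [0, 0, 0, 0, 0])
```

In a comparison such as DeepSeek-V3 versus Qwen3-235B-A22B, `hits_a` and `hits_b` would hold the per-case Top-k hit indicators of the two models on the same cases, so the test uses only the cases on which the models disagree.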

Key words

Weight loss / Medically unspecified disease / Differential diagnosis / Artificial intelligence / Language model

Classification

Medicine and health

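Cite this article

刘颖, 张云红, 蔡东平, 任菁菁. Evaluating the performance of generative AI in assisting the differential diagnosis of weight loss [J]. Journal of Zhejiang University (Medical Sciences), 2026, 55(1): 65-71.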

Funding

This study was supported by the National Natural Science Foundation of China (72274169) and the Medical Interdisciplinary Innovation Program of Zhejiang University School of Medicine.

Journal of Zhejiang University (Medical Sciences)

ISSN 1008-9292
