Guangdong Medical Journal, 2026, Vol. 47, Issue (4): 550-556, 7. DOI: 10.13820/j.cnki.gdyx.20254444
Fine-tuning performance of an ultra-lightweight DeepSeek-R1 large language model for CT report classification
Abstract
Objective To address the clinical need for accurate severity stratification in CT report triage, this study fine-tuned an ultra-lightweight large language model (LLM), DeepSeek-R1-1.5B, and systematically evaluated its performance.
Methods A total of 6,000 CT reports were retrospectively collected: 5,000 reports for training, validation, and internal testing, and an additional 1,000 reports for independent external testing. Experienced radiologists annotated each report into one of three categories according to clinical significance: "negative", "routine" (common conditions), and "urgent" (severe conditions). Four lightweight models (DeepSeek-R1-1.5B, BERT-base-uncased, Qwen2.5-1.5B, and LLaMA3.2-1B) were fine-tuned and evaluated. The full-scale DeepSeek-R1-671B model, used without fine-tuning, served as a zero-shot baseline. Model performance was assessed by classification accuracy with corresponding 95% confidence intervals (CIs).
Results After fine-tuning, DeepSeek-R1-1.5B achieved the best performance, with an accuracy of 0.964 (95% CI: 0.962-0.966) on the internal test set and 0.962 (95% CI: 0.960-0.963) on the external test set. It significantly outperformed both the other fine-tuned lightweight models (P<0.001) and the zero-shot DeepSeek-R1-671B model (P<0.001). Subgroup analysis showed consistently high performance across scan types and anatomical regions.
Conclusion The domain-adapted, ultra-lightweight DeepSeek-R1-1.5B model achieves high accuracy in three-class severity classification of CT reports, highlighting its potential for clinical deployment in resource-constrained settings.
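The abstract reports accuracy with 95% CIs and P values for model comparisons but does not state how either was computed. The following is a minimal sketch only, assuming a percentile bootstrap over test reports for the CI and an exact McNemar test for paired predictions on the same test set; neither method is confirmed by the source, and the function names `accuracy`, `bootstrap_ci`, and `mcnemar_exact` are hypothetical.

```python
import random
from math import comb

# The three severity classes described in the abstract.
LABELS = ("negative", "routine", "urgent")


def accuracy(y_true, y_pred):
    """Fraction of reports whose predicted class matches the reference label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy (ASSUMED method, not the authors').

    Reports are resampled with replacement and accuracy is recomputed
    on each resample; the alpha/2 and 1 - alpha/2 percentiles form the CI.
    """
    rng = random.Random(seed)
    n = len(y_true)
    correct = [t == p for t, p in zip(y_true, y_pred)]
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(sum(correct[i] for i in idx) / n)
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test between two classifiers (ASSUMED test).

    Only discordant reports matter: those one model gets right and the
    other gets wrong. Under H0 they split 50/50, so the P value is a
    two-sided binomial tail with p = 0.5.
    """
    only_a = sum(a == t and b != t for t, a, b in zip(y_true, pred_a, pred_b))
    only_b = sum(a != t and b == t for t, a, b in zip(y_true, pred_a, pred_b))
    n = only_a + only_b
    if n == 0:
        return 1.0
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    y_true = ["negative", "routine", "urgent", "routine", "negative"]
    y_pred = ["negative", "routine", "urgent", "negative", "negative"]
    print(accuracy(y_true, y_pred))  # 0.8
    print(bootstrap_ci(y_true, y_pred))
```

McNemar's test is a common choice here because all models are scored on the same reports, so only the discordant pairs carry information about which model is better; the authors may well have used a different procedure.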
Key words: large language model / medical text classification / natural language processing / model fine-tuning
Classification: Information Technology and Security Science
Cite this article:
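吴以名 (WU Yiming), 解学乾 (XIE Xueqian). Fine-tuning performance of an ultra-lightweight DeepSeek-R1 large language model for CT report classification[J]. Guangdong Medical Journal, 2026, 47(4): 550-556, 7.
Funding: National Natural Science Foundation of China (82472073)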