Abstract
Objective To optimize breast ultrasound mass classification algorithms with a large language model (LLM) and thereby improve the classification performance for breast masses on ultrasound.
Methods The breast ultrasound descriptions of 252 patients (98 malignant and 154 benign cases) in the breast ultrasound dataset (BrEaST, v1.0) were analyzed for 8 characteristics based on the Breast Imaging Reporting and Data System (BI-RADS): breast tissue composition, skin thickening, mass morphology, posterior echo, margin, acoustic shadowing, echo intensity and calcification. The 252 patients were divided into a training set and a test set at a ratio of 7:3. A large language model (ChatGPT 5.1 Thinking) was used to generate the Python code automatically for three algorithms: a random forest with preset hyperparameters (Algorithm 1, the baseline), a random forest with preset hyperparameters combined with the synthetic minority oversampling technique for nominal features (SMOTEN) (Algorithm 2), and a random forest with hyperparameters optimized by random search (Algorithm 3). With pathological examination results as the gold standard, overall differences among the three algorithms were evaluated with the Friedman test and pairwise differences with the Nemenyi test. The three algorithms were also replicated by manual programming, the test set was resampled 1000 times with the bootstrap method, and the performance metrics of the manually programmed and LLM-generated algorithms were compared.
Results Algorithm 3 achieved the highest accuracy (0.848), sensitivity (0.912), F1 score (0.823) and area under the receiver operating characteristic curve (AUC, 0.895). The Friedman test showed significant differences among the three algorithms in accuracy, sensitivity, F1 score and AUC (P<0.05), but not in specificity (P>0.05). The Nemenyi test showed that Algorithm 3 performed significantly better than Algorithms 1 and 2 in accuracy, sensitivity, F1 score and AUC (P<0.05), whereas Algorithms 1 and 2 did not differ significantly in any metric (P>0.05). The LLM-generated and manually programmed algorithms were highly consistent across all performance metrics, with no statistically significant differences in any metric (P>0.05).
Conclusion An LLM can be used to enhance breast ultrasound mass classification algorithms, providing a reference for clinicians in diagnosis and treatment. [Chinese Medical Equipment Journal, 2026, 47(1): 8-13]
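SMOTEN differs from ordinary SMOTE in that it never interpolates between samples. When the BI-RADS descriptors are treated as categorical, each synthetic malignant case is assembled from categories that already occur among neighbouring malignant cases. What follows is a minimal standard-library sketch of that idea, modelled on the SMOTEN description in imbalanced-learn, whose `imblearn.over_sampling.SMOTEN` class is the usual off-the-shelf option. It is not the code used in the study. The function names `vdm_tables`, `vdm_distance` and `smoten_like`, the simplified Value Difference Metric and the toy descriptor values are all hypothetical illustrations.

```python
# Minimal SMOTEN-style oversampling sketch (hypothetical helper names; simplified VDM).
import random
from collections import Counter, defaultdict


def vdm_tables(X, y):
    """For each feature, map every category to P(class | category)."""
    classes = sorted(set(y))
    tables = []
    for f in range(len(X[0])):
        counts = defaultdict(Counter)
        for row, label in zip(X, y):
            counts[row[f]][label] += 1
        tables.append({value: [c[cls] / sum(c.values()) for cls in classes]
                       for value, c in counts.items()})
    return tables


def vdm_distance(a, b, tables):
    """Simplified Value Difference Metric: summed L1 gaps between class profiles."""
    return sum(sum(abs(p - q) for p, q in zip(t[va], t[vb]))
               for t, va, vb in zip(tables, a, b))


def smoten_like(X, y, minority, k=5, seed=0):
    """Add synthetic `minority` rows until that class matches the largest class."""
    rng = random.Random(seed)
    tables = vdm_tables(X, y)  # class profiles are estimated on all classes
    X_min = [row for row, label in zip(X, y) if label == minority]
    n_needed = max(Counter(y).values()) - len(X_min)
    # k nearest minority neighbours of each minority row, excluding the row itself
    neighbours = []
    for i, row in enumerate(X_min):
        ranked = sorted((vdm_distance(row, other, tables), j)
                        for j, other in enumerate(X_min) if j != i)
        neighbours.append([X_min[j] for _, j in ranked[:k]])
    synthetic = []
    for _ in range(n_needed):
        nbrs = neighbours[rng.randrange(len(X_min))]  # pick a minority seed row
        # each feature takes the most frequent category among the seed's neighbours
        synthetic.append(tuple(Counter(col).most_common(1)[0][0] for col in zip(*nbrs)))
    return X + synthetic, y + [minority] * len(synthetic)


# Toy data (invented): three categorical descriptors per case.
X = ([("oval", "circumscribed", "none")] * 6
     + [("irregular", "spiculated", "present")] * 3
     + [("irregular", "indistinct", "none")])
y = ["benign"] * 6 + ["malignant"] * 4
X_res, y_res = smoten_like(X, y, minority="malignant", k=3)
print(Counter(y_res))
```

In a setup like Algorithm 2, oversampling would normally be applied to the training set only, so that the test set keeps its original class balance.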
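The Friedman and Nemenyi tests both work on within-block ranks. The standard-library sketch below computes the Friedman statistic and the Nemenyi critical difference (CD) in the form tabulated by Demšar (2006). It makes several assumptions. The score matrix has one row per block and one column for each of Algorithms 1 to 3; a block could be, for instance, a bootstrap resample of the test set, but the abstract does not say what the blocks were. Ties share their mean rank, and the tie correction that `scipy.stats.friedmanchisquare` applies is omitted. The closed-form p-value exp(-χ²/2) holds only for k = 3 algorithms (2 degrees of freedom). The function names and toy scores are hypothetical.

```python
# Friedman test (k = 3) and Nemenyi critical difference, standard library only.
import math

# Nemenyi critical values q_alpha at alpha = 0.05 for k algorithms (Demsar, 2006).
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569}


def average_ranks(scores):
    """Mean rank of each algorithm; rank 1 = highest score, ties share their mean rank."""
    k = len(scores[0])
    totals = [0.0] * k
    for block in scores:
        order = sorted(range(k), key=lambda j: -block[j])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and block[order[end + 1]] == block[order[pos]]:
                end += 1
            mean_rank = (pos + end) / 2 + 1  # sorted positions pos..end share this rank
            for j in order[pos:end + 1]:
                totals[j] += mean_rank
            pos = end + 1
    return [t / len(scores) for t in totals]


def friedman_k3(scores):
    """Friedman chi-square without tie correction; closed-form p-value for k = 3 only."""
    n, k = len(scores), len(scores[0])
    if k != 3:
        raise ValueError("exp(-chi2 / 2) is the chi-square survival function only for 2 df")
    ranks = average_ranks(scores)
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)
    return chi2, math.exp(-chi2 / 2), ranks


def nemenyi_cd(k, n, q_table=Q_ALPHA_005):
    """Two algorithms differ at alpha = 0.05 if their mean ranks differ by more than CD."""
    return q_table[k] * math.sqrt(k * (k + 1) / (6 * n))


# Toy scores (invented): rows are blocks, columns are Algorithms 1, 2 and 3.
scores = [[0.78, 0.80, 0.85], [0.76, 0.75, 0.84], [0.80, 0.79, 0.86],
          [0.77, 0.78, 0.83], [0.79, 0.77, 0.88], [0.75, 0.76, 0.82]]
chi2, p, ranks = friedman_k3(scores)
cd = nemenyi_cd(3, len(scores))
print(f"chi2={chi2:.2f}, p={p:.4f}, mean ranks={ranks}, CD={cd:.3f}")
```

With these toy numbers, the mean ranks are 2.5, 2.5 and 1.0, so χ² = 9.0. Algorithm 3's mean rank is 1.5 below that of Algorithms 1 and 2. That gap exceeds the critical difference of about 1.35, while Algorithms 1 and 2 tie.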
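The abstract does not describe how the 1000 bootstrap resamples were scored. One common reading is a percentile bootstrap: resample the test-set cases with replacement, recompute the five metrics on each draw, and summarise each metric by its mean and its 2.5th/97.5th percentiles. The standard-library sketch below follows that reading. The following are all assumptions: malignant coded as 1, the 0.5 decision threshold, redrawing any resample that contains only one class, AUC computed as the Mann-Whitney probability, the function names, and the toy labels and scores.

```python
# Percentile bootstrap of test-set metrics, standard library only (assumed design).
import random
import statistics


def metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, specificity, F1 and AUC; malignant is coded as 1."""
    y_pred = [int(s >= threshold) for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp, tn = pairs.count((1, 1)), pairs.count((0, 0))
    fp, fn = pairs.count((0, 1)), pairs.count((1, 0))
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    # AUC = P(malignant score > benign score), ties counted as one half (Mann-Whitney)
    auc = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg) / (len(pos) * len(neg))
    return {"accuracy": (tp + tn) / len(y_true), "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp), "f1": 2 * tp / (2 * tp + fp + fn), "auc": auc}


def bootstrap(y_true, y_score, n_resamples=1000, seed=42):
    """Resample cases with replacement; redraw any sample that lacks either class."""
    rng = random.Random(seed)
    n = len(y_true)
    draws = []
    while len(draws) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        labels = [y_true[i] for i in idx]
        if 0 < sum(labels) < n:  # both malignant (1) and benign (0) present
            draws.append(metrics(labels, [y_score[i] for i in idx]))
    return draws


def summarise(draws, name):
    """Mean and 2.5th/97.5th percentile interval of one metric across draws."""
    values = sorted(d[name] for d in draws)
    return (statistics.mean(values), values[int(0.025 * len(values))],
            values[int(0.975 * len(values)) - 1])


# Toy test set (invented): 1 = malignant, scores are predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.2, 0.3, 0.6, 0.1, 0.8, 0.35, 0.05]
draws = bootstrap(y_true, y_score)
for name in ("accuracy", "sensitivity", "specificity", "f1", "auc"):
    mean, lo, hi = summarise(draws, name)
    print(f"{name}: {mean:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

Suppose two sets of predictions share the same labels and the same `seed`, for example from two implementations of one algorithm. Both then draw identical case indices, so their metric distributions can be compared resample by resample.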
Key words
large language model/breast ultrasound/breast mass/classification algorithm/code generation/machine learning
Classification
Medicine and Health