Abstract
Background Artificial intelligence (AI) technologies have shortcomings such as vulnerability to adversarial attacks and overfitting, which make AI perform considerably worse in practical applications than experimental data suggest. Given factors such as the cost and timeliness of remote electrocardiography (ECG) consultation, primary medical staff may use AI diagnosis results directly, which may pose medical risks.

Objective To analyze the accuracy of AI technology in ECG diagnosis and the factors influencing it, based on 3 years of consultation data from the Huangshan Regional Remote ECG Diagnostic Center.

Methods A total of 18 164 ECGs from primary care institutions were retrospectively collected at the Huangshan City Regional Remote ECG Diagnosis Center between September 2020 and September 2023. Both AI and physicians categorized the ECG diagnostic conclusions into four types: normal, positive, critical, and poor acquisition. Patient identity information was linked to the inpatient electronic medical record system of Huangshan City People's Hospital to extract discharge diagnosis information from tertiary hospitals. Patients were classified into two groups based on discharge diagnosis: cardiovascular disease (CVD) hospitalization and non-CVD hospitalization. A paired-design McNemar χ² test was used to compare the classification differences between the AI and physician groups. A Pearson χ² test was used to compare the differences between the classification results of both groups and CVD hospitalization status. After excluding cases with poor acquisition, a univariate logistic regression analysis was performed to analyze the correlation between different classifications and CVD hospitalization, using the normal category as a reference. Furthermore, 17 ECG indicators from the physician group (excluding poor acquisition cases) were converted into binary variables. Using the consistency of classification between the AI and physician groups as the dependent variable, receiver operating characteristic (ROC) curves were plotted for stratified analysis of the physician group's critical and positive categories to evaluate the impact of each ECG indicator on the inconsistency between the two groups.

Results A total of 18 164 remote routine ECGs were included in the study. The median patient age was 69 (65, 74) years, with 8 731 males and 9 433 females. The physician group classified ECGs as normal in 5 873 cases (32.3%), positive in 11 678 (64.3%), critical in 393 (2.2%), and poor acquisition in 220 (1.2%). The corresponding figures for the AI group were 4 723 (26.0%), 12 861 (70.8%), 390 (2.1%), and 190 (1.0%), respectively. During the study period, 553 of these patients were transferred to tertiary hospitals for inpatient care, of which 457 (82.6%) were for CVD. Univariate logistic regression analysis showed that, with normal ECG as a reference, the risk of CVD hospitalization for the physician group's positive and critical categories was 1.84 times (OR=1.84, 95%CI=1.11-3.04) and 2.80 times (OR=2.80, 95%CI=1.08-7.21) that of the normal category, respectively, with statistically significant differences (P<0.05). The AI group's positive (OR=1.54, 95%CI=0.88-2.67) and critical (OR=2.46, 95%CI=0.92-6.55) categories showed no statistically significant association with CVD hospitalization (P>0.05). The diagnostic classification was consistent between the AI and physician groups in 16 018 cases (88.2%) and inconsistent in 2 146 cases (11.8%), with a statistically significant difference between the two groups (χ²=680.931, P<0.001). Using the physician group as the standard, the AI group had a misdiagnosis rate of 27.7% and a missed diagnosis rate of 3.9%. ROC curve results indicated that for the physician group's critical ECGs, sinus rhythm, ST-segment abnormalities, and acute myocardial ischemia had discriminative value for the inconsistency between the two groups, with areas under the curve (AUC) of 0.74 (95%CI=0.65-0.82), 0.69 (95%CI=0.58-0.80), and 0.97 (95%CI=0.96-0.99), respectively. For the physician group's positive ECGs, low voltage and T-wave abnormalities had discriminative value, with AUCs of 0.58 (95%CI=0.55-0.61) and 0.61 (95%CI=0.58-0.63), respectively. For the physician group's normal ECGs, bradycardia had discriminative value, with an AUC of 0.58 (95%CI=0.56-0.60).

Conclusion The accuracy of the current AI algorithm in ECG diagnosis is inferior to that of physicians, and its results still need to be reviewed and confirmed by experienced physicians. We propose that AI technology applied in clinical practice should undergo extensive robustness verification.

Key words
Cardiovascular disease/Artificial intelligence/Remote electrocardiography/Diagnostic techniques
Classification
Medicine and health
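
The percentages reported in the Results can be rechecked directly from the raw counts. The following is a minimal Python sketch of that arithmetic; the counts are taken from the abstract, while the variable and function names are illustrative and not part of the study.

```python
# Counts as reported in the abstract (18 164 ECGs in total).
TOTAL = 18164

# Four-way classification by physicians and by the AI algorithm.
physician = {"normal": 5873, "positive": 11678, "critical": 393, "poor acquisition": 220}
ai = {"normal": 4723, "positive": 12861, "critical": 390, "poor acquisition": 190}

# Agreement between the two classifications, and hospital admissions.
consistent, inconsistent = 16018, 2146
cvd_admissions, all_admissions = 457, 553


def pct(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


physician_pct = {label: pct(n, TOTAL) for label, n in physician.items()}
ai_pct = {label: pct(n, TOTAL) for label, n in ai.items()}
agreement_pct = pct(consistent, TOTAL)
disagreement_pct = pct(inconsistent, TOTAL)
cvd_pct = pct(cvd_admissions, all_admissions)

for label in physician:
    print(f"{label}: physician {physician_pct[label]}%, AI {ai_pct[label]}%")
print(f"consistent: {agreement_pct}%, inconsistent: {disagreement_pct}%")
print(f"CVD share of admissions: {cvd_pct}%")
```

Both sets of category counts sum to 18 164, and the recomputed percentages match the figures quoted in the Results after rounding to one decimal place.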