交通运输研究, 2025, Vol. 11, Issue 2: 65-80, 16. DOI: 10.16503/j.cnki.2095-9931.2025.02.006
自动驾驶事故数据挖掘与典型危险场景构建
Autonomous Driving Accident Data Mining and Typical Dangerous Scenario Construction
Abstract
To construct scientifically sound test scenarios for autonomous driving on urban roads, typical dangerous scenarios were mined from 280 autonomous vehicle (AV) collision reports published by the California Department of Motor Vehicles (DMV) from 2021 to 2023 and converted into test scenarios. First, multivariate logistic regression analysis was conducted to extract the factors that significantly influence personal injury. Second, one-hot encoding was introduced to transform categorical variables into binary vectors, eliminating the numerical order bias inherent in traditional label encoding. Subsequently, a two-step clustering algorithm was applied to mine clusters of typical dangerous scenarios, and cross-tabulation was used to analyze the correlations between the scenario clusters and both accident outcomes and road environment variables. Finally, the identified dangerous scenarios were systematically converted into autonomous driving test scenarios. The results show that one-hot encoding improved clustering quality by 50% compared with the conventional method; cluster analysis identified 12 typical dangerous scenarios, and cross-tabulation analysis revealed statistically significant associations between the scenario clusters and both accident outcomes and road environment variables. Taking accident mechanisms and testing requirements into account, these 12 dangerous scenarios were consolidated into 6 representative test scenarios. The most typical, "rear-end collisions in which a stationary or decelerating AV is struck by a following vehicle", accounts for 46.1% of all scenarios. The findings indicate that one-hot encoding significantly enhances the accuracy of clustering analysis, and that scenario clustering based on real accident data can effectively identify urban road accident patterns for AVs and provide data-driven support for prioritizing and standardizing autonomous driving test scenario libraries.
Key words
traffic safety / test scenarios / two-step cluster analysis / autonomous vehicle / multivariate logistic regression / cross-tabulation analysis
Category
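Transportation
Cite this article
Wang Xiujie (王秀杰), Tian Haohao (田浩浩). Autonomous Driving Accident Data Mining and Typical Dangerous Scenario Construction [J]. 交通运输研究, 2025, 11(2): 65-80, 16.
Funding
Guangxi University of Science and Technology Doctoral Fund Project (校科博24Z03)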
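Illustrative note
The abstract states that one-hot encoding removes the numerical order bias of label encoding. The following minimal Python sketch illustrates that general point only; it is not the authors' implementation. The category names are hypothetical, and Euclidean distance is assumed purely for illustration (the abstract does not say which distance measure the two-step clustering used).

```python
from itertools import combinations

# Hypothetical collision-type categories, for illustration only
# (not taken from the paper's data set).
CATEGORIES = ["rear_end", "sideswipe", "head_on"]


def label_encode(value):
    """Map a category to its integer index, which imposes an artificial order."""
    return [float(CATEGORIES.index(value))]


def one_hot_encode(value):
    """Map a category to a binary indicator vector containing a single 1."""
    return [1.0 if value == c else 0.0 for c in CATEGORIES]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def pairwise_distances(encode):
    """Distance between every pair of categories under the given encoding."""
    return {
        (a, b): euclidean(encode(a), encode(b))
        for a, b in combinations(CATEGORIES, 2)
    }


label_dist = pairwise_distances(label_encode)
one_hot_dist = pairwise_distances(one_hot_encode)

for pair in label_dist:
    print(pair, "label:", label_dist[pair], "one-hot:", round(one_hot_dist[pair], 3))
```

Under label encoding, "rear_end" and "head_on" end up twice as far apart as "rear_end" and "sideswipe" simply because of the order in which the categories were listed. Under one-hot encoding, every pair of distinct categories is equally distant, so a distance-based clustering step is not influenced by an arbitrary ordering.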