Journal of Railway Science and Engineering, 2025, Vol. 22, Issue 1: 77-88, 12. DOI: 10.19713/j.cnki.43-1423/u.T20240439
考虑时空关联的道路行程速度稀疏数据修复与解释性算法
Restoration and interpretive algorithm for sparse road travel speed data considering spatiotemporal correlation
Abstract
To study the coupling between the travel speed of a road segment with sparse data and that of its spatially associated roads in a topological road network, the Road Spatial Correlation Index (RSCI) was defined and its calculation method specified, based on the distribution of spatial distances in the road network. A sparse data repair and interpretability model for road travel speed was then constructed. First, an improved genetic algorithm (IGA) was proposed that modifies the selection operation and operators of the traditional roulette wheel algorithm, using an adaptive mechanism to optimize individual selection probabilities. Setting a constant λ addressed the low selection probability of subsequent excellent individuals and improved the convergence performance of the model. Second, IGA and K-Fold Cross Validation (K-Fold CV) were used to optimize the n_estimators, learning_rate, min_child_weight, and max_depth hyperparameters of the Extreme Gradient Boosting (XGBoost) algorithm. Then, the SHapley Additive exPlanations (SHAP) method provided a global interpretation of the importance of each feature in the XGBoost model, together with tracing analysis of individual samples. Finally, an example verification was conducted using the target road travel speed as the output and the connected road travel speeds as the feature inputs. The results show that the MAE and RMSE of the IGA-XGBoost combination algorithm are 1.95 and 2.66, respectively, with an R² of 0.941, which is 0.4% higher than GA-XGBoost. The model running time is 1.532 seconds, which is 7.6% lower than GA-XGBoost. The combination algorithm therefore achieves higher prediction accuracy and significantly better iteration efficiency. With feature importance measured by SHAP values, the importance of connected road features is positively correlated with their RSCI: the larger the RSCI value, the greater the contribution of the connected road to the prediction results. When the number of connecting roads is insufficient and only the three connecting roads with the highest SHAP values are used to fill in the target road data, the fMAE, fRMSE, and R² of the model are 2.53, 3.30, and 0.905, respectively. This still gives good data restoration accuracy and demonstrates the applicability of the method. The research results can provide new ideas for repairing and filling in urban road travel speed data.
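The three accuracy measures quoted in the abstract (MAE, RMSE, and R²) follow their standard definitions. The sketch below is illustrative only: the function name `regression_metrics` and the sample speeds are hypothetical and are not taken from the paper, and it does not reproduce the authors' implementation or data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired observed and repaired speeds.

    Standard textbook definitions; the name and signature are hypothetical.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    # Mean absolute error: average magnitude of the residuals.
    mae = sum(abs(e) for e in errors) / n

    # Root mean squared error: penalises large residuals more heavily.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot

    return mae, rmse, r2


# Made-up travel speeds (km/h), purely for illustration.
observed = [30.0, 40.0, 50.0, 60.0]
repaired = [32.0, 38.0, 51.0, 59.0]
mae, rmse, r2 = regression_metrics(observed, repaired)
print(f"MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
```

Lower MAE and RMSE and an R² closer to 1 indicate a closer match between repaired and observed speeds, which is how the IGA-XGBoost and GA-XGBoost figures above are compared.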
Key words
intelligent transportation/sparse data repair/improved genetic algorithm/XGBoost/SHAP algorithm
Classification
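Traffic engineering
Cite this article
XU Tao, REN Qiliang, ZHANG Lei, CHENG Longchun. Restoration and interpretive algorithm for sparse road travel speed data considering spatiotemporal correlation[J]. Journal of Railway Science and Engineering, 2025, 22(1): 77-88, 12.
Funding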
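National Social Science Fund of China (21BJY038)
Sichuan Province Science and Technology Innovation Cooperation Project (2020YFH0038)
Ministry of Education Humanities and Social Sciences Youth Fund Project (20XJCZH011)
Chongqing Design Group Co., Ltd. 2023 Annual Research Project (2023-A2)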