基于融合XGBoost的变电工程造价数据预测算法

Journal of Shenyang University of Technology, 2025, Vol. 47, Issue (3): 317-323, 7. DOI: 10.7688/j.issn.1000-1646.2025.03.07

Algorithm for predicting cost data of substation engineering based on fused XGBoost

Zhou Bo¹, Liu Yun², Li Weijia³, Qi Yanxun³, Wang Ligong⁴

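Author information

  • 1. School of Energy, Power and Mechanical Engineering, North China Electric Power University, Beijing 102206; Economic and Technical Research Institute, Hebei Electric Power Co., Ltd., Shijiazhuang 050001, Hebei
  • 2. Economic and Technical Research Institute, Hebei Electric Power Co., Ltd., Shijiazhuang 050001, Hebei; School of Electrical Engineering and Automation, Wuhan University, Wuhan 430072, Hubei
  • 3. Economic and Technical Research Institute, Hebei Electric Power Co., Ltd., Shijiazhuang 050001, Hebei
  • 4. Software Cost Department, Hebei Saikeputai Computer Consulting Service Co., Ltd., Shijiazhuang 050081, Hebei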

Abstract

[Objective] Traditional cost prediction methods for power grid substation engineering often rely on single influencing factors or linear models, which fail to capture the complex non-linear relationships among multiple factors and therefore yield low prediction accuracy. Existing methods also suffer from dimensionality explosion or information loss when handling high-dimensional categorical variables, and they are prone to overfitting on small-sample datasets. This study therefore aims to develop a robust cost prediction model for substation engineering that integrates multi-source influencing factors, adapts to non-linear relationships, and performs well in small-sample scenarios, thereby providing more accurate technical support for investment decisions in power grid enterprises.

[Methods] To address these issues, a substation engineering cost prediction model (ME-XGB) based on the fusion of mean encoding (ME) and the extreme gradient boosting (XGBoost) framework was proposed. First, 13 key influencing factors were extracted from dimensions such as equipment and materials, construction techniques, construction scale, geographical environment, and design standards, covering both categorical and continuous variables. For categorical variables exhibiting non-linear relationships with cost, ME was applied for feature engineering. This method converted each categorical variable into a continuous feature by calculating the mean of the target variable (cost per unit capacity) within each category and combining it with a smoothing factor, which retains category information while avoiding dimensionality explosion. Second, the XGBoost algorithm was used to construct the prediction model. The generalization ability of the model was enhanced by integrating multiple decision trees that iteratively correct residuals and by incorporating regularization terms and hyperparameter tuning. Experiments were conducted on 200 substation engineering samples from a power grid company, which were randomly divided into a training set (80%) and a test set (20%). The performance of ME-XGB was compared with that of MK-TESM (which combines the Mann-Kendall (MK) trend test with the triple exponential smoothing method (TESM)), a backpropagation (BP) neural network, and the original XGBoost model, using mean absolute error (MAE) and goodness of fit (R²) as evaluation metrics.

[Results] Experimental results demonstrate that the ME-XGB model significantly outperforms the comparative models in prediction accuracy on the test set. Specifically, the median and mean MAE values of ME-XGB are 5 and 6.875, which are lower than those of MK-TESM, the BP neural network, and the original XGBoost. Additionally, the R² value of ME-XGB reaches 0.8579, significantly higher than those of the other models, indicating stronger explanatory power for data variations. Boxplot analysis further reveals that ME-XGB has the narrowest distribution range of prediction errors, confirming its greater stability. Hyperparameter tuning results show that the chosen settings of hyperparameters such as tree depth and learning rate effectively balance model complexity against overfitting risk.

[Conclusion] The proposed ME-XGB model addresses the challenges of non-linear representation and dimensionality control for categorical variables through ME, while leveraging the ensemble learning capability of XGBoost to significantly enhance prediction performance in small-sample scenarios. ME-XGB outperforms traditional models in terms of MAE, R², and error stability, providing a more reliable cost prediction tool for power grid enterprises. Future research can further explore the modeling of dynamic influencing factors and extend the application of the model to cross-regional projects through transfer learning.
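The mean-encoding step and the two evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the smoothing formula, so the additive form below, (n × category mean + m × global mean) / (n + m), is an assumption, as are the function names, the `smoothing` parameter, and the toy data; this is not the authors' implementation.

```python
from collections import defaultdict


def fit_mean_encoding(categories, targets, smoothing=5.0):
    """Map each category to a smoothed mean of the target.

    Assumed formula (not taken from the paper):
        (n * category_mean + smoothing * global_mean) / (n + smoothing)
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    # sums[cat] equals n * category_mean, so it is used directly.
    encoding = {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }
    return encoding, global_mean


def apply_mean_encoding(categories, encoding, global_mean):
    """Replace categories by encoded values; unseen ones get the global mean."""
    return [encoding.get(cat, global_mean) for cat in categories]


def mean_absolute_error(y_true, y_pred):
    """MAE: average absolute difference between truth and prediction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """R^2: 1 minus residual sum of squares over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: one categorical factor and cost per unit capacity.
terrain = ["plain", "plain", "hill", "hill", "mountain"]
unit_cost = [100.0, 110.0, 130.0, 150.0, 200.0]

encoding, global_mean = fit_mean_encoding(terrain, unit_cost, smoothing=1.0)
encoded = apply_mean_encoding(terrain + ["coastal"], encoding, global_mean)
```

In a pipeline like the one described, the encoding would be fitted on the training split only and the resulting continuous feature passed, together with the other factors, to a gradient-boosted regressor such as `xgboost.XGBRegressor`. A larger `smoothing` value pulls rare categories closer to the global mean, which is one common way to limit overfitting on small samples.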

Key words

substation engineering/cost prediction/non-linearity/influencing factor/extreme gradient boosting/mean encoding/fusion framework/feature engineering

Classification

Information technology and security science

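Cite this article

Zhou Bo, Liu Yun, Li Weijia, Qi Yanxun, Wang Ligong. Algorithm for predicting cost data of substation engineering based on fused XGBoost[J]. Journal of Shenyang University of Technology, 2025, 47(3): 317-323, 7.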

Funding

Key Project of the Natural Science Foundation of Hebei Province (E2018210044)

Science and Technology Project of the Hebei Provincial Department of Education (QN16214510D)

Journal of Shenyang University of Technology

Open access; Peking University Core Journal

ISSN 1000-1646
