Voltage Optimization Control of Distribution Networks Considering Inter-Regional Auxiliary Rewards
Open access; indexed in Peking University Core Journals (北大核心) and CSTPCD
Soft open points can effectively mitigate the voltage fluctuations caused by large-scale integration of distributed photovoltaics into distribution networks, but they also deepen the coordination required between regions. When multi-agent deep reinforcement learning is currently used for voltage optimization, each agent is trained only on the reward from its own region, so the agents lack coordination and the optimality of the resulting policies is hard to guarantee. To address this, a voltage optimization method for distribution networks that considers inter-regional auxiliary rewards is proposed. First, a multi-timescale voltage optimization framework based on multi-agent deep reinforcement learning is established. Second, for the agents that control soft open points, the reward within each agent's own region is defined as the primary reward, and the rewards within neighboring regions are defined as auxiliary rewards. Then, the extent to which the auxiliary rewards benefit training is analyzed through the dot product of the gradients of the primary and auxiliary reward loss functions with respect to the network parameters, and an evolutionary game approach is used to adaptively adjust the auxiliary reward participation factor. Finally, tests on a modified IEEE 33-node system verify that the proposed method stabilizes agent training and improves the optimization performance of the agents' policies.
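The gradient dot-product test described in the abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the toy quadratic losses, and in particular `update_factor`, which is a simple sign-based placeholder and not the evolutionary game procedure used in the paper. The idea shown is only that a positive dot product between the primary and auxiliary loss gradients suggests the auxiliary reward pushes the parameters in a helpful direction, while a negative one suggests a conflict.

```python
import numpy as np


def grad_alignment(grad_primary, grad_auxiliary):
    """Dot product of the primary and auxiliary loss gradients,
    both taken with respect to the same network parameters."""
    return float(np.dot(grad_primary, grad_auxiliary))


def combined_gradient(grad_primary, grad_auxiliary, factor):
    """Primary gradient plus the auxiliary gradient scaled by the
    participation factor."""
    return grad_primary + factor * grad_auxiliary


def update_factor(factor, alignment, step=0.1):
    """Hypothetical placeholder rule (NOT the paper's evolutionary game):
    raise the factor when the gradients agree, lower it when they
    conflict, and keep it within [0, 1]."""
    return float(np.clip(factor + step * np.sign(alignment), 0.0, 1.0))


# Toy example: two quadratic losses ||theta - target||^2 sharing parameters.
theta = np.array([1.0, -2.0])
target_primary = np.array([0.0, 0.0])    # assumed optimum of the primary loss
target_auxiliary = np.array([0.5, -1.0])  # assumed optimum of the auxiliary loss

g_primary = 2.0 * (theta - target_primary)
g_auxiliary = 2.0 * (theta - target_auxiliary)

alignment = grad_alignment(g_primary, g_auxiliary)
factor = update_factor(0.5, alignment)
g_total = combined_gradient(g_primary, g_auxiliary, factor)
```

In this toy case the two gradients point the same way, so the placeholder rule increases the participation factor; with conflicting gradients it would shrink the factor toward zero, which removes the auxiliary term from the update.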
Zhou Xiang (周祥); Li Xiaolu (李晓露); Liu Jinsong (柳劲松); Lin Shunfu (林顺富)
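College of Electrical Engineering, Shanghai University of Electric Power, Shanghai 200090; Electric Power Research Institute, State Grid Shanghai Municipal Electric Power Company, Shanghai 200437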
Power and Electrical Engineering
multi-agent deep reinforcement learning; voltage optimization; auxiliary rewards; evolutionary game; participation factor
Electric Power Construction (《电力建设》), 2024 (005)
Pages 80-93 (14 pages)
This work is supported by the National Natural Science Foundation of China (No. 51977127).