Citation: LIU Shuo, GUO Chuangxin, FENG Bin, ZHANG Yong, WANG Yibo. Active voltage control method of distributed photovoltaic based on value decomposition deep reinforcement learning[J]. Electric Power Automation Equipment, 2023, 43(10): 152-159.
DOI:10.16081/j.epae.202309001
CLC number: TM73
Funding: Science and Technology Project of State Grid Corporation of China (5100-20212570A-0-5-SF)
Active voltage control method of distributed photovoltaic based on value decomposition deep reinforcement learning
LIU Shuo1, GUO Chuangxin1, FENG Bin1, ZHANG Yong2, WANG Yibo2
1.College of Electrical Engineering, Zhejiang University, Hangzhou 310027, China;2.North China Branch of State Grid Corporation of China, Beijing 100053, China
Abstract:
For the active voltage control problem, deep reinforcement learning can overcome the limitations of mathematical optimization methods in accuracy and real-time performance. However, traditional multi-agent deep reinforcement learning methods suffer from issues such as credit assignment and over-generalization, which make it difficult to learn a globally optimal coordination policy and degrade control performance. Therefore, an active voltage control method for distributed photovoltaics based on value decomposition deep reinforcement learning is proposed. The active voltage control problem is modeled as a decentralized partially observable Markov decision process. Then, based on the centralized training with decentralized execution framework, two improvements are proposed: a decomposed value network and a centralized policy gradient. The global value network is decomposed into individual value networks and a mixing network, and the current policies of all agents are used for centralized parameter updating. Case study results on a modified IEEE 33-bus distribution network show that the proposed method delivers superior voltage stabilization and loss reduction performance, and has advantages in training speed and scenario robustness.
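The decomposed value network described in the abstract can be sketched in the style of QMIX-like value decomposition: each agent keeps an individual value network over its local observation, and a mixing network with nonnegative, state-dependent weights combines the individual Q-values into a global Q_tot. The dimensions, linear layers, and agent count below are illustrative assumptions for a minimal sketch, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def individual_q(obs, w, b):
    """Per-agent value network: a single linear layer standing in for
    each agent's local Q-network (local observation -> scalar Q)."""
    return float(obs @ w + b)

def mixing_network(agent_qs, state, hyper_w, hyper_b):
    """Mixing network: combines individual agent Q-values into a global
    Q_tot. The mixing weights are generated from the global state by a
    hypernetwork and passed through abs() so that dQ_tot/dQ_i >= 0;
    this monotonicity keeps each agent's decentralized argmax consistent
    with the centralized argmax over joint actions."""
    w = np.abs(state @ hyper_w)   # nonnegative mixing weights, one per agent
    b = float(state @ hyper_b)    # state-dependent bias
    return float(np.dot(w, agent_qs) + b)

# Illustrative sizes: two PV-inverter agents with 4-dim local
# observations and a 6-dim global state.
obs_dim, state_dim, n_agents = 4, 6, 2
agent_params = [(rng.normal(size=obs_dim), 0.0) for _ in range(n_agents)]
hyper_w = rng.normal(size=(state_dim, n_agents))
hyper_b = rng.normal(size=state_dim)

observations = [rng.normal(size=obs_dim) for _ in range(n_agents)]
state = rng.normal(size=state_dim)

qs = np.array([individual_q(o, w, b)
               for o, (w, b) in zip(observations, agent_params)])
q_tot = mixing_network(qs, state, hyper_w, hyper_b)
```

During centralized training, Q_tot would be regressed against a TD target while each agent executes from its individual network alone; the nonnegative mixing weights are what make that decentralized execution sound.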
Key words: active voltage control; distributed photovoltaic; deep reinforcement learning; multi-agent; value decomposition; centralized policy gradient
