Citation: LIU Shuo, GUO Chuangxin, FENG Bin, ZHANG Yong, WANG Yibo. Active voltage control method of distributed photovoltaic based on value decomposition deep reinforcement learning[J]. Electric Power Automation Equipment, 2023, 43(10): 152-159.
DOI:10.16081/j.epae.202309001
CLC number: TM73
Funding: Science and Technology Project of State Grid Corporation of China (5100-20212570A-0-5-SF)
Active voltage control method of distributed photovoltaic based on value decomposition deep reinforcement learning
LIU Shuo1, GUO Chuangxin1, FENG Bin1, ZHANG Yong2, WANG Yibo2
1.College of Electrical Engineering, Zhejiang University, Hangzhou 310027, China;2.North China Branch of State Grid Corporation of China, Beijing 100053, China
Abstract:
For the active voltage control problem, deep reinforcement learning can overcome the limitations of mathematical optimization methods in accuracy and real-time performance. However, traditional multi-agent deep reinforcement learning methods suffer from issues such as credit assignment and over-generalization, which make it difficult to learn a globally optimal coordination policy and degrade control performance. Therefore, an active voltage control method for distributed photovoltaics based on value decomposition deep reinforcement learning is proposed. The active voltage control problem is modeled as a decentralized partially observable Markov decision process. Then, based on the centralized training with decentralized execution framework, two improvements are proposed: a decomposed value network and a centralized policy gradient. The global value network is decomposed into individual value networks and a mixing network, and the current policies of all agents are used for centralized parameter updating. Case study results on a modified IEEE 33-bus distribution network show that the proposed method delivers superior voltage stabilization and loss reduction performance, and has advantages in training speed and scenario robustness.
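The decomposed value network described in the abstract can be sketched in the style of QMIX-like value decomposition: each agent keeps an individual value network over its local observation, and a mixing network with nonnegative, state-dependent weights combines the individual Q-values into a global Q_tot. The dimensions, linear layers, and agent count below are illustrative assumptions for a minimal sketch, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def individual_q(obs, w, b):
    """Per-agent value network: a single linear layer standing in for
    each agent's local Q-network (local observation -> scalar Q)."""
    return float(obs @ w + b)

def mixing_network(agent_qs, state, hyper_w, hyper_b):
    """Mixing network: combines individual agent Q-values into a global
    Q_tot. The mixing weights are generated from the global state by a
    hypernetwork and passed through abs() so that dQ_tot/dQ_i >= 0;
    this monotonicity keeps each agent's decentralized argmax consistent
    with the centralized argmax over joint actions."""
    w = np.abs(state @ hyper_w)   # nonnegative mixing weights, one per agent
    b = float(state @ hyper_b)    # state-dependent bias
    return float(np.dot(w, agent_qs) + b)

# Illustrative sizes: two PV-inverter agents with 4-dim local
# observations and a 6-dim global state.
obs_dim, state_dim, n_agents = 4, 6, 2
agent_params = [(rng.normal(size=obs_dim), 0.0) for _ in range(n_agents)]
hyper_w = rng.normal(size=(state_dim, n_agents))
hyper_b = rng.normal(size=state_dim)

observations = [rng.normal(size=obs_dim) for _ in range(n_agents)]
state = rng.normal(size=state_dim)

qs = np.array([individual_q(o, w, b)
               for o, (w, b) in zip(observations, agent_params)])
q_tot = mixing_network(qs, state, hyper_w, hyper_b)
```

During centralized training, Q_tot would be regressed against a TD target while each agent executes from its individual network alone; the nonnegative mixing weights are what make that decentralized execution sound.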
Key words: active voltage control; distributed photovoltaic; deep reinforcement learning; multi-agent; value decomposition; centralized policy gradient
