Cite this article: ZHANG Wenxin, LI Ran, ZANG Xiangdi, YAN Jingru, ZHU Jinyao. Real-time scheduling strategy optimization for electric vehicle battery swapping station based on reinforcement learning[J]. Electric Power Automation Equipment, 2022, 42(10):
Real-time scheduling strategy optimization for electric vehicle battery swapping station based on reinforcement learning
ZHANG Wenxin1, LI Ran1, ZANG Xiangdi1, YAN Jingru2, ZHU Jinyao3
1. College of Electrical and Electronic Engineering, North China Electric Power University, Baoding 071003, China; 2. Electric Power Research Institute of State Grid Hebei Electric Power Company, Shijiazhuang 050022, China; 3. State Grid Shijiazhuang Electric Power Company, Shijiazhuang 050004, China
Abstract:
With the growing adoption of electric vehicles, scheduling optimization for battery swapping stations has become a research focus. Traditional scheduling strategies that rely on predicted swapping demand struggle in practice: they adapt poorly to dynamic disturbances, and their prediction errors accumulate. To address these problems, a real-time scheduling strategy for battery swapping stations based on the Monte Carlo policy gradient method with baseline is proposed, which optimizes the station's charging and discharging strategy and the number of responding batteries. Monte Carlo policy gradient reinforcement learning with baseline is introduced, and a suitable state space and action space are selected for the real-time scheduling problem. A reward function is designed to train the agent offline, so that the optimal policy network is learned from battery state data, time-of-use electricity prices, and the number of queuing electric vehicles. The real-time scheduling strategy is then tested on the offline-trained model. The effectiveness and economy of the proposed strategy are verified in terms of the station's service availability and economic benefit, and the case study results show that the strategy contributes, to some extent, to peak shaving and valley filling of the grid load.
Key words: electric vehicles; battery swapping station; reinforcement learning; policy gradient; time-of-use electricity price; real-time scheduling
DOI: 10.16081/j.epae.202203003
Classification number: U469.72; TM734
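For readers unfamiliar with the Monte Carlo policy gradient method with baseline named in the abstract, the following is a minimal sketch of one episode update, not the authors' implementation. It assumes a tabular softmax policy in place of the paper's policy network, and all function names, the state and action indices, and the reward values are hypothetical. The per-step discount weighting that some textbook versions apply to the gradient is omitted for brevity.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of action preferences."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def discounted_returns(rewards, gamma):
    """Monte Carlo return G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = np.zeros(len(rewards))
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns

def reinforce_baseline_update(theta, baseline, episode,
                              gamma=0.99, lr_theta=0.1, lr_b=0.1):
    """Apply one episode of policy gradient updates with a learned baseline.

    theta    : array [n_states, n_actions] of softmax preferences (assumed tabular)
    baseline : array [n_states] of state-value estimates used as the baseline
    episode  : list of (state, action, reward) tuples from one full rollout
    """
    states, actions, rewards = zip(*episode)
    returns = discounted_returns(rewards, gamma)
    for s, a, g in zip(states, actions, returns):
        advantage = g - baseline[s]          # return minus baseline
        probs = softmax(theta[s])
        grad_log_pi = -probs                 # d log pi(a|s) / d theta[s, :]
        grad_log_pi[a] += 1.0
        theta[s] += lr_theta * advantage * grad_log_pi
        baseline[s] += lr_b * advantage      # move the baseline toward the return
    return theta, baseline

# Hypothetical toy rollout: state 0 = valley price, state 1 = peak price;
# action 0 = charge, 1 = idle, 2 = discharge; rewards are illustrative only.
theta = np.zeros((2, 3))
baseline = np.zeros(2)
episode = [(0, 0, -0.3), (1, 2, 1.0)]
theta, baseline = reinforce_baseline_update(theta, baseline, episode, gamma=1.0)
```

Subtracting the baseline leaves the expected gradient unchanged while reducing its variance, which is the usual motivation for this variant over the plain Monte Carlo policy gradient.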
