Cite this article: LIANG Tao, CHAI Lulu, TAN Jianxin, JING Yanwei, Lü Liangnian. Optimal scheduling of hydrogen coupled electrothermal integrated energy system based on deep reinforcement learning algorithm[J]. Electric Power Automation Equipment, 2025, 45(1): 59-66.
Optimal scheduling of hydrogen coupled electrothermal integrated energy system based on deep reinforcement learning algorithm
LIANG Tao1, CHAI Lulu1, TAN Jianxin2, JING Yanwei2, Lü Liangnian3
1. School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China; 2. Hebei Jiantou New Energy Co., Ltd., Shijiazhuang 050011, China; 3. Goldwind Science & Technology Co., Ltd., Beijing 102600, China
Abstract:
In order to promote the coupling of hydrogen energy with the other energy sources in an integrated energy system, improve the flexibility of energy utilization and reduce the carbon emission of the system, an operation optimization method for a hydrogen coupled electrothermal integrated energy system (HCEH-IES) is proposed. The mathematical model of each device in the HCEH-IES is established, and the basic principle of deep reinforcement learning and the workflow of the twin delayed deep deterministic policy gradient (TD3) algorithm are described in detail. The uncertain optimal scheduling problem of the HCEH-IES is transformed into a Markov decision process, and the TD3 algorithm converts the optimization objective and constraints into reward functions for dynamic scheduling decision-making over continuous state and action spaces, yielding a reasonable energy allocation and management scheme. The agent is trained with historical data, and the resulting scheduling strategy is compared with those obtained by the deep Q-learning network and the deep deterministic policy gradient algorithm. The results show that, compared with these two baselines, the TD3-based scheduling strategy is more economical and its cost is closer to that of the CPLEX-based day-ahead optimal scheduling method, making it better suited to the dynamic optimal scheduling of integrated energy systems; it effectively realizes flexible energy utilization and improves the economy and low-carbon performance of the integrated energy system.
Key words: hydrogen coupled electrothermal integrated energy system (HCEH-IES); renewable energy; deep reinforcement learning; twin delayed deep deterministic policy gradient; energy optimization management; Markov decision process
DOI:10.16081/j.epae.202405010
CLC number: TM73; TK01; TK91
Fund program: Project supported by the National Natural Science Foundation of China (2023YFB3407703) and the Science and Technology Support Program of Hebei Province (F2021202022)
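As a point of reference for the approach summarized in the abstract, the following Python sketch (assuming PyTorch) illustrates how an operating-cost objective with constraint penalties can be folded into a reward signal and how the core TD3 update proceeds: twin critics, target policy smoothing, and delayed actor updates. All network sizes, hyperparameters, state/action dimensions, and variable names are illustrative assumptions, not the authors' implementation.

# Illustrative sketch only -- not the paper's code. Assumes PyTorch.
import copy
import torch
import torch.nn as nn

def step_reward(op_cost, carbon_cost, violation, penalty=100.0):
    """Objective and constraints folded into one reward: negative total cost
    minus a penalty proportional to any constraint violation (hypothetical form)."""
    return -(op_cost + carbon_cost) - penalty * violation

class MLP(nn.Module):
    """Small fully connected network used for both the actor and the critics."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, out_dim))

    def forward(self, x):
        return self.net(x)

state_dim, action_dim = 6, 3   # e.g. loads, prices, storage levels; device set-points
actor = MLP(state_dim, action_dim)
critic1, critic2 = MLP(state_dim + action_dim, 1), MLP(state_dim + action_dim, 1)
actor_t, critic1_t, critic2_t = map(copy.deepcopy, (actor, critic1, critic2))

opt_actor = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_critic = torch.optim.Adam(list(critic1.parameters()) + list(critic2.parameters()), lr=1e-3)
gamma, tau, policy_delay, noise_std, noise_clip = 0.99, 0.005, 2, 0.2, 0.5

def td3_update(batch, step):
    """One TD3 gradient step on a replay-buffer batch (s, a, r, s2, done),
    where r and done are shaped [batch_size, 1]."""
    s, a, r, s2, done = batch
    with torch.no_grad():
        # Target policy smoothing: clipped Gaussian noise on the target action.
        noise = (torch.randn_like(a) * noise_std).clamp(-noise_clip, noise_clip)
        a2 = (torch.tanh(actor_t(s2)) + noise).clamp(-1.0, 1.0)
        # Clipped double-Q target: take the smaller of the two target critics.
        q_t = torch.min(critic1_t(torch.cat([s2, a2], dim=1)),
                        critic2_t(torch.cat([s2, a2], dim=1)))
        y = r + gamma * (1.0 - done) * q_t
    q1 = critic1(torch.cat([s, a], dim=1))
    q2 = critic2(torch.cat([s, a], dim=1))
    critic_loss = nn.functional.mse_loss(q1, y) + nn.functional.mse_loss(q2, y)
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()

    if step % policy_delay == 0:   # Delayed policy update and soft target sync.
        actor_loss = -critic1(torch.cat([s, torch.tanh(actor(s))], dim=1)).mean()
        opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()
        for net, tgt in ((actor, actor_t), (critic1, critic1_t), (critic2, critic2_t)):
            for p, p_t in zip(net.parameters(), tgt.parameters()):
                p_t.data.mul_(1.0 - tau).add_(tau * p.data)

Taking the smaller of the two critic estimates and delaying the actor update are the two TD3 mechanisms that curb the Q-value overestimation of plain DDPG, which is consistent with the comparison against DDPG and the deep Q-learning network reported in the abstract.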