Cite this article: LIANG Tao, CHAI Lulu, TAN Jianxin, JING Yanwei, Lü Liangnian. Optimal scheduling of hydrogen coupled electrothermal integrated energy system based on deep reinforcement learning algorithm[J]. Electric Power Automation Equipment, 2025, 45(1): 59-66.
Optimal scheduling of hydrogen coupled electrothermal integrated energy system based on deep reinforcement learning algorithm
LIANG Tao1, CHAI Lulu1, TAN Jianxin2, JING Yanwei2, Lü Liangnian3
1. School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China; 2. Hebei Jiantou New Energy Co., Ltd., Shijiazhuang 050011, China; 3. Goldwind Science & Technology Co., Ltd., Beijing 102600, China
Abstract:
In order to promote the coupling of hydrogen energy with the other energy sources in an integrated energy system, improve the flexibility of energy utilization and reduce the carbon emission of the system, an operation optimization method for a hydrogen coupled electrothermal integrated energy system (HCEH-IES) is proposed. The mathematical model of each device in the HCEH-IES is established, and the basic principle of deep reinforcement learning and the workflow of the twin delayed deep deterministic policy gradient (TD3) algorithm are described in detail. The uncertain optimal scheduling problem of the HCEH-IES is transformed into a Markov decision process, and the TD3 algorithm converts the optimization objective and constraints into reward functions for dynamic scheduling decision-making over continuous state and action spaces, yielding a reasonable energy allocation and management scheme. The agent is trained with historical data, and the resulting scheduling strategy is compared with those obtained by the deep Q-learning network and the deep deterministic policy gradient algorithm. The results show that, compared with these two baselines, the TD3-based scheduling strategy is more economical and its cost is closer to that of the CPLEX-based day-ahead optimal scheduling method, making it better suited to the dynamic optimal scheduling of integrated energy systems; it effectively realizes flexible energy utilization and improves the economy and low-carbon performance of the integrated energy system.
Key words: hydrogen coupled electrothermal integrated energy system (HCEH-IES); renewable energy; deep reinforcement learning; twin delayed deep deterministic policy gradient; energy optimization management; Markov decision process
DOI:10.16081/j.epae.202405010
CLC number: TM73; TK01; TK91
Fund program: Project supported by the National Natural Science Foundation of China (2023YFB3407703) and the Science and Technology Support Program of Hebei Province (F2021202022)
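As a point of reference for the approach summarized in the abstract, the following Python sketch (assuming PyTorch) illustrates how an operating-cost objective with constraint penalties can be folded into a reward signal and how the core TD3 update proceeds: twin critics, target policy smoothing, and delayed actor updates. All network sizes, hyperparameters, state/action dimensions, and variable names are illustrative assumptions, not the authors' implementation.

# Illustrative sketch only -- not the paper's code. Assumes PyTorch.
import copy
import torch
import torch.nn as nn

def step_reward(op_cost, carbon_cost, violation, penalty=100.0):
    """Objective and constraints folded into one reward: negative total cost
    minus a penalty proportional to any constraint violation (hypothetical form)."""
    return -(op_cost + carbon_cost) - penalty * violation

class MLP(nn.Module):
    """Small fully connected network used for both the actor and the critics."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, out_dim))

    def forward(self, x):
        return self.net(x)

state_dim, action_dim = 6, 3   # e.g. loads, prices, storage levels; device set-points
actor = MLP(state_dim, action_dim)
critic1, critic2 = MLP(state_dim + action_dim, 1), MLP(state_dim + action_dim, 1)
actor_t, critic1_t, critic2_t = map(copy.deepcopy, (actor, critic1, critic2))

opt_actor = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_critic = torch.optim.Adam(list(critic1.parameters()) + list(critic2.parameters()), lr=1e-3)
gamma, tau, policy_delay, noise_std, noise_clip = 0.99, 0.005, 2, 0.2, 0.5

def td3_update(batch, step):
    """One TD3 gradient step on a replay-buffer batch (s, a, r, s2, done),
    where r and done are shaped [batch_size, 1]."""
    s, a, r, s2, done = batch
    with torch.no_grad():
        # Target policy smoothing: clipped Gaussian noise on the target action.
        noise = (torch.randn_like(a) * noise_std).clamp(-noise_clip, noise_clip)
        a2 = (torch.tanh(actor_t(s2)) + noise).clamp(-1.0, 1.0)
        # Clipped double-Q target: take the smaller of the two target critics.
        q_t = torch.min(critic1_t(torch.cat([s2, a2], dim=1)),
                        critic2_t(torch.cat([s2, a2], dim=1)))
        y = r + gamma * (1.0 - done) * q_t
    q1 = critic1(torch.cat([s, a], dim=1))
    q2 = critic2(torch.cat([s, a], dim=1))
    critic_loss = nn.functional.mse_loss(q1, y) + nn.functional.mse_loss(q2, y)
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()

    if step % policy_delay == 0:   # Delayed policy update and soft target sync.
        actor_loss = -critic1(torch.cat([s, torch.tanh(actor(s))], dim=1)).mean()
        opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()
        for net, tgt in ((actor, actor_t), (critic1, critic1_t), (critic2, critic2_t)):
            for p, p_t in zip(net.parameters(), tgt.parameters()):
                p_t.data.mul_(1.0 - tau).add_(tau * p.data)

Taking the smaller of the two critic estimates and delaying the actor update are the two TD3 mechanisms that curb the Q-value overestimation of plain DDPG, which is consistent with the comparison against DDPG and the deep Q-learning network reported in the abstract.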