Proximal Policy Optimization-based Bidding Strategy for Thermal Power Generators Participating in Energy and Frequency Regulation Markets

ZHANG Bin; CAO Fan; XIAO Kun; SONG Yin; GUO Ying; YE Yujian; XU Dezhi

doi:10.12204/j.issn.1000-7229.2026.04.007

Electric Power Construction, 2026, Vol. 47, Issue (4): 82-92.

Research and Application of AI Technology in the Market Mechanism and Operation Optimization of New-type Power System (hosted by LI Yanbin, ZHANG Shuo, DONG Fugui, ZENG Bo)

Abstract

[Objective] With China’s ongoing electricity market reforms and its pursuit of carbon peaking and carbon neutrality goals, renewable energy penetration in the power system is increasing rapidly. While this supports the clean energy transition, it also introduces marked electricity price volatility and market uncertainty, greatly complicating the development of bidding strategies by power producers, particularly those that rely on spot trading. To help traditional thermal power enterprises and other diverse market players develop optimal bidding strategies in the joint energy and frequency regulation ancillary services market, a bidding strategy optimization method based on proximal policy optimization (PPO) is proposed.

[Methods] First, a bi-level optimization model is established for the joint energy and frequency regulation market. The model integrates multiple generation types and renewable energy storage; the storage smooths price fluctuations through charge-discharge control, strengthening the risk response capability of market players such as wind-storage alliances. In this framework, the upper-level power producers formulate bidding strategies to maximize profit, while the lower-level market clearing model performs joint dispatch to minimize system operating costs. Second, the bidding problem is formulated as a Markov decision process (MDP) within a deep reinforcement learning (DRL) framework, and the PPO algorithm is employed to learn and dynamically optimize bidding strategies autonomously.

[Results] Comparative analysis against the theoretical optimal solution in typical cases shows that the proposed approach effectively increases thermal power enterprises’ revenues, mitigates the risks arising from renewable-driven price fluctuations, reduces system operating costs, and improves frequency regulation efficiency.

[Conclusions] Compared with the benchmark solutions, the proposed approach achieves superior economic performance and higher real-time computational efficiency in the joint market.
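To make the bi-level structure described in the abstract concrete, here is a deliberately tiny sketch. It is not the authors' model: the three units, their offers, the 10 MW dispatch grid, pay-as-bid settlement and the candidate markups are all hypothetical, and brute-force enumeration stands in for the optimization solver a real clearing model would use. `clear_market` plays the lower level (joint energy and regulation dispatch at minimum offered cost, with each unit's energy and regulation awards sharing its capacity); `thermal_profit` and `best_markup` play the upper level (the thermal unit picks the energy-offer markup that maximizes its own profit, anticipating the clearing result).

```python
from itertools import product

# Hypothetical units: (name, capacity MW, energy offer $/MWh, regulation offer $/MW, regulation limit MW)
UNITS = [
    ("thermal", 100, 40, 8, 30),
    ("gas", 60, 55, 6, 20),
    ("wind_storage", 50, 10, 12, 10),
]
TRUE_COST_E, TRUE_COST_R = 40, 8  # assumed true marginal costs of the thermal unit


def clear_market(units, demand, reg_req, step=10):
    """Lower level: enumerate (energy, regulation) awards on a coarse MW grid and keep
    the cheapest combination that meets demand and the regulation requirement.
    Each unit must satisfy p + r <= capacity and r <= its regulation limit."""
    options = [
        [(p, r) for p in range(0, cap + 1, step)
         for r in range(0, rmax + 1, step) if p + r <= cap]
        for _, cap, _, _, rmax in units
    ]
    best = None
    for combo in product(*options):
        if sum(p for p, _ in combo) != demand or sum(r for _, r in combo) != reg_req:
            continue
        cost = sum(p * u[2] + r * u[3] for (p, r), u in zip(combo, units))
        if best is None or cost < best[0]:
            best = (cost, combo)
    return best  # None if no feasible combination exists on the grid


def thermal_profit(markup, demand=150, reg_req=30):
    """Upper level: thermal unit's pay-as-bid profit when it adds `markup` to its energy offer."""
    units = [list(u) for u in UNITS]
    units[0][2] = TRUE_COST_E + markup
    _, awards = clear_market(units, demand, reg_req)
    p, r = awards[0]
    return (units[0][2] - TRUE_COST_E) * p + (units[0][3] - TRUE_COST_R) * r


def best_markup(candidates=(0, 5, 10, 20)):
    """Solve the toy bi-level problem by enumerating the thermal unit's candidate bids."""
    return max(candidates, key=thermal_profit)


if __name__ == "__main__":
    print(clear_market(UNITS, 150, 30))  # (4850, ((90, 10), (10, 20), (50, 0)))
    print({m: thermal_profit(m) for m in (0, 5, 10, 20)}, best_markup())
```

In this toy case the thermal unit earns most with a markup of 10 $/MWh; at 20 $/MWh it is priced above the gas unit, loses energy volume to it and shifts into providing regulation, so its profit falls. At realistic sizes the enumeration would give way to an LP/MILP clearing model, and the search over bids to a learned policy, which is the role the abstract assigns to PPO.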
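The abstract credits storage with smoothing price fluctuations through charge-discharge control. The intuition: charging in low-price periods adds demand where prices are low, and discharging in high-price periods adds supply where prices are high. The rule-based sketch below only illustrates that mechanism under assumed price thresholds, 1-hour periods and a lossless battery; in the paper the storage is part of the market optimization model, which is not reproduced here.

```python
def threshold_storage(prices, low, high, capacity_mwh, power_mw, soc=0):
    """Rule-based storage dispatch (assumed: 1-hour periods, no losses).

    Returns (net_output, soc_trace); net_output > 0 means discharging,
    net_output < 0 means charging, both in MW."""
    net_output, soc_trace = [], []
    for price in prices:
        if price <= low:                                 # cheap period: charge within limits
            move = -min(power_mw, capacity_mwh - soc)
        elif price >= high:                              # expensive period: discharge within limits
            move = min(power_mw, soc)
        else:
            move = 0
        soc -= move                                      # discharging lowers SOC, charging raises it
        net_output.append(move)
        soc_trace.append(soc)
    return net_output, soc_trace


if __name__ == "__main__":
    prices = [20, 25, 60, 80, 30]                        # hypothetical $/MWh
    out, soc = threshold_storage(prices, low=30, high=70, capacity_mwh=15, power_mw=10)
    print(out, soc)                                      # [-10, -5, 0, 10, -10] [10, 15, 15, 5, 15]
    print(sum(p * q for p, q in zip(prices, out)))       # 175
```

A threshold rule is the simplest possible controller; it ignores forecasts and the storage's own effect on prices, which is exactly what a market-integrated model or a learned policy would account for.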
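Casting bidding as an MDP means exposing it to the agent through a reset/step interface. The environment below is a hypothetical illustration: the state (period index, residual load, previous period's rival price), the action (an energy-offer markup over marginal cost), the reward (pay-as-bid profit), the initial rival price and all data are assumptions, since the abstract does not give the paper's actual state, action or reward definitions. Hiding the current rival price from the state is what makes the decision uncertain.

```python
from dataclasses import dataclass


@dataclass
class ToyBiddingMDP:
    """Hypothetical single-unit bidding MDP; one step = one market period."""
    residual_load: list       # demand left for the thermal unit in each period, MW
    rival_price: list         # best competing energy offer in each period, $/MWh
    capacity: int = 100       # MW
    marginal_cost: int = 40   # $/MWh
    prior_price: int = 50     # assumed rival price observed before period 0
    t: int = 0

    def reset(self):
        self.t = 0
        return self.observe()

    def observe(self):
        # s_t = (period, residual load, previous period's rival price)
        prev = self.rival_price[self.t - 1] if self.t > 0 else self.prior_price
        return (self.t, self.residual_load[self.t], prev)

    def step(self, markup):
        """a_t = markup over marginal cost; r_t = pay-as-bid profit in period t."""
        offer = self.marginal_cost + markup
        won = offer <= self.rival_price[self.t]
        quantity = min(self.capacity, self.residual_load[self.t]) if won else 0
        reward = (offer - self.marginal_cost) * quantity
        self.t += 1
        done = self.t == len(self.residual_load)
        return (None if done else self.observe()), reward, done


if __name__ == "__main__":
    env = ToyBiddingMDP(residual_load=[80, 120, 60], rival_price=[50, 70, 45])
    state, done, total = env.reset(), False, 0
    while not done:
        state, reward, done = env.step(markup=8)  # fixed policy, for illustration only
        total += reward
    print(total)  # 1440
```

A PPO agent would replace the fixed markup with an action sampled from its policy given `state`, and collect the resulting (state, action, reward) tuples for its update.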

Key words

power generator bidding / electricity market risk response / deep reinforcement learning (DRL) / proximal policy optimization (PPO) / actor-critic architecture
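For reference, the actor update in standard PPO maximizes the clipped surrogate objective L^CLIP(θ) = E_t[min(ρ_t Â_t, clip(ρ_t, 1 − ε, 1 + ε) Â_t)], where ρ_t = π_θ(a_t|s_t) / π_θold(a_t|s_t) and Â_t is the advantage estimate. The pure-Python sketch below evaluates it from per-sample log-probabilities; ε = 0.2 is a common default, not a value taken from the paper, and a real implementation would compute this on tensors so it can be differentiated.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Mean clipped surrogate objective (to be maximized by the actor)."""
    if not advantages or not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)                      # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # clip(ratio, 1 - eps, 1 + eps)
        total += min(ratio * adv, clipped * adv)         # pessimistic (lower) bound
    return total / len(advantages)


if __name__ == "__main__":
    # ratios 1.5, 0.5, 0.5 with advantages +1, +1, -1 contribute 1.2, 0.5, -0.8
    print(ppo_clip_objective([math.log(1.5), math.log(0.5), math.log(0.5)],
                             [0.0, 0.0, 0.0], [1.0, 1.0, -1.0]))
```

The clipping removes any incentive to push the probability ratio outside [1 − ε, 1 + ε] in the direction the advantage favors, which keeps each policy update close to the policy that collected the data.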
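In an actor-critic architecture, the critic's value estimates V(s_t) are turned into the advantages Â_t that the actor's PPO update consumes. Generalized advantage estimation (GAE) is one common way to do this; the abstract does not say which estimator the authors use, so the sketch below, including the default γ and λ, is an assumption.

```python
def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for one trajectory segment.

    rewards[t] = r_t, values[t] = V(s_t); last_value = V(s_T) for bootstrapping
    (pass 0.0 if the episode terminated). Returns (advantages, returns)."""
    advantages = [0.0] * len(rewards)
    running, next_value = 0.0, last_value
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # one-step TD error
        running = delta + gamma * lam * running               # discounted sum of TD errors
        advantages[t] = running
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]   # regression targets for the critic
    return advantages, returns


if __name__ == "__main__":
    print(gae([1, 2, 3], [0, 0, 0], 0.0, gamma=1.0, lam=1.0))  # ([6.0, 5.0, 3.0], [6.0, 5.0, 3.0])
```

λ trades bias for variance: λ = 0 reduces to one-step TD errors, while λ = 1 gives Monte Carlo returns minus the baseline.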

Cite this article

ZHANG Bin, CAO Fan, XIAO Kun, et al. Proximal Policy Optimization-based Bidding Strategy for Thermal Power Generators Participating in Energy and Frequency Regulation Markets[J]. Electric Power Construction, 2026, 47(4): 82-92. https://doi.org/10.12204/j.issn.1000-7229.2026.04.007
