Joint Bidding and Pricing Strategy Optimization for Electricity Retailers Based on Multi-agent Policy Gradient
[Objective] In an electricity spot market with two-sided bidding, the core business of an electricity retailer is to maximize profit by optimizing its energy purchasing and selling strategies. [Methods] For the retailer's joint bidding and pricing strategy optimization problem, this study first formulates a Markov decision process. On this basis, a modified counterfactual multi-agent policy gradient algorithm with soft actor-critic (mCOMA-SAC), applicable to continuous action spaces, is proposed and used to solve the joint optimization problem. Furthermore, the joint bidding and pricing decisions of price-maker and price-taker retailers are comparatively analyzed, as is the impact of different degrees of line congestion on the price-maker retailer's decisions. [Results] Numerical results verify the advantages of the mCOMA-SAC algorithm in learning, computational, and optimization performance. [Conclusions] The proposed mCOMA-SAC algorithm can effectively optimize the joint bidding and pricing decisions of electricity retailers, enables simulation of the strategic purchasing and selling behavior of price-maker retailers in a two-sided bidding electricity spot market, and provides methodological support for controlling retailers' market power in the electricity market.
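To make the "counterfactual baseline" idea behind the proposed algorithm concrete, the sketch below illustrates the COMA-style advantage on a discrete toy case: an agent's advantage is the centralized critic's value for the joint action actually taken, minus the expectation of that value over the agent's own policy with the other agents' actions held fixed. This is a simplified illustration, not the paper's exact mCOMA-SAC update (which handles continuous actions via soft actor-critic); the function name and numbers are hypothetical.

```python
def counterfactual_advantage(q_values, policy_i, action_i):
    """COMA-style counterfactual advantage for agent i (discrete toy case).

    q_values[a] : centralized critic's Q(s, (a, u_-i)) as agent i's action a
                  varies, with the other agents' actions u_-i held fixed.
    policy_i[a] : agent i's probability of taking action a in state s.
    action_i    : the action agent i actually took.
    """
    # Counterfactual baseline: marginalize out agent i's own action.
    baseline = sum(p * q for p, q in zip(policy_i, q_values))
    return q_values[action_i] - baseline

# Toy example: 3 candidate actions, agent i took action 2.
q = [1.0, 2.0, 4.0]
pi = [0.2, 0.5, 0.3]
adv = counterfactual_advantage(q, pi, 2)  # 4.0 - (0.2*1 + 0.5*2 + 0.3*4)
print(round(adv, 6))  # 1.6
```

Because the baseline depends only on the agent's own policy, subtracting it reduces gradient variance without biasing the policy gradient; in a continuous action space the marginalization becomes an expectation estimated by sampling from the agent's policy.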
electricity market / electricity retailer / bidding strategy / pricing strategy / joint strategy optimization / multi-agent reinforcement learning