• Journal indexed in the CSCD core collection
  • Chinese Core Journal
  • Chinese Science and Technology Core Journal

Electric Power Construction ›› 2017, Vol. 38 ›› Issue (10): 84-. doi: 10.3969/j.issn.1000-7229.2017.10.012

• Smart Grid •

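 Short-Term Power Load Forecasting Based on the Spark Platform and a Parallel Random Forest Regression Algorithm (基于Spark平台和并行随机森林回归算法的短期电力负荷预测)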

 LIU Qichen (刘琪琛)1, LEI Jingsheng (雷景生)2, HAO Jiawei (郝珈玮)2, HUANG Yangang (黄燕刚)1, LI Qiang (李强)1, LUO Haibo (罗海波)1

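  (1. Meishan Power Supply Company, State Grid Sichuan Electric Power Company, Meishan 620010, Sichuan Province;
     2. School of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 200090)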
     
  • Published: 2017-10-01
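  • About the authors: LIU Qichen (b. 1987), male, bachelor's degree, engineer; main research interests are smart grid, power big data, and power system dispatching automation technology. LEI Jingsheng (b. 1966), male, Ph.D., professor; main research interests are smart grid, power big data, and wireless sensor networks. HAO Jiawei (b. 1990), male, master's degree, corresponding author; works mainly on power big data and wireless sensor networks for power monitoring. HUANG Yangang (b. 1974), male, bachelor's degree, assistant engineer; main research interests are power big data and power system dispatching automation technology. LI Qiang (b. 1980), male, bachelor's degree, technician; main research interests are smart grid, power big data, and power system dispatching automation technology. LUO Haibo (b. 1986), male, bachelor's degree, engineer; main research interests are smart grid, power big data, and power system dispatching automation technology.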
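  • Funding:
     National Natural Science Foundation of China (61472236, 61672337); State Grid Meishan Power Supply Company Xiongying (Eagle) Innovation Research Team Project (Big Data Analysis and Application Based on the Dispatching Technical Support System)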

 Short-Term Power Load Forecasting Based on Spark Platform and Parallel Random Forest Regression Algorithm Model

 LIU Qichen1, LEI Jingsheng2, HAO Jiawei2, HUANG Yangang1, LI Qiang1, LUO Haibo1

 
  

  (1. Meishan Power Supply Company, State Grid Sichuan Electric Power Company, Meishan 620010, Sichuan Province, China;
    2. School of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 200090, China)
     
  • Online:2017-10-01
  • Supported by:
     Project supported by the National Natural Science Foundation of China (61472236, 61672337)

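Abstract (translated from the Chinese): With the construction of the smart grid and the global energy Internet and the development of related technologies, power big data has become an established feature of modern power systems, and how to mine high-dimensional, massive data in depth so that the data is fully used has become a concern for power industry practitioners. This paper studies high-precision, real-time load forecasting in a power big data environment and proposes a short-term power load forecasting method based on the Spark platform and a parallel random forest regression algorithm (Spark platform and parallel random forest regression, SP-RFR). The single-machine random forest algorithm is parallelized through three resilient distributed dataset (RDD) transformations and deployed in a Spark distributed cluster environment. Experiments designed around actual power load data from a certain region cover model training and regression prediction. They show that, on the same data set, the Spark-based parallel random forest regression algorithm achieves higher prediction accuracy than the single-machine load forecasting algorithm; that the parallel random forest algorithm is less affected by outliers and shows good robustness as the data set grows; and that, in running time, the distributed-cluster method has a clear advantage over the single-machine algorithm as the data set grows. The proposed method can effectively perform power load forecasting in a distributed environment and provides a new approach to load forecasting.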

 

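Keywords (translated from the Chinese): power big data, distributed computing, parallel random forest regression algorithm, Spark platform, short-term power load forecasting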

Abstract: With the development of the smart grid, the global energy Internet, and related technologies, power big data has become an established feature of modern power systems. How to mine this high-dimensional, massive data in depth so that it is fully used has become a widespread concern among power industry practitioners. Targeting high-precision, real-time load forecasting in a power big data setting, this paper proposes a short-term power load forecasting method based on the Spark platform and a parallel random forest regression (SP-RFR) algorithm. The single-machine random forest algorithm is parallelized through three transformations of resilient distributed datasets (RDDs) and deployed on a Spark distributed cluster. Experiments are designed using actual power load data from a transformer substation, and model training and regression prediction are carried out. The conclusions are as follows: for the same test data set, the SP-RFR model achieves higher prediction accuracy than the single-machine regression forecasting model; the SP-RFR model is less disturbed by outliers and remains robust as the data set grows; and, compared with the single-machine model, the cluster-based SP-RFR shows a clear running-time advantage as the data set grows. The proposed method can effectively forecast power load in a distributed environment and offers a new approach to power load forecasting.
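
The abstract does not spell out what the three RDD transformations are. As a rough illustration only, the sketch below assumes a common decomposition of a parallel random forest: (1) draw a bootstrap sample and a random feature subset per tree, (2) train the trees independently, and (3) average their predictions. It is plain Python rather than Spark, a thread pool stands in for cluster executors, the trees are depth-1 for brevity, the data is synthetic, and every name (`fit_stump`, `fit_forest`, `predict_forest`, the `[hour, weekday]` features) is hypothetical and not taken from the paper.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def fit_stump(job):
    """Fit a depth-1 regression tree on one bootstrap sample.

    job = (rows, features): rows are (x, y) pairs with x a feature list,
    features are the column indices this tree is allowed to split on.
    """
    rows, features = job
    overall = mean(y for _, y in rows)
    # Fallback: no usable split, so always predict the sample mean.
    best = (float("inf"), features[0], float("inf"), overall, overall)
    for f in features:
        values = sorted({x[f] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for x, y in rows if x[f] <= thr]
            right = [y for x, y in rows if x[f] > thr]
            ml, mr = mean(left), mean(right)
            sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
            if sse < best[0]:
                best = (sse, f, thr, ml, mr)
    return best[1:]  # (feature, threshold, left_mean, right_mean)


def fit_forest(rows, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    all_features = range(len(rows[0][0]))
    # Step 1 (assumed): one bootstrap sample and one feature subset per tree.
    jobs = [([rng.choice(rows) for _ in rows], rng.sample(all_features, n_features))
            for _ in range(n_trees)]
    # Step 2 (assumed): trees are independent, so train them in parallel.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(fit_stump, jobs))


def predict_forest(forest, x):
    # Step 3 (assumed): average the per-tree predictions.
    return mean(ml if x[f] <= thr else mr for f, thr, ml, mr in forest)


# Synthetic hourly loads: features are [hour, weekday]; load is lower before noon.
rows = [([h, d], (50 if h < 12 else 100) + d) for d in range(7) for h in range(24)]
forest = fit_forest(rows)
night = predict_forest(forest, [3, 2])
evening = predict_forest(forest, [18, 2])
print(f"03:00 -> {night:.1f}, 18:00 -> {evening:.1f}")
```

All random draws happen before the parallel step, so a fixed `seed` gives the same forest regardless of how the threads are scheduled. In a Spark setting, the list of per-tree jobs would plausibly become a distributed collection and the `pool.map` call a distributed map, but that mapping is an assumption here.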

 

Key words: power big data, distributed computing, parallel random forest regression algorithm, Spark platform, short-term power load forecasting

CLC number: