• Journal indexed in the CSCD core library
  • Chinese Core Journal
  • China Science and Technology Core Journal

Electric Power Construction ›› 2017, Vol. 38 ›› Issue (10): 84-. doi: 10.3969/j.issn.1000-7229.2017.10.012


 Short-Term Power Load Forecasting Based on Spark Platform and Parallel Random Forest Regression Algorithm Model

 LIU Qichen1, LEI Jingsheng2, HAO Jiawei2, HUANG Yangang1, LI Qiang1, LUO Haibo1

 
  

  (1. Meishan Power Supply Company, State Grid Sichuan Electric Power Company, Meishan 620010, Sichuan Province, China;
    2. School of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 200090, China)
     
  • Online: 2017-10-01
  • Supported by:
     Project supported by the National Natural Science Foundation of China (61472236, 61672337)

Abstract: With the development of the smart grid, the global energy Internet, and related technologies, power big data has taken shape. How to exploit this high-dimensional, massive data through data mining has attracted widespread attention among power industry practitioners. To achieve high-precision, real-time load forecasting in the context of power big data, this paper proposes a short-term power load forecasting method based on the Spark platform and a parallel random forest regression (SP-RFR) algorithm. The single-machine random forest algorithm is parallelized through three transformations of resilient distributed datasets (RDDs), so that it can be deployed on a Spark distributed cluster. Experiments using actual power load data from a transformer substation cover both model training and regression prediction. The conclusions are as follows: on the same test data set, the short-term power load forecasting method based on the SP-RFR model outperforms the single-machine regression forecasting model; the SP-RFR model is less disturbed by outlier data and remains robust as the data set grows; and, compared with the single-machine model, the SP-RFR model running on a distributed cluster shows clear advantages as the data set grows. The proposed method can effectively forecast power load in a distributed setting and offers a new approach to power load forecasting.
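
The abstract does not say what the three RDD transformations are, so the following is only a minimal sketch of the general idea behind a parallel random forest regressor, written in plain Python rather than Spark. The three stages (bootstrap sampling, per-tree training, averaging of tree outputs) are an assumed decomposition, not the paper's actual design. All function names, the one-split "stump" trees, the thread pool standing in for a cluster, and the toy hourly load values are hypothetical and chosen for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def bootstrap(data, rng):
    """Assumed stage 1: sample len(data) rows with replacement."""
    return [rng.choice(data) for _ in data]


def fit_stump(sample):
    """Assumed stage 2: fit a one-split regression tree on (x, y) pairs.

    Returns (threshold, left_mean, right_mean).
    """
    best = None
    xs = sorted({x for x, _ in sample})
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate split between neighbouring x values
        left = [y for x, y in sample if x <= t]
        right = [y for x, y in sample if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # only one distinct x: fall back to the sample mean
        m = mean(y for _, y in sample)
        return (float("inf"), m, m)
    return best[1:]


def train_tree(seed, data):
    """Train one tree on its own bootstrap sample (independent of other trees)."""
    return fit_stump(bootstrap(data, random.Random(seed)))


def train_forest(data, n_trees=25, workers=4):
    """Train trees concurrently; a thread pool stands in for cluster workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda seed: train_tree(seed, data), range(n_trees)))


def forest_predict(forest, x):
    """Assumed stage 3: average the predictions of all trees."""
    return mean(ml if x <= t else mr for t, ml, mr in forest)


# Toy data (made up): hour of day -> load, low before noon and high after.
data = [(hour, 50 if hour < 12 else 80) for hour in range(24)]
forest = train_forest(data)
```

Because each tree depends only on its own bootstrap sample, the training step has no shared state, which is the property that makes this kind of ensemble straightforward to distribute. Calling `forest_predict(forest, 3)` and `forest_predict(forest, 20)` on the toy data gives a lower forecast for the morning hour than for the evening hour.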

 

Key words: power big data, distributed computing, parallel random forest regression algorithm, Spark platform, short-term power load forecasting

CLC Number: