Platform Building for Wide-Area Spatiotemporal Sequences Analysis of Large-Scale Power Grid Based on Spark

doi:10.3969/j.issn.1000-7229.2016.11.008

Abstract

To keep pace with the development of the energy internet and its increasingly complex operating environment, big data technology is urgently needed to mine the energy internet's multi-source data more deeply and to use it more efficiently. First, for the wide-area spatiotemporal sequence data of large power grids, this paper explains the advantages of Spark in distributed computing, states the goals of building a big data platform, designs a Spark-based big data platform architecture for the power grid, and describes each layer of the platform in detail. Second, it describes how Spark processes power grid spatiotemporal sequence data. Finally, on Spark and Hadoop experimental environments, it runs performance comparison tests with a typical clustering algorithm. The results verify the data-processing advantage of Spark over the Hadoop MapReduce computing model and lay a foundation for further research.

Key words: energy internet, Spark, spatiotemporal sequences, stream computing, clustering

CLC number: TM 73

YUAN Baochao, LIU Daowei, LIU Liping, WANG Zezhong. Platform Building for Wide-Area Spatiotemporal Sequences Analysis of Large-Scale Power Grid Based on Spark[J]. Electric Power Construction, 2016, 37(11): 48-.
