제라르도 온도 미하 (Gerardo Ondo Micha), 김철환 (Chul-Hwan Kim)†
(Dept. of Electrical and Computer Engineering, Sungkyunkwan University, Korea)
Copyright © The Korean Institute of Electrical Engineers(KIEE)
Key words
Distributed energy sources, ensemble regression, intermittency of renewable energy, power forecasting, Support Vector Machine.
1. Introduction
Global warming has become a major modern phenomenon as a result of the widespread use of
fossil fuels to produce energy. Renewable energy resources could therefore be a
feasible solution to this issue in terms of reducing carbon dioxide (CO2) emissions
and curbing pollution levels (1). Renewable energy sources such as solar, wind, hydro, tidal, geothermal, and biofuels
are all attractive options. The ecosystem will not be harmed by these pollution-free renewable
energy sources (2). Many scientists and researchers have investigated and assessed the potential of
renewable energy sources including solar, wind, and hydropower (3).
Because of its clean, cost-free, and plentiful energy benefits, solar energy is one
of the most promising energy sources (4). As a result, solar power has a strong demand for power generation as part of efforts
to address these environmental problems. However, because of various environmental
factors such as ambient temperature, solar radiation, shadows, humidity, and wind
speed, the output power of photovoltaic (PV) panels is unpredictable and spontaneous.
These are some of the difficulties that grid operators must overcome to effectively
operate the power supply system (5). To address these issues, several strategies for balancing electric power consumption
and generation have been created. One of the techniques for improving the operating
reliability of power systems is to forecast demand on loads in the short term (6).
Solar power forecasting is important for energy trading companies and power network
dispatching centers to make accurate decisions on critical issues such as power system
scheduling and operational control (7). Furthermore, accurate solar power forecasting increases the overall power system's
efficiency and power quality (8). Solar power forecasting is divided into two categories: direct and indirect forecasting.
Direct forecasting predicts the solar power output itself as the model outcome. Indirect
forecasting, on the other hand, produces predicted values of solar radiation;
PV output models then use the forecasted solar radiation values to calculate
solar power generation (9).
With recent advancements in Artificial Intelligence (AI)/Machine Learning (ML), load
forecasting can be performed considering the weather and atmospheric conditions yielding
higher accuracy as compared to conventional methods (10). In (11), individual forecasting methods are shown to have limited performance, low forecasting
accuracy, and high error. It is therefore necessary to develop combined algorithms
that yield more robust performance and increase the forecasting accuracy of
models. This is achieved by using ensemble learning, where individual models serve
as weak or base learners and their predictions are combined in a more accurate predictive
model for classification or regression problems (12). Various ensemble learning algorithms are available in the literature; depending
on how the base models or learners are combined, there are three types of ensemble
learning: bagging, boosting, and stacking (13). Different authors have applied the three algorithms separately to predict PV power
using time series data (14-16). The authors in (17) present a joint bagged-boosted ANN; the proposed ensemble model produces higher-accuracy
predictions of short-term electricity load than bagged ANN and boosted ANN alone. Separate
STACK combinations were compared in (18), (19), and (20) to forecast solar energy, and more accurate models were obtained. However, to the
best of our knowledge, the proposed algorithms do not consider one single Bagged-
Boosted STACK with a single meta-learner to forecast future energy values.
This paper proposes a Bagged-Boosted STACK ensemble learning model with a Support
Vector Regression with Linear kernel (SVRL) meta-learner and seven base learners:
Elastic Net Regression, Random Forest, Linear Regression, Lasso Regression, AdaBoost,
GBoost (Gradient Boost), and XGBoost (Extreme Gradient Boost). Correlation analysis
(Corr) and Principal Component Analysis (PCA) were used to pre-process the time series
data and to reduce variance and over-fitting.
Bagging algorithms tend to decrease model variance whilst boosting focuses on reducing
bias error (21). The contributions of this work aim at solving the variance-bias tradeoff by using
the STACK combination of both to filter out the generalizing biases and variances. To
the best of our knowledge, our combination of base learner models, as well as the high
number of base learners used in our STACK, has rarely been used to forecast PV output power.
The model proposed in this paper is flexible and can be used with other methods and
different base and super-learners; the scheme is compared with bagging, boosting, and
bagging-boosting algorithms to prove the superiority of the proposed model.
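As an illustration of this architecture, the sketch below builds a STACK with an SVR (linear kernel) meta-learner in scikit-learn. It uses synthetic placeholder data rather than the PV dataset, and xgboost's XGBRegressor (the seventh base learner in the paper) is noted only in a comment so the sketch needs no extra package.

```python
import numpy as np
from sklearn.ensemble import (StackingRegressor, RandomForestRegressor,
                              AdaBoostRegressor, GradientBoostingRegressor)
from sklearn.linear_model import ElasticNet, LinearRegression, Lasso
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))          # stand-in for the two PCA components
y = 3.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=200)

# Level-0 weak learners; xgboost's XGBRegressor would be appended here
# in the same way as the seventh base learner.
base_learners = [
    ("enet", ElasticNet(alpha=0.01)),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("lr", LinearRegression()),
    ("lasso", Lasso(alpha=0.01)),
    ("ada", AdaBoostRegressor(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingRegressor(n_estimators=50, random_state=0)),
]

# Level-1 meta-learner: support vector regression with a linear kernel.
stack = StackingRegressor(estimators=base_learners,
                          final_estimator=SVR(kernel="linear"))
stack.fit(X, y)
```

By default, `StackingRegressor` trains the meta-learner on out-of-fold predictions of the base learners, which mirrors the level-0/level-1 split of the proposed model.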
2. The Proposed Intelligent PV Power Forecasting Model
The proposed model is a hybrid combination of seven different weak learners into one
final SVRL meta-predictor to forecast the output power of a PV system. Improved accuracy
of the forecasting model is achieved by combining bagging and boosting algorithms in
level 0 of the STACK, thus reducing variance and bias errors.
2.1 Data Collection
The time series solar data of Gyeongnam, South Korea, from January 2017 to June 2020
were obtained from a 350 kW PV power plant (3rd plant). The dataset comprises 7 features:
UNIX date and time, temperature, wind direction, wind speed, rainfall, humidity,
and PV output power. The data were carefully reviewed, and missing values were filled using
Bayesian Ridge Regression. Figure 1 shows the PV output profile, in kW, from 2017 to 2020.
Fig. 1. Photovoltaic energy production from 2017 to 2020.
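The gap-filling step can be sketched with scikit-learn's IterativeImputer, which regresses each feature with missing entries on the other features; the Bayesian Ridge estimator matches the method named above, while the small array is purely illustrative, not the actual Gyeongnam data.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

# Illustrative rows of (temperature, wind speed, humidity) with gaps.
data = np.array([[13.3, 2.3, 70.0],
                 [12.8, np.nan, 68.0],
                 [14.1, 2.5, np.nan],
                 [13.0, 2.2, 69.0]])

imputer = IterativeImputer(estimator=BayesianRidge(), random_state=0)
filled = imputer.fit_transform(data)   # NaNs replaced by regression estimates
```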
2.2 Feature Selection
The correlation between different data features is shown in figure 2. The correlation map shows the wind speed as the feature with the highest correlation
value (0.46) with the PV output power. A high regression coefficient is also depicted
for temperature (0.31) and the year, the rest of the features have a negative regression
coefficient, which means a lower correlation to PV energy production. Based on feature
statistics in Table 1 and the correlation map, five features were selected: wind speed, temperature, year,
number of days of the week, and wind direction. The pre-processed data is normalized
as in (1) to the scale of [-1, 1] allowing the model to converge faster and avoiding
very large weights to be assigned to features with larger scores in previous years
during training. PCA feature selection was implemented to further reduce our data
dimension into two principal uncorrelated components maintaining the most possible
information from the previous dataset. As shown in figure 2, the wind speed and temperature have the highest positive correlations with the PV
output; this means that when these features increase, the PV output power also increases,
and when they decrease, the PV output power also decreases.
Fig. 2. Feature correlation heat map.
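The scaling-plus-PCA step described above can be sketched as follows (assuming scikit-learn; the feature matrix is a random placeholder for the five selected features):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Placeholder for wind speed, temperature, year, day-of-week number, wind direction.
X = rng.normal(size=(100, 5))

# Normalize each feature to the [-1, 1] range so that no feature dominates training.
X_scaled = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)

# Project onto two uncorrelated principal components.
X_pca = PCA(n_components=2).fit_transform(X_scaled)
```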
Table 1. Feature statistics of the dataset.
| Feature | Mean | Standard Deviation | Minimum Value | Maximum Value |
|---|---|---|---|---|
| Temperature | 13.330018 | 9.547295 | -13.5 | 35.1 |
| Wind Direction | 165.690331 | 124.355640 | 0.0 | 360.0 |
| Wind Speed | 2.348275 | 1.835690 | 0.0 | 13.6 |
| Rainfall | 0.164399 | 1.246811 | 0.0 | 68.5 |
| Humidity | 69.847010 | 21.563524 | 1.1 | 99.9 |
| Year | 2018.285043 | 1.029817 | 2017 | 2020 |
| Month | 6.09475 | 3.424773 | 1.0 | 12.0 |
| Day | 15.714174 | 8.792300 | 1.0 | 31.0 |
| Day of Week number | 2.998434 | 2.001989 | 0.0 | 6.0 |
2.3 The Framework
The framework structure in Figure 3 shows the steps developed in this paper to implement the applied methodology. The
first step was time-series raw data pre-processing and feature statistics were analyzed,
followed by data split into training and test, 80% and 20% respectively. The next
step was data scaling and feature selection using CORR and PCA. The next step was
the training of base models in level 0 of the STACK, predictions made by bagging and
boosting models were combined and used as input for the SVRL meta-learner in level
1. The last step was the evaluation of the trained Stacking model using a testing
dataset and different metrics were used to compare the proposed model with bagging,
boosting, and bagging-boosting models separately.
Fig. 3. The proposed STACK model framework.
Improved accuracy of the forecasting model is achieved by combining bagging and boosting
algorithms in level 0 of the STACK and averaging the predictions made by each
weak learner, thus reducing variance and bias errors.
A support vector machine is a machine learning method that turns a nonlinear problem
into a linear one by transforming the original space into a higher-dimensional space
by means of a kernel $K\left(X_{n},\: X_{n'}\right)$.
In regression, the error is minimized by eliminating the penalty within a
$\pm\varepsilon$ interval around the regression estimate. So that the model does not fall into over-fitting, a certain amount of error is admitted
in the data, which is controlled by the hyper-parameter $C$.
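The roles of the $\varepsilon$-tube and the $C$ penalty can be seen in a minimal scikit-learn sketch (synthetic data, not the paper's dataset):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(150, 1))
y = 2.0 * X.ravel() + rng.normal(scale=0.05, size=150)

# epsilon: width of the penalty-free tube around the regression estimate;
# C: how strongly errors outside the tube are penalized (smaller C admits more error).
model = SVR(kernel="linear", C=1.0, epsilon=0.1).fit(X, y)
```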
By combining bagging and boosting, the total forecasting error decreases significantly.
The total error of a model can be decomposed as bias + variance + irreducible error. In bagging,
models with very little bias but high variance are used; aggregating them reduces the
variance without inflating the bias. In boosting, models with very little variance
but high bias are used; adjusting the models sequentially reduces the bias. Therefore,
each of the strategies reduces a part of the total error of the STACK.
In bagging, each model is different from the rest because each one is trained with
a different sample obtained by bootstrapping. In boosting, the models are adjusted
sequentially and the importance (weight) of the observations changes with each iteration,
leading to different adjustments.
2.4 Algorithm
Given $M$ models in level 0 of the STACK, base regressors $h$, a meta-regressor $h^{new}$,
and a training dataset $D$:
· For $D =\{(x_{i},\: y_{i})\,|\, x_{i}\in X,\: y_{i}\in Y\}$;
· For $t = 1$ to $T$, learn base regressors $h_{t}$ for bagging and boosting based on $D$;
· Construct a new dataset from $D$:
· For $i = 1$ to $m$, construct {$x_{i}^{new},\: y_{i}$}, where $x_{i}^{new}=(h_{1}(x_{i}),\: ...,\: h_{T}(x_{i}))$;
· Learn the meta-regressor $h^{new}$ based on the new dataset {$x_{i}^{new},\: y_{i}$};
· Return $H(x)=h^{new}(h_{1}(x),\: h_{2}(x),\: ...,\: h_{T}(x))$
A different set of training data $D=\{(x_{i},\: y_{i})\}$ was collected and a linear kernel
was used. The correlation matrix was formed as in (1), where $\varepsilon$ represents
the violation concept and the correlation vector $K$ is used to compute the concentration
coefficient $\vec{\alpha}$; $\vec{y}$ contains all the target values corresponding to $D$.
$\vec{\alpha}$ is used to create the estimator for our model and to keep the forecasting
error of the STACK model below the threshold (23).
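The level-0/level-1 procedure above can be written out by hand; the sketch below uses only two base regressors and synthetic data for brevity:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.metrics import r2_score
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=120)

# Level 0: learn base regressors h_1..h_T on D = {(x_i, y_i)}.
base = [LinearRegression().fit(X, y), Lasso(alpha=0.01).fit(X, y)]

# New dataset: x_i_new = (h_1(x_i), ..., h_T(x_i)) paired with y_i.
X_new = np.column_stack([h.predict(X) for h in base])

# Level 1: learn the meta-regressor h_new on {(x_i_new, y_i)}.
meta = SVR(kernel="linear").fit(X_new, y)

# H(x) = h_new(h_1(x), ..., h_T(x)).
H = meta.predict(X_new)
r2 = r2_score(y, H)
```

In practice the meta-regressor should be trained on out-of-fold base predictions to avoid leakage; the in-sample version here only illustrates the data flow.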
3. Performance Evaluation
Given that $y$ is the actual value, $\widehat{y}$ is the predicted value, $n$ is the
number of data samples, and $Var$ denotes the variance, the following metrics were used
to evaluate our model performance:
Explained Variance Score (EVS): is used to measure the variability between $y$ and
$\widehat{y}$, the ideal value of EVS is 1:
Mean Absolute Error (MAE): measures the absolute error between the predicted value
and the actual value. It shows how big the forecast error is on average.
Mean Squared Error (MSE): measures the average squared distance of the data points
from the fitted line.
Root Mean Squared Error (RMSE): the square root of the MSE; it measures how far data
points lie from the regression line.
Determination Coefficient $(R^{2})$: $R^{2}$ is the percentage of variation of the
predicted value that explains its relationship with one or more predictor variables.
Generally, the higher the $R^{2}$, the better the model fits the given data.
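These five metrics can be computed with scikit-learn on a small illustrative example (the numbers below are not from the paper's experiments):

```python
import numpy as np
from sklearn.metrics import (explained_variance_score, mean_absolute_error,
                             mean_squared_error, r2_score)

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.1, 2.7, 6.8])

evs = explained_variance_score(y_true, y_pred)   # 1.0 is the ideal value
mae = mean_absolute_error(y_true, y_pred)        # average absolute error
mse = mean_squared_error(y_true, y_pred)         # average squared error
rmse = np.sqrt(mse)                              # same units as the target
r2 = r2_score(y_true, y_pred)                    # fraction of variance explained
```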
4. Results and Discussion
This section shows the comparison of performance among the proposed Bagging-Boosting
STACK SVRL, bagging, boosting, and a combination of bagging-boosting models using
Elastic Net Regression, Random Forest, Linear Regression, Lasso Regression, AdaBoost,
GBoost, and XGBoost as base learners of the STACK. The simulations and coding were
performed using Python. On the plot, the blue dotted line represents the performance
of the training set made of 80% of the original data set, whilst the green line represents
the model performance on the 20% testing dataset. In figure 4, the bagging model shows a reduced bias and variance as compared to boosting in figure 5. The lowest bias error is achieved by combining bagging and boosting algorithms together
as shown in figure 6, whilst the proposed STACK model in figure 7 shows the lowest variance and bias errors.
Fig. 4. Bagging Model Performance Evaluation.
Fig. 5. Boosting Model Performance Evaluation.
Fig. 6. Bagging-Boosting Model Performance Evaluation
Fig. 7. Bagging-Boosting STACK Model Performance
In figure 8, the different weak learners used in level 0 of our STACK model are compared
across five different metrics; the observations show the most robust performance for the
Random Forest model, with an EVS value of 0.25. Figure 9 shows the results of the comparison among the selected algorithms. The proposed bagging-boosting
STACK shows better overall performance by solving the bias-variance tradeoff. The
metrics used in assessing the four models show the lowest forecasting error for the
STACK in comparison with the other algorithms, hence yielding the best forecasting
ability.
Fig. 8. Comparison of Error Metrics among the base learners
Fig. 9. Comparison of Error Metrics with Existing Models
5. Conclusion
In this work, we propose a bagging-boosting STACK model with different algorithms
used in solar power prediction. The proposed STACK uses an SVRL as a meta-learner
to forecast PV output and seven different weak learners are used to provide input
prediction to the meta-learner. The proposed STACK outperforms the predictions made
by bagging, boosting, and bagging-boosting models separately. By using the proposed
model, the tradeoff between variance and bias is significantly reduced. Bagging reduces
the variance of the weak learners whilst boosting reduces their bias, by combining
both algorithms into one STACK model, the generalization and forecasting errors are
reduced. The developed methodology showed that the prediction error of PV output power
can be reduced, however more data is required to training the models as shown by the
results. The proposed STACK model can be improved through testing different base learners
and meta-learners, but also by increasing the number of layers of the STACK.
Acknowledgements
This work has been supported by the National Research Foundation of Korea (NRF) grant
funded by the Korean government (MSIP) (No. 2021R1A2B5B03086257).
References
M.K. Behera, I. Majumder, N. Nayak, 2018, Solar photovoltaic power forecasting using
optimized modified extreme learning machine technique, Engineering Science and Technology,
an International Journal, Vol. 21, pp. 428-438
P. Dawan, K. Sriprapha, S. Kittisontirak, T. Boonraksa, N. Junhuathon, W. Titiroongruang,
S. Niemcharoen, 2020, Comparison of power output forecasting on the photovoltaic system
using adaptive neuro-fuzzy inference systems and particle swarm optimization-artificial
neural network model, Energies, Vol. 13:351
Y. Zhang, J. Ren, Y. Pu, P. Wang, 2020, Solar energy potential assessment: A framework
to integrate geographic, technological, and economic indices for a potential analysis,
Renewable Energy, Vol. 149, pp. 577-586
Y.K. Semero, J. Zhang, D. Zheng, 2018, Pv power forecasting using an integrated ga-pso-anfis
approach and Gaussian process regression based feature selection strategy, CSEE Journal
of Power and Energy Systems, Vol. 4, pp. 210-218
M. Zamo, O. Mestre, P. Arbogast, 2014, A benchmark of statistical regression methods
for short-term forecasting of photovoltaic electricity production, part i: Deterministic
forecast of hourly production, Solar Energy, Vol. 105, pp. 792-803
F. Rodríguez, A. Fleetwood, A. Galarza, L. Fontán, 2018, Predicting solar energy generation
through artificial neural networks using weather forecasts for microgrid control,
Renewable Energy, Vol. 126, pp. 855-864
A.T. Eseye, J. Zhang, D. Zheng, 2018, Short-term photovoltaic solar power forecasting
using a hybrid wavelet-PSO-SVM model based on SCADA and meteorological information,
Renewable Energy, Vol. 118, pp. 357-367
J. Shi, W.-J. Lee, Y. Liu, Y. Yang, P. Wang, 2012, Forecasting power output of photovoltaic
systems based on weather classification and support vector machines, IEEE Transactions
on Industry Applications, Vol. 48, pp. 1064-1069
Huang, C., L. Cao, N. Peng, S. Li, J. Zhang, L. Wang, X. Luo, J.-H. Wang, 2018,
Day-ahead forecasting of hourly photovoltaic power based on robust multilayer perception,
Sustainability, Vol. 10:4863
S. Kittisontirak, P. Dawan, N. Atiwongsangthong, W. Titiroongruang, P. Chinnavornrungsee,
A. Hongsingthong, K. Sriprapha, P. Manosukritkul, 2017, A novel power output model
for photovoltaic system
S.M. Jung, S. Park, S.W. Jung, E Hwang, 2020, Monthly Electric Load Forecasting Using
Transfer Learning for Smart Cities, Sustainability, Vol. 12, No. 16, pp. 6364
C.E. Borges, Y.K. Penya, I. Fernandez, 2012, Evaluating combined load forecasting
in large power systems and smart grids, IEEE Transactions on Industrial Informatics,
Vol. 9, No. 3, pp. 1570-1577
M. Leutbecher, T. N Palmer, Ensemble forecasting, Journal of computational physics,
Vol. 227, No. 7, pp. 3515-3539
Zenko, B., Todorovski, L., Dzeroski, S, November 2001, A comparison of stacking
with meta decision trees to bagging, boosting, and stacking with other methods., In
Proceedings 2001 IEEE International Conference on Data Mining, pp. 669-670
W. El-Baz, P. Tzscheutschler, U Wagner, 2018, Day-ahead probabilistic PV generation
forecast for buildings energy management systems, Solar Energy, Vol. 171, pp. 478-490
C. Persson, P. Bacher, T. Shiga, H Madsen, 2017, Multi- site solar power forecasting
using gradient boosted regression trees, Solar Energy, Vol. 150, pp. 423-436
H. Zhou, Y. Zhang, L. Yang, Q Liu, October 2018, Short-term photovoltaic power forecasting
based on Stacking-SVM., In 2018 9th International Conference on Information Technology
in Medicine and Education (ITME), pp. 994-998
A. S. Khwaja, A. Anpalagan, M. Naeem, B. Venkatesh, Joint bagged-boosted artificial
neural networks: Using ensemble machine learning to improve short-term electricity
load forecasting, Electric Power Systems Research, Vol. 179, 106080
N. Fraccanabbia, R. G. da Silva, M. H. D. M. Ribeiro, S. R. Moreno, L. dos Santos
Coelho, V. C Mariani, July 2020, Solar Power Forecasting Based on Ensemble Learning
Methods, In 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1-7
S. R. Moreno, R. G. da Silva, M. H. D. M. Ribeiro, N. Fraccanabbia, V. C. Mariani,
L. D. S. Coelho, Belem Brazil, November 2019, Very short-term wind energy forecasting
based on stacking ensemble, In 14th Brazilian Computational Intelligence Meeting (CBIC),
pp. 1-7
S. Choi, J. Hur, 2020, An ensemble learner-based bagging model using past output data
for photovoltaic forecasting, Energies, Vol. 13, No. 6, pp. 1438
X. Luo, J. Sun, L. Wang, W. Wang, W. Zhao, J. Wu, Z. Zhang, 2018, Short-term wind
speed forecasting via stacked extreme learning machine with generalized correntropy,
IEEE Transactions on Industrial Informatics, Vol. 14, No. 11, pp. 4963-4971
L. Breiman, 1996, Stacked regressions, Machine learning, Vol. 24, No. 1, pp. 49-64
저자소개
제라르도 온도 미하(Gerardo Ondo Micha)
He received a B.S degree in Electrical and Electronics Engineering from University
Teknologi Petronas, Tronoh, Malaysia in 2017.
At present, he is enrolled in master's degree program at Sungkyunkwan University.
His research interests include Intermittency of Renewable Energies, power system
protection, islanding detection, hosting capacity, auto-reclosing schemes in AC, DC,
and Hybrid transmission lines, and artificial intelligence applications for the power
system.
김철환 (Chul-Hwan Kim)
He received the B.S., M.S., and Ph.D. degrees in electrical engineering from Sungkyunkwan
University, Suwon, Korea, in 1982, 1984, and 1990, respectively.
In 1990, he joined Jeju National University, Jeju, Korea, as a Full- Time Lecturer.
He was a Visiting Academic with the University of Bath, Bath, U.K., in 1996, 1998,
and 1999.
He has been a Professor with the College of Information and Communication Engineering,
Sungkyunkwan University, since 1992, where he is currently the Director of the Center
for Power Information Technology.
His current research interests include power system protection, artificial intelligence
applications for protection and control, modeling and protection of microgrid and
DC system.