# Wind Power Forecasting by Bootstrap Aggregation of Spatio-Temporal Regression Models

This work is based on my summer internship at IIT Delhi in 2015. The project derives its base from the machine learning and data mining framework called windML. The windML framework provides various tools for machine learning based analysis of real-time wind energy time series data. This project further develops this framework to implement a Bootstrap Aggregation (Bagging) based system to improve upon the already available spatio-temporal regression models for accurate wind power forecasting. The project is implemented in python using windML, and Sklearn python libraries.

## Why Wind Power Forecasting?

With the ever-increasing energy demand and increasing focus towards reducing global warming, the renewable energy sources like wind and solar energy are going to play a crucial role in future energy infrastructure. But this increased penetration of wind energy (and solar for that matter) is not without its problems. The major issue with the wind energy is its high level of uncertainty which can lead to wide-scale instability in the connected power grids. To tackle this problem we need to generate high accuracy prediction model for these energy sources. Further with the availability of a large amount of time series electricity generation data the prediction task is much more solvable than it ever was.

## Wind Power Forecasting as a Regression Problem

Wind power forecasting problem for a single turbine can be formulated as a time series regression problem. Depending on the time of forecast it can be classified into short-term (30, 60, 90 minutes ahead), medium-term (Day Ahead, Week Ahead) and long-term forecasting problem. In this case, we would tackle a short-term forecasting problem by using the power data at $t-\mu+1$$t-\mu+2$….., $t-1$ and $t$ time instants to forecast power at $latex t+\lambda$ time instant. So, at any time instant $t$ the power $P_{t+\lambda}$ can be seen as a function of power $P_{t-\mu+1}$$P_{t-\mu+2}$….. $P_{t-1}$ and $P_{t}$. Further, the future power can also be correlated with the difference of power at the previous time instances ($P_{t-\mu+2} - P_{t-\mu+1}$$P_{t-\mu+3} - P_{t-\mu+2}$…..and $P_{t} - P_{t-1}$).  Here $\lambda$ is called time horizon and $\mu$ is called feature window. So, in this case we have ($2\mu +1$) features to predict power $P_{t+\lambda}$.

## Spatio-Temporal Regression model for Wind Power Forecasting

To forecast the power generated by a turbine, the regression model discussed so far uses of its own time series. To further improve the forecast we can use time series data of its neighbor turbines. As these neighbor turbines have to face the same wind (delayed or advanced depending upon the wind direction), their power time series data is easy to correlate with the target turbine. If in a given area, there are $N-1$ neighbor turbines, the total number features are $N(2\mu +1)$ now. So, the power produced by target turbine $1$ at time $t+\lambda$ ($P^{1}_{t+\lambda}$) is a function of $N(2\mu +1)$ features denoted by $P^{k}_{t-j}$, where $k$ denotes the $k^{th}$ turbine (varying between $1$ to $N$) and $t-j$ denotes previous time instant (varying between $t-\mu+1$ and $t$).

## Bootstrap Aggregation of Regression Models

So far the discussed regression problem can be formulated as:
$P^{1}_{t+\lambda}$ = $f(P^{k}_{t-j})$
Here $f$ is function that maps the forecasted power to the $N(2\mu +1)$ spatio-temporal features. This function is approximated by a machine-learning model as a function $f_i$. But being approximate this function is not unique. We can develop different approximate models using different machine-learning techniques (like Support Vector Machines, Random Forest, Neural Network etc.) and by varying hyper-parameters of a single technique. All of these functions show some overfitting resulting in error during the forecast. One way of reducing this overfitting is to apply an ensemble technique called weighted bootstrap aggregation (or weighted bagging) to combine all machine-learning approximated functions. If we train $M$ different machine-learning methods to find $M$ approximations of the mapping function $f$, the overall regression problem can be formulated as:
$P^{1}_{t+\lambda}$ = $\frac{1}{\sum \alpha_i}\sum \alpha_i f_i(P^{k}_{t-j})$
Here $latex \alpha_i$ is the weight for approximation $f_i$.

## Bootstrap Aggregation: Python Implementation

class BA_ensemble(object):
'''
Variable 'estimators' can be seen as a list of models. Each element of the list
is defined as: ['Model_Name', [Model Paramters]]

ML Models with parameters:
1. 'SVR': kernal = 'rbf', degree = 3, gamma = 1e-4, C = 100, epsilon = 0.1
2. 'RF': max_depth = 30, n_estimators = 10, n_jobs = -1, random_state = 0
3. 'KNN': n_neighbors = 5, weights = 'uniform', leaf_size = 30
4. 'Ridge': alpha = 0.5
5. 'Bayesian': alpha_1 = 1e-6, alpha_2 = 1e-6
6. 'LR': fit_intercept =True , n_jobs = -1, normalize =False
7. 'GBR': learning_rate = 0.1, n_estimators = 100, max_depth = 3, loss='ls'
'''
def __init__(self, train_x, train_y, estimators):
'''
Creates and Trains all the models
'''
self.reg = []
for d in range(len(train_x)):
self.X_train = train_x[d]
self.Y_train = train_y[d]
for i in range(len(estimators)):
print i
self.reg.append(Estimator(info=estimators[i]))
if estimators[i][0] == 'SVR':
self.reg[-1].train(SVR(kernel=estimators[i][1][0], degree=estimators[i][1][1], gamma = estimators[i][1][2] , C = estimators[i][1][3], epsilon=estimators[i][1][4]).fit(self.X_train,self.Y_train))
elif estimators[i][0] == 'RF':
self.reg[-1].train(RandomForestRegressor(max_depth=estimators[i][1][0], n_estimators=estimators[i][1][1], n_jobs=estimators[i][1][2], random_state=estimators[i][1][3]).fit(self.X_train,self.Y_train))
elif estimators[i][0] == 'KNN':
self.reg[-1].train(KNeighborsRegressor(n_neighbors=estimators[i][1][0], weights=estimators[i][1][1], leaf_size=estimators[i][1][2]).fit(self.X_train,self.Y_train))
elif estimators[i][0] == 'Ridge':
self.reg[-1].train(linear_model.Ridge(alpha = estimators[i][1][0]).fit(self.X_train,self.Y_train))
elif estimators[i][0] == 'Bayesian':
self.reg[-1].train(linear_model.BayesianRidge(alpha_1 = estimators[i][1][0], alpha_2 = estimators[i][1][1]).fit(self.X_train,self.Y_train))
elif estimators[i][0] == 'LR':
self.reg[-1].train(linear_model.LinearRegression(fit_intercept = estimators[i][1][0], n_jobs = estimators[i][1][1], normalize = estimators[i][1][2]).fit(self.X_train,self.Y_train))
elif estimators[i][0] == 'GBR':
self.reg[-1].train(GradientBoostingRegressor(learning_rate = estimators[i][1][0], n_estimators = estimators[i][1][1], max_depth = estimators[i][1][2],  loss=estimators[i][1][3]).fit(self.X_train,self.Y_train))
def predict(self, test_x, test_y):
'''
Predicts/Forecasts using all the models
'''
self.X_test = test_x
self.Y_test = test_y
self.y_hats = []
for i in range(len(self.reg)):
print i
self.reg[i].predictions(self.reg[i].regressor.predict(self.X_test))
# Mean Square Error
self.reg[i].PI_update('mse', mean_squared_error(self.Y_test, self.reg[i].y_hat))
def aggregate(self, alpha = []):
'''
Ensembles (Weighted Bagging/Bootstrap-Aggregation) all the predictions
'''
if len(alpha) <span id="mce_SELREST_start" style="overflow:hidden;line-height:0;"></span><span id="mce_SELREST_start" style="overflow:hidden;line-height:0;"></span><span id="mce_SELREST_start" style="overflow:hidden;line-height:0;"></span>&lt; len(self.reg):
for i in range(len(self.reg)-len(alpha)):
alpha.append(1)
self.alpha = alpha
self.y_hat = []
for i in range(len(self.X_test)):
self.y_hat.append(0)
for j in range(len(self.reg)):
self.y_hat[-1] = self.y_hat[-1] + self.alpha[j] * self.reg[j].y_hat[i]
self.y_hat[-1] = self.y_hat[-1]/(float(sum(self.alpha)))
# Mean Square Error
self.mse = mean_squared_error(self.Y_test, self.y_hat)


## Results

This slideshow requires JavaScript.

The above slide shows the prediction made by the weighted bootstrap aggregation strategy in 10 different turbine locations. The red line shows the predicted power while the blue line shows the actual measurements. Clearly, the proposed strategy is able to make predictions close to the actual power value. The above bootstrap aggregation is only using 5 regression models: Random Forest, Ridge Regression,  Bayesian Learning, Gradient Boosting Regression and K-Nearest-Neighbors.

This slideshow requires JavaScript.

The bar graph clearly shows that compared to single regression model the bootstrap aggregation is giving much lower mean square error in all cases.

Turbine RF Ridge Bayesian GBR KNN BA
tehachapi 6.64 6.12 6.12 6.05 6.89 5.80
cheyenne 8.79 8.14 8.14 8.23 9.10 8.00
palmsprings 6.22 5.48 5.48 5.38 6.17 5.27
reno 13.31 12.30 12.31 12.15 14.99 11.91
lasvegas 10.10 9.20 9.20 8.99 10.11 8.83
hesperia 7.30 6.75 6.74 6.87 7.60 6.49
lancaster 9.10 8.57 8.57 8.43 9.30 8.20
yuccavalley 10.30 9.33 9.33 9.32 10.69 9.19
vantage 6.17 5.61 5.60 5.73 6.43 5.49
casper 10.22 10.00 9.99 9.71 10.42 9.39

The above results were taken by using just 5 regression model with equal weights. The prediction accuracy can be further improved by taking more models and optimizing the model weights.

Check my Github page for complete implementation.