This work is based on my summer internship at IIT Delhi in 2015. The project builds on windML, a machine learning and data mining framework that provides tools for machine-learning-based analysis of real-time wind energy time series data. This project extends the framework with a Bootstrap Aggregation (Bagging) based system to improve upon the already available spatio-temporal regression models for accurate wind power forecasting. The project is implemented in Python using the windML and scikit-learn (sklearn) libraries.

**Why Wind Power Forecasting?**

With ever-increasing energy demand and a growing focus on reducing global warming, renewable energy sources like wind and solar are going to play a crucial role in the future energy infrastructure. But this increased penetration of wind energy (and solar, for that matter) is not without its problems. The major issue with wind energy is its high level of uncertainty, which can lead to wide-scale instability in the connected power grids. To tackle this problem we need high-accuracy prediction models for these energy sources. Furthermore, with the availability of large amounts of time series electricity generation data, the prediction task is much more solvable than it ever was.

## Wind Power Forecasting as a Regression Problem

The wind power forecasting problem for a single turbine can be formulated as a time series regression problem. Depending on the forecast horizon it can be classified into short-term (30, 60, 90 minutes ahead), medium-term (day ahead, week ahead) and long-term forecasting. In this case, we tackle a short-term forecasting problem by using the power data at the $latex t$, $latex t-1$, ….., and $latex t-\mu+1$ time instants to forecast the power at the $latex t+\lambda$ time instant. So, the power $latex p_{t+\lambda}$ can be seen as a function of the powers $latex p_{t}$, ….. and $latex p_{t-\mu+1}$. Further, the future power can also be correlated with the differences of power at the previous time instants ($latex p_{t}-p_{t-1}$, ….. and $latex p_{t-\mu+2}-p_{t-\mu+1}$). Here $latex \lambda$ is called the time horizon and $latex \mu$ is called the feature window. So, in this case we have ($latex 2\mu-1$) features to predict the power $latex p_{t+\lambda}$.
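As a concrete sketch of this feature construction (not the windML API; the helper name and toy series here are illustrative), for each time instant we stack the last $latex \mu$ power values and their $latex \mu-1$ successive differences, and the target is the power $latex \lambda$ steps ahead:

```python
import numpy as np

def make_features(p, mu, lam):
    """Build (2*mu - 1)-dimensional feature vectors from a power series p.

    For each time t, the features are the last mu power values
    p[t - mu + 1], ..., p[t] plus their mu - 1 successive differences;
    the target is the power lam steps ahead, p[t + lam].
    """
    X, y = [], []
    for t in range(mu - 1, len(p) - lam):
        window = p[t - mu + 1 : t + 1]           # mu past power values
        diffs = np.diff(window)                  # mu - 1 differences
        X.append(np.concatenate([window, diffs]))
        y.append(p[t + lam])
    return np.array(X), np.array(y)

p = np.arange(10, dtype=float)       # toy power series
X, y = make_features(p, mu=3, lam=2)
print(X.shape)                       # (6, 5): 2*3 - 1 = 5 features per sample
```

Any off-the-shelf regressor can then be fitted on `(X, y)` to produce the short-term forecast.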

__Spatio-Temporal Regression model for Wind Power Forecasting__

To forecast the power generated by a turbine, the regression model discussed so far uses only its own time series. To further improve the forecast we can use the time series data of its neighboring turbines. As these neighbors face the same wind (delayed or advanced depending upon the wind direction), their power time series data correlates well with that of the target turbine. If there are $latex n$ neighbor turbines in a given area, the total number of features is now $latex (n+1)(2\mu-1)$. So, the power produced by the target turbine at time $latex t+\lambda$ is a function of features denoted by $latex p_{j,t-i}$, where $latex j$ denotes the turbine (varying between $latex 0$ and $latex n$) and $latex i$ denotes the previous time instant (varying between $latex 0$ and $latex \mu-1$).
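The temporal feature construction extends naturally to the spatio-temporal case by stacking, for every turbine, its own window of past power values and differences (again a sketch, not the windML API; the helper name and toy matrix are illustrative):

```python
import numpy as np

def make_st_features(P, mu, lam):
    """Spatio-temporal features from P with shape (T, n_turbines);
    column 0 is the target turbine. Each sample concatenates, for every
    turbine j, its last mu power values and their mu - 1 differences,
    giving n_turbines * (2*mu - 1) features; the target is the target
    turbine's power lam steps ahead."""
    T, n = P.shape
    X, y = [], []
    for t in range(mu - 1, T - lam):
        feats = []
        for j in range(n):
            w = P[t - mu + 1 : t + 1, j]
            feats.append(np.concatenate([w, np.diff(w)]))
        X.append(np.concatenate(feats))
        y.append(P[t + lam, 0])
    return np.array(X), np.array(y)

P = np.arange(24, dtype=float).reshape(8, 3)   # 8 time steps, 3 turbines
X, y = make_st_features(P, mu=3, lam=1)
print(X.shape)                                 # (5, 15): 3 turbines * 5 features
```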

__Bootstrap Aggregation of Regression Models__

So far the discussed regression problem can be formulated as:

$latex p_{t+\lambda} = f(p_{0,t},\, p_{0,t-1},\, \ldots,\, p_{n,t-\mu+1})$

Here $latex f$ is the function that maps the spatio-temporal features to the forecasted power. This function is approximated by a machine-learning model as a function $latex \hat{f}$. But, being approximate, this function is not unique. We can develop different approximate models using different machine-learning techniques (like Support Vector Machines, Random Forests, Neural Networks, etc.) and by varying the hyper-parameters of a single technique. All of these functions show some overfitting, resulting in forecast error. One way of reducing this overfitting is to apply an ensemble technique called weighted bootstrap aggregation (or weighted bagging) to combine all of the machine-learning approximated functions. If we train $latex k$ different machine-learning methods to find approximations $latex \hat{f}_1, \ldots, \hat{f}_k$ of the mapping function $latex f$, the overall regression problem can be formulated as:

$latex \hat{p}_{t+\lambda} = \dfrac{\sum_{i=1}^{k} \alpha_i \, \hat{f}_i(p_{0,t},\, \ldots,\, p_{n,t-\mu+1})}{\sum_{i=1}^{k} \alpha_i}$

Here $latex \alpha_i$ is the weight for the approximation $latex \hat{f}_i$.
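In code, the aggregation step is just a weighted average of the individual forecasts (a minimal sketch; the forecasts and weights below are illustrative numbers, not outputs of the actual models):

```python
import numpy as np

# Forecasts from k = 3 hypothetical models for 4 time instants (one row per model)
y_hats = np.array([[10.0, 12.0, 11.0, 9.0],
                   [11.0, 13.0, 10.0, 8.0],
                   [12.0, 11.0, 12.0, 10.0]])
alpha = np.array([2.0, 1.0, 1.0])   # illustrative model weights

# Weighted bootstrap-aggregated forecast:
# y_hat = sum_i alpha_i * f_i(x) / sum_i alpha_i
y_hat = alpha @ y_hats / alpha.sum()
print(y_hat)   # → [10.75, 12.0, 11.0, 9.0]
```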

__Bootstrap Aggregation: Python Implementation__

```python
from sklearn import linear_model
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error


class BA_ensemble(object):
    '''
    Variable 'estimators' can be seen as a list of models. Each element of
    the list is defined as: ['Model_Name', [Model Parameters]]

    ML Models with parameters:
    1. 'SVR':      kernel = 'rbf', degree = 3, gamma = 1e-4, C = 100, epsilon = 0.1
    2. 'RF':       max_depth = 30, n_estimators = 10, n_jobs = -1, random_state = 0
    3. 'KNN':      n_neighbors = 5, weights = 'uniform', leaf_size = 30
    4. 'Ridge':    alpha = 0.5
    5. 'Bayesian': alpha_1 = 1e-6, alpha_2 = 1e-6
    6. 'LR':       fit_intercept = True, n_jobs = -1, normalize = False
    7. 'GBR':      learning_rate = 0.1, n_estimators = 100, max_depth = 3, loss = 'ls'
    '''

    def __init__(self, train_x, train_y, estimators):
        '''Creates and trains all the models.'''
        self.reg = []
        for d in range(len(train_x)):
            self.X_train = train_x[d]
            self.Y_train = train_y[d]
            for i in range(len(estimators)):
                print(i)
                # Estimator is a project-internal wrapper holding a fitted
                # regressor, its predictions and its performance indices
                self.reg.append(Estimator(info=estimators[i]))
                name, params = estimators[i][0], estimators[i][1]
                if name == 'SVR':
                    model = SVR(kernel=params[0], degree=params[1],
                                gamma=params[2], C=params[3], epsilon=params[4])
                elif name == 'RF':
                    model = RandomForestRegressor(max_depth=params[0],
                                                  n_estimators=params[1],
                                                  n_jobs=params[2],
                                                  random_state=params[3])
                elif name == 'KNN':
                    model = KNeighborsRegressor(n_neighbors=params[0],
                                                weights=params[1],
                                                leaf_size=params[2])
                elif name == 'Ridge':
                    model = linear_model.Ridge(alpha=params[0])
                elif name == 'Bayesian':
                    model = linear_model.BayesianRidge(alpha_1=params[0],
                                                       alpha_2=params[1])
                elif name == 'LR':
                    model = linear_model.LinearRegression(fit_intercept=params[0],
                                                          n_jobs=params[1],
                                                          normalize=params[2])
                elif name == 'GBR':
                    model = GradientBoostingRegressor(learning_rate=params[0],
                                                      n_estimators=params[1],
                                                      max_depth=params[2],
                                                      loss=params[3])
                self.reg[-1].train(model.fit(self.X_train, self.Y_train))

    def predict(self, test_x, test_y):
        '''Predicts/forecasts using all the models.'''
        self.X_test = test_x
        self.Y_test = test_y
        for i in range(len(self.reg)):
            print(i)
            self.reg[i].predictions(self.reg[i].regressor.predict(self.X_test))
            # Mean square error of this individual model
            self.reg[i].PI_update('mse', mean_squared_error(self.Y_test,
                                                            self.reg[i].y_hat))

    def aggregate(self, alpha=[]):
        '''Ensembles (weighted bagging / bootstrap aggregation) all the predictions.'''
        # Models without an explicit weight default to weight 1
        if len(alpha) < len(self.reg):
            for i in range(len(self.reg) - len(alpha)):
                alpha.append(1)
        self.alpha = alpha
        self.y_hat = []
        for i in range(len(self.X_test)):
            self.y_hat.append(0)
            for j in range(len(self.reg)):
                self.y_hat[-1] += self.alpha[j] * self.reg[j].y_hat[i]
            self.y_hat[-1] /= float(sum(self.alpha))
        # Mean square error of the aggregated forecast
        self.mse = mean_squared_error(self.Y_test, self.y_hat)
```

__Results__

The above figure shows the predictions made by the weighted bootstrap aggregation strategy at 10 different turbine locations. The red line shows the predicted power while the blue line shows the actual measurements. Clearly, the proposed strategy is able to make predictions close to the actual power values. This bootstrap aggregation uses only 5 regression models: Random Forest, Ridge Regression, Bayesian Learning, Gradient Boosting Regression and K-Nearest Neighbors.

The bar graph clearly shows that, compared to any single regression model, bootstrap aggregation gives a much lower mean square error in all cases.

Turbine | RF | Ridge | Bayesian | GBR | KNN | BA |
---|---|---|---|---|---|---|
tehachapi | 6.64 | 6.12 | 6.12 | 6.05 | 6.89 | 5.80 |
cheyenne | 8.79 | 8.14 | 8.14 | 8.23 | 9.10 | 8.00 |
palmsprings | 6.22 | 5.48 | 5.48 | 5.38 | 6.17 | 5.27 |
reno | 13.31 | 12.30 | 12.31 | 12.15 | 14.99 | 11.91 |
lasvegas | 10.10 | 9.20 | 9.20 | 8.99 | 10.11 | 8.83 |
hesperia | 7.30 | 6.75 | 6.74 | 6.87 | 7.60 | 6.49 |
lancaster | 9.10 | 8.57 | 8.57 | 8.43 | 9.30 | 8.20 |
yuccavalley | 10.30 | 9.33 | 9.33 | 9.32 | 10.69 | 9.19 |
vantage | 6.17 | 5.61 | 5.60 | 5.73 | 6.43 | 5.49 |
casper | 10.22 | 10.00 | 9.99 | 9.71 | 10.42 | 9.39 |

The above results were obtained using just 5 regression models with equal weights. The prediction accuracy can be further improved by using more models and optimizing the model weights.
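One way to optimize the weights, sketched here under the assumption that a held-out validation set is available (this is not part of the original implementation; the forecasts below are toy numbers), is to fit them by least squares against the validation targets and then normalize:

```python
import numpy as np

# Validation-set forecasts from 3 hypothetical models (one row per model)
F = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
              [1.2, 2.1, 2.9, 4.2, 5.1],
              [0.8, 1.9, 3.2, 3.8, 4.9]])
y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Least-squares weights minimizing ||F.T @ w - y_true||^2
w, *_ = np.linalg.lstsq(F.T, y_true, rcond=None)
w = np.clip(w, 0, None)   # keep the weights non-negative
w /= w.sum()              # normalize so forecasts stay on the power scale
y_hat = w @ F             # weighted bagged forecast with optimized weights
```

Here the first model reproduces the targets exactly, so the fitted weights concentrate on it; with real forecasts the optimum typically spreads weight across several models.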