Title: | Ensemble Algorithms for Time Series Forecasting with Modeltime |
---|---|
Description: | A 'modeltime' extension that implements time series ensemble forecasting methods including model averaging, weighted averaging, and stacking. These techniques are popular methods to improve forecast accuracy and stability. |
Authors: | Matt Dancho [aut, cre], Business Science [cph] |
Maintainer: | Matt Dancho <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.4 |
Built: | 2024-11-17 05:28:00 UTC |
Source: | https://github.com/business-science/modeltime.ensemble |
Creates an Ensemble Model using Mean/Median Averaging
ensemble_average(object, type = c("mean", "median"))
ensemble_average(object, type = c("mean", "median"))
object |
A Modeltime Table |
type |
Specify the type of average ("mean" or "median") |
The input to an ensemble_average()
model is always a Modeltime Table,
which contains the models that you will ensemble.
Averaging Methods
The average method uses an un-weighted average using type
of either:
"mean"
: Performs averaging using mean(x, na.rm = TRUE)
to aggregate each
underlying models forecast at each timestamp
"median"
: Performs averaging using stats::median(x, na.rm = TRUE)
to aggregate each
underlying models forecast at each timestamp
A mdl_time_ensemble
object.
library(tidymodels) library(modeltime) library(modeltime.ensemble) library(dplyr) library(timetk) # Make an ensemble from a Modeltime Table ensemble_fit <- m750_models %>% ensemble_average(type = "mean") ensemble_fit # Forecast with the Ensemble modeltime_table( ensemble_fit ) %>% modeltime_forecast( new_data = testing(m750_splits), actual_data = m750 ) %>% plot_modeltime_forecast( .interactive = FALSE, .conf_interval_show = FALSE )
library(tidymodels) library(modeltime) library(modeltime.ensemble) library(dplyr) library(timetk) # Make an ensemble from a Modeltime Table ensemble_fit <- m750_models %>% ensemble_average(type = "mean") ensemble_fit # Forecast with the Ensemble modeltime_table( ensemble_fit ) %>% modeltime_forecast( new_data = testing(m750_splits), actual_data = m750 ) %>% plot_modeltime_forecast( .interactive = FALSE, .conf_interval_show = FALSE )
A 2-stage stacking regressor that follows:
Stage 1: Sub-Model's are Trained & Predicted using modeltime.resample::modeltime_fit_resamples()
.
Stage 2: A Meta-learner (model_spec
) is trained on Out-of-Sample Sub-Model
Predictions using ensemble_model_spec()
.
ensemble_model_spec( object, model_spec, kfolds = 5, param_info = NULL, grid = 6, control = control_grid() )
ensemble_model_spec( object, model_spec, kfolds = 5, param_info = NULL, grid = 6, control = control_grid() )
object |
A Modeltime Table. Used for ensemble sub-models. |
model_spec |
A Can be either:
|
kfolds |
K-Fold Cross Validation for tuning the Meta-Learner.
Controls the number of folds used in the meta-learner's cross-validation.
Gets passed to |
param_info |
A |
grid |
Grid specification or grid size for tuning the Meta Learner.
Gets passed to |
control |
An object used to modify the tuning process.
Uses |
Stacked Ensemble Process
Start with a Modeltime Table to define your sub-models.
Step 1: Use modeltime.resample::modeltime_fit_resamples()
to perform the submodel resampling procedure.
Step 2: Use ensemble_model_spec()
to define and train the meta-learner.
What goes on inside the Meta Learner?
The Meta-Learner Ensembling Process uses the following basic steps:
Make Cross-Validation Predictions.
Cross validation predictions are made for each sub-model with modeltime.resample::modeltime_fit_resamples()
.
The out-of-sample sub-model predictions contained in .resample_results
are used as the input to the meta-learner.
Train a Stacked Regressor (Meta-Learner).
The sub-model out-of-sample cross validation predictions are then
modeled using a model_spec
with options:
Tuning: If the model_spec
does include tuning parameters via tune::tune()
then the meta-learner will be hypeparameter tuned using K-Fold Cross Validation. The
parameters and grid can adjusted using kfolds
, grid
, and param_info
.
No-Tuning: If the model_spec
does not include tuning parameters via tune::tune()
then the meta-learner will not be hypeparameter tuned and will have the model
fitted to the sub-model predictions.
Final Model Selection.
If tuned, the final model is selected based on RMSE, then retrained on the full set of out of sample predictions.
If not-tuned, the fitted model from Stage 2 is used.
Progress
The best way to follow the training process and watch progress is to use
control = control_grid(verbose = TRUE)
to see progress.
Parallelize
Portions of the process can be parallelized. To parallelize, set
up parallelization using tune
via one of the backends such as
doFuture
. Then set control = control_grid(allow_par = TRUE)
A mdl_time_ensemble
object.
library(tidymodels) library(modeltime) library(modeltime.ensemble) library(dplyr) library(timetk) library(glmnet) # Step 1: Make resample predictions for submodels resamples_tscv <- training(m750_splits) %>% time_series_cv( assess = "2 years", initial = "5 years", skip = "2 years", slice_limit = 1 ) submodel_predictions <- m750_models %>% modeltime_fit_resamples( resamples = resamples_tscv, control = control_resamples(verbose = TRUE) ) # Step 2: Metalearner ---- # * No Metalearner Tuning ensemble_fit_lm <- submodel_predictions %>% ensemble_model_spec( model_spec = linear_reg() %>% set_engine("lm"), control = control_grid(verbose = TRUE) ) ensemble_fit_lm # * With Metalearner Tuning ---- ensemble_fit_glmnet <- submodel_predictions %>% ensemble_model_spec( model_spec = linear_reg( penalty = tune(), mixture = tune() ) %>% set_engine("glmnet"), grid = 2, control = control_grid(verbose = TRUE) ) ensemble_fit_glmnet
library(tidymodels) library(modeltime) library(modeltime.ensemble) library(dplyr) library(timetk) library(glmnet) # Step 1: Make resample predictions for submodels resamples_tscv <- training(m750_splits) %>% time_series_cv( assess = "2 years", initial = "5 years", skip = "2 years", slice_limit = 1 ) submodel_predictions <- m750_models %>% modeltime_fit_resamples( resamples = resamples_tscv, control = control_resamples(verbose = TRUE) ) # Step 2: Metalearner ---- # * No Metalearner Tuning ensemble_fit_lm <- submodel_predictions %>% ensemble_model_spec( model_spec = linear_reg() %>% set_engine("lm"), control = control_grid(verbose = TRUE) ) ensemble_fit_lm # * With Metalearner Tuning ---- ensemble_fit_glmnet <- submodel_predictions %>% ensemble_model_spec( model_spec = linear_reg( penalty = tune(), mixture = tune() ) %>% set_engine("glmnet"), grid = 2, control = control_grid(verbose = TRUE) ) ensemble_fit_glmnet
Creates an Ensemble Model using Mean/Median Averaging in the Modeltime Nested Forecasting Workflow.
ensemble_nested_average( object, type = c("mean", "median"), keep_submodels = TRUE, model_ids = NULL, control = control_nested_fit() )
ensemble_nested_average( object, type = c("mean", "median"), keep_submodels = TRUE, model_ids = NULL, control = control_nested_fit() )
object |
A nested modeltime object (inherits class |
type |
One of "mean" for mean averaging or "median" for median averaging |
keep_submodels |
Whether or not to keep the submodels in the nested modeltime table results |
model_ids |
A vector of id's ( |
control |
Controls various aspects of the ensembling process. See |
If we start with a nested modeltime table, we can add ensembles.
nested_modeltime_tbl # Nested Modeltime Table Trained on: .splits | Model Errors: [0] # A tibble: 2 x 5 id .actual_data .future_data .splits .modeltime_tables <fct> <list> <list> <list> <list> 1 1_1 <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]> <mdl_time_tbl [2 x 5]> 2 1_3 <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]> <mdl_time_tbl [2 x 5]>
An ensemble can be added to a Nested modeltime table.
ensem <- nested_modeltime_tbl %>% ensemble_nested_average( type = "mean", keep_submodels = TRUE, control = control_nested_fit(allow_par = FALSE, verbose = TRUE) )
We can then verify the model has been added.
ensem %>% extract_nested_modeltime_table()
This produces an ensemble .model_id 3, which is an ensemble of the first two models.
# A tibble: 4 x 6 id .model_id .model .model_desc .type .calibration_data <fct> <dbl> <list> <chr> <chr> <list> 1 1_1 1 <workflow> PROPHET Test <tibble [52 x 4]> 2 1_1 2 <workflow> XGBOOST Test <tibble [52 x 4]> 3 1_1 3 <ensemble [2]> ENSEMBLE (MEAN): 2 MODELS Test <tibble [52 x 4]>
Additional ensembles can be added by simply adding onto the nested modeltime table.
Notice that we make use of model_ids
to make sure it only uses model id's 1 and 2.
ensem_2 <- ensem %>% ensemble_nested_average( type = "median", keep_submodels = TRUE, model_ids = c(1,2), control = control_nested_fit(allow_par = FALSE, verbose = TRUE) )
This returns a 4th model that is a median ensemble of the first two models.
ensem_2 %>% extract_nested_modeltime_table() # A tibble: 4 x 6 id .model_id .model .model_desc .type .calibration_data <fct> <dbl> <list> <chr> <chr> <list> 1 1_1 1 <workflow> PROPHET Test <tibble [52 x 4]> 2 1_1 2 <workflow> XGBOOST Test <tibble [52 x 4]> 3 1_1 3 <ensemble [2]> ENSEMBLE (MEAN): 2 MODELS Test <tibble [52 x 4]> 4 1_1 4 <ensemble [2]> ENSEMBLE (MEDIAN): 2 MODELS Test <tibble [52 x 4]>
The nested modeltime table with an ensemble model added.
Creates an Ensemble Model using Weighted Averaging in the Modeltime Nested Forecasting Workflow.
ensemble_nested_weighted( object, loadings, scale_loadings = TRUE, metric = "rmse", keep_submodels = TRUE, model_ids = NULL, control = control_nested_fit() )
ensemble_nested_weighted( object, loadings, scale_loadings = TRUE, metric = "rmse", keep_submodels = TRUE, model_ids = NULL, control = control_nested_fit() )
object |
A nested modeltime object (inherits class |
loadings |
A vector of weights corresponding to the loadings |
scale_loadings |
If TRUE, divides by the sum of the loadings to proportionally weight the submodels. |
metric |
The accuracy metric to rank models by the test accuracy table.
Loadings are then applied in the order from best to worst models.
Default: |
keep_submodels |
Whether or not to keep the submodels in the nested modeltime table results |
model_ids |
A vector of id's ( |
control |
Controls various aspects of the ensembling process. See |
If we start with a nested modeltime table, we can add ensembles.
nested_modeltime_tbl # Nested Modeltime Table Trained on: .splits | Model Errors: [0] # A tibble: 2 x 5 id .actual_data .future_data .splits .modeltime_tables <fct> <list> <list> <list> <list> 1 1_1 <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]> <mdl_time_tbl [2 x 5]> 2 1_3 <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]> <mdl_time_tbl [2 x 5]>
An ensemble can be added to a Nested modeltime table.
ensem <- nested_modeltime_tbl %>% ensemble_nested_weighted( loadings = c(2,1), control = control_nested_fit(allow_par = FALSE, verbose = TRUE) )
We can then verify the model has been added.
ensem %>% extract_nested_modeltime_table()
This produces an ensemble .model_id 3, which is an ensemble of the first two models.
# A tibble: 4 x 6 id .model_id .model .model_desc .type .calibration_data <fct> <dbl> <list> <chr> <chr> <list> 1 1_3 1 <workflow> PROPHET Test <tibble [52 x 4]> 2 1_3 2 <workflow> XGBOOST Test <tibble [52 x 4]> 3 1_3 3 <ensemble [2]> ENSEMBLE (WEIGHTED): 2 MODELS Test <tibble [52 x 4]>
We can verify the loadings have been applied correctly. Note that the loadings will be applied based on the model with the lowest RMSE.
ensem %>% extract_nested_modeltime_table(1) %>% slice(3) %>% pluck(".model", 1)
Note that the xgboost model gets the 66% loading and prophet gets 33% loading. This is because xgboost has the lower RMSE in this case.
-- Modeltime Ensemble ------------------------------------------- Ensemble of 2 Models (WEIGHTED) # Modeltime Table # A tibble: 2 x 6 .model_id .model .model_desc .type .calibration_data .loadings <int> <list> <chr> <chr> <list> <dbl> 1 1 <workflow> PROPHET Test <tibble [52 x 4]> 0.333 2 2 <workflow> XGBOOST Test <tibble [52 x 4]> 0.667
The nested modeltime table with an ensemble model added.
Makes an ensemble by applying loadings
to weight sub-model predictions
ensemble_weighted(object, loadings, scale_loadings = TRUE)
ensemble_weighted(object, loadings, scale_loadings = TRUE)
object |
A Modeltime Table |
loadings |
A vector of weights corresponding to the loadings |
scale_loadings |
If TRUE, divides by the sum of the loadings to proportionally weight the submodels. |
The input to an ensemble_weighted()
model is always a Modeltime Table,
which contains the models that you will ensemble.
Weighting Method
The weighted method uses uses loadings
by applying a
loading x model prediction for each submodel.
A mdl_time_ensemble
object.
library(tidymodels) library(modeltime) library(modeltime.ensemble) library(dplyr) library(timetk) # Make an ensemble from a Modeltime Table ensemble_fit <- m750_models %>% ensemble_weighted( loadings = c(3, 3, 1), scale_loadings = TRUE ) ensemble_fit # Forecast with the Ensemble modeltime_table( ensemble_fit ) %>% modeltime_forecast( new_data = testing(m750_splits), actual_data = m750 ) %>% plot_modeltime_forecast( .interactive = FALSE, .conf_interval_show = FALSE )
library(tidymodels) library(modeltime) library(modeltime.ensemble) library(dplyr) library(timetk) # Make an ensemble from a Modeltime Table ensemble_fit <- m750_models %>% ensemble_weighted( loadings = c(3, 3, 1), scale_loadings = TRUE ) ensemble_fit # Forecast with the Ensemble modeltime_table( ensemble_fit ) %>% modeltime_forecast( new_data = testing(m750_splits), actual_data = m750 ) %>% plot_modeltime_forecast( .interactive = FALSE, .conf_interval_show = FALSE )