--- title: Experiment Utils keywords: fastai sidebar: home_sidebar summary: "This notebook contains a set of functions to easily perform experiments on time series datasets. In this notebook you can see functions for:" description: "This notebook contains a set of functions to easily perform experiments on time series datasets. In this notebook you can see functions for:" nb_path: "nbs/experiments__utils.ipynb" ---
{% raw %}
{% endraw %}

The next two sections show the environment variables and imports used by the functions in this notebook.

{% raw %}
{% endraw %} {% raw %}
{% endraw %}

Helper functions that demonstrate basic use of the neuralforecast library:

{% raw %}

get_mask_dfs[source]

get_mask_dfs(Y_df:DataFrame, ds_in_val:int, ds_in_test:int)

Generates train, validation and test masks. The train mask is built by excluding the ds_in_test timestamps reserved for testing.

Parameters

- Y_df: pd.DataFrame. Target time series with columns ['unique_id', 'ds', 'y'].
- ds_in_val: int. Number of ds in validation.
- ds_in_test: int. Number of ds in test.

Returns

- train_mask_df: pd.DataFrame. Train mask dataframe.
- val_mask_df: pd.DataFrame. Validation mask dataframe.
- test_mask_df: pd.DataFrame. Test mask dataframe.

{% endraw %} {% raw %}
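A minimal usage sketch of get_mask_dfs, assuming Y_df follows the ['unique_id', 'ds', 'y'] layout described above; the 24/24 split sizes are illustrative:

# Reserve 24 timestamps for validation and 24 for test;
# the remaining timestamps form the train mask.
train_mask_df, val_mask_df, test_mask_df = get_mask_dfs(
    Y_df=Y_df, ds_in_val=24, ds_in_test=24
)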
{% endraw %} {% raw %}

get_random_mask_dfs[source]

get_random_mask_dfs(Y_df:DataFrame, ds_in_test:int, n_val_windows:int, n_ds_val_window:int, n_uids:int, freq:str)

Generates train, test and random validation masks. The train mask is built by excluding the ds_in_test timestamps reserved for testing.

The validation mask 1) samples n_uids unique ids and 2) creates n_val_windows random windows of size n_ds_val_window each.

Parameters

- Y_df: pd.DataFrame. Target time series with columns ['unique_id', 'ds', 'y'].
- ds_in_test: int. Number of ds in test.
- n_val_windows: int. Number of windows for validation.
- n_ds_val_window: int. Number of ds in each validation window.
- n_uids: int. Number of unique ids in validation.
- freq: str. Datestamp frequency, used to create the random windows.

Returns

- train_mask_df: pd.DataFrame. Train mask dataframe.
- val_mask_df: pd.DataFrame. Validation mask dataframe.
- test_mask_df: pd.DataFrame. Test mask dataframe.

{% endraw %} {% raw %}
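A similar sketch for get_random_mask_dfs; the window counts, sizes and the 'H' (hourly) frequency below are illustrative assumptions:

# Reserve the last 24 timestamps for test, and build a random validation
# mask from 2 windows of 24 timestamps each, sampled over 1 unique id.
train_mask_df, val_mask_df, test_mask_df = get_random_mask_dfs(
    Y_df=Y_df, ds_in_test=24, n_val_windows=2,
    n_ds_val_window=24, n_uids=1, freq='H'
)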
{% endraw %} {% raw %}

scale_data[source]

scale_data(Y_df:DataFrame, X_df:DataFrame, mask_df:DataFrame, normalizer_y:str, normalizer_x:str)

Scales the input data according to the given normalizer parameters.

Parameters

- Y_df: pd.DataFrame. Target time series with columns ['unique_id', 'ds', 'y'].
- X_df: pd.DataFrame. Exogenous time series with columns ['unique_id', 'ds', 'y'].
- mask_df: pd.DataFrame. Mask dataframe.
- normalizer_y: str. Normalizer for scaling Y_df.
- normalizer_x: str. Normalizer for scaling X_df.

Returns

- Y_df: pd.DataFrame. Scaled target time series.
- X_df: pd.DataFrame. Scaled exogenous time series.
- scaler_y: Scaler. Scaler object for Y_df.

{% endraw %} {% raw %}
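A hedged sketch of scale_data; the set of valid normalizer strings is defined by the library, so 'std' below is an assumption rather than a documented value:

# Assumption: 'std' names a supported normalizer; substitute any string
# the library accepts. mask_df marks the rows considered when scaling
# (here, the train mask).
Y_df_scaled, X_df_scaled, scaler_y = scale_data(
    Y_df=Y_df, X_df=X_df, mask_df=train_mask_df,
    normalizer_y='std', normalizer_x='std'
)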
{% endraw %} {% raw %}

create_datasets[source]

create_datasets(mc:dict, S_df:DataFrame, Y_df:DataFrame, X_df:DataFrame, f_cols:list, ds_in_test:int, ds_in_val:int, verbose:bool=False)

Creates train, validation and test datasets.

Parameters

- mc: dict. Model configuration.
- S_df: pd.DataFrame. Static exogenous variables with columns ['unique_id', 'ds'] and the static variables.
- Y_df: pd.DataFrame. Target time series with columns ['unique_id', 'ds', 'y'].
- X_df: pd.DataFrame. Exogenous time series with columns ['unique_id', 'ds', 'y'].
- f_cols: list. List of exogenous variables available in the future.
- ds_in_test: int. Number of ds in test.
- ds_in_val: int. Number of ds in validation.
- verbose: bool. If True, print a summary of the created datasets.

Returns

- train_dataset: BaseDataset. Train dataset.
- valid_dataset: BaseDataset. Validation dataset.
- test_dataset: BaseDataset. Test dataset.
- scaler_y: Scaler. Scaler object for Y_df.

{% endraw %} {% raw %}
{% endraw %} {% raw %}

instantiate_loaders[source]

instantiate_loaders(mc:dict, train_dataset:BaseDataset, val_dataset:BaseDataset, test_dataset:BaseDataset)

Creates train, validation and test loader classes.

Parameters

- mc: dict. Model configuration.
- train_dataset: BaseDataset. Train dataset.
- val_dataset: BaseDataset. Validation dataset.
- test_dataset: BaseDataset. Test dataset.

Returns

- train_loader: DataLoader. Train loader.
- val_loader: DataLoader. Validation loader.
- test_loader: DataLoader. Test loader.

{% endraw %} {% raw %}
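create_datasets and instantiate_loaders are typically chained: the first builds the splits, the second wraps each split in its loader. A sketch, assuming mc is a complete model configuration dict (its exact keys depend on the chosen model):

# Hold out one week (7*24 hourly timestamps) each for validation and test.
train_dataset, valid_dataset, test_dataset, scaler_y = create_datasets(
    mc=mc, S_df=S_df, Y_df=Y_df, X_df=X_df,
    f_cols=[], ds_in_test=7*24, ds_in_val=7*24
)
# Wrap each dataset in a DataLoader.
train_loader, val_loader, test_loader = instantiate_loaders(
    mc=mc, train_dataset=train_dataset,
    val_dataset=valid_dataset, test_dataset=test_dataset
)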
{% endraw %} {% raw %}

instantiate_nbeats[source]

instantiate_nbeats(mc:dict)

Creates an NBEATS model.

Parameters

mc: dict Model configuration.

Returns

model: NBEATS. NBEATS model.

{% endraw %} {% raw %}
{% endraw %} {% raw %}

instantiate_esrnn[source]

instantiate_esrnn(mc:dict)

Creates an ESRNN model.

Parameters

mc: dict Model configuration.

Returns

model: ESRNN. ESRNN model.

{% endraw %} {% raw %}
{% endraw %} {% raw %}

instantiate_rnn[source]

instantiate_rnn(mc:dict)

Creates an RNN model.

Parameters

mc: dict Model configuration.

Returns

model: RNN. RNN model.

{% endraw %} {% raw %}
{% endraw %} {% raw %}

instantiate_mqesrnn[source]

instantiate_mqesrnn(mc:dict)

Creates an MQESRNN model.

Parameters

mc: dict Model configuration.

Returns

model: MQESRNN. MQESRNN model.

{% endraw %} {% raw %}
{% endraw %} {% raw %}

instantiate_nhits[source]

instantiate_nhits(mc:dict)

Creates an NHITS model.

Parameters

mc: dict Model configuration.

Returns

model: NHITS. NHITS model.

{% endraw %} {% raw %}
{% endraw %} {% raw %}

instantiate_autoformer[source]

instantiate_autoformer(mc:dict)

Creates an Autoformer model.

Parameters

mc: dict Model configuration.

Returns

model: Autoformer. Autoformer model.

{% endraw %} {% raw %}
{% endraw %} {% raw %}

instantiate_model[source]

instantiate_model(mc:dict)

Creates one of the available models (NBEATS, ESRNN, RNN, MQESRNN, NHITS, Autoformer), depending on the model configuration.

Parameters

mc: dict Model configuration.

Returns

model: pl.LightningModule Forecast model.

{% endraw %} {% raw %}
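instantiate_model dispatches to one of the constructors above based on the configuration. A sketch, assuming mc was sampled from one of the suggested spaces (e.g. nhits_space, as in the examples at the end of this notebook):

# mc is assumed to carry the model name together with its hyperparameters;
# instantiate_model returns the corresponding pl.LightningModule.
model = instantiate_model(mc=mc)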
{% endraw %} {% raw %}

predict[source]

predict(mc:dict, model:LightningModule, trainer:Trainer, loader:DataLoader, scaler_y:Scaler)

Computes predictions on a dataset using the trained model.

Parameters

- mc: dict. Model configuration.
- model: pl.LightningModule. Forecast model.
- trainer: pl.Trainer. Trainer object.
- loader: DataLoader. Data loader.
- scaler_y: Scaler. Scaler object for the target time series.

Returns

- y_true: np.array. True values from the dataset.
- y_hat: np.array. Predicted values.
- mask: np.array. Masks for the values.
- meta_data: np.array. Metadata from the dataset.

{% endraw %} {% raw %}
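A sketch of predict, using the kind of objects returned by fit (documented next); all argument names come from the signatures on this page:

# Compute predictions on the test split with a trained model.
y_true, y_hat, mask, meta_data = predict(
    mc=mc, model=model, trainer=trainer,
    loader=test_loader, scaler_y=scaler_y
)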
{% endraw %} {% raw %}

fit[source]

fit(mc:dict, Y_df:DataFrame, X_df:DataFrame=None, S_df:DataFrame=None, ds_in_val:int=0, ds_in_test:int=0, f_cols:list=[], verbose:bool=False)

Trains a model on the given dataset.

Parameters

- mc: dict. Model configuration.
- Y_df: pd.DataFrame. Target time series with columns ['unique_id', 'ds', 'y'].
- X_df: pd.DataFrame. Exogenous time series with columns ['unique_id', 'ds', 'y'].
- S_df: pd.DataFrame. Static exogenous variables with columns ['unique_id', 'ds'] and the static variables.
- ds_in_val: int. Number of ds in validation.
- ds_in_test: int. Number of ds in test.
- f_cols: list. List of exogenous variables available in the future.
- verbose: bool. If True, print summaries of the dataset, model and training.

Returns

- model: pl.LightningModule. Forecast model.
- trainer: pl.Trainer. Trainer object.
- val_loader: DataLoader. Validation loader.
- test_loader: DataLoader. Test loader.
- scaler_y: Scaler. Scaler object for the target time series.

{% endraw %} {% raw %}
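A minimal sketch of fit on dataframes like the EPF ones loaded in the examples below; the one-week validation and test horizons are illustrative:

# Train a model configured by mc, holding out 7*24 timestamps
# each for validation and test.
model, trainer, val_loader, test_loader, scaler_y = fit(
    mc=mc, Y_df=Y_df, X_df=X_df, S_df=S_df,
    ds_in_val=7*24, ds_in_test=7*24, f_cols=[], verbose=False
)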
{% endraw %} {% raw %}

model_fit_predict[source]

model_fit_predict(mc:dict, S_df:DataFrame, Y_df:DataFrame, X_df:DataFrame, f_cols:list, ds_in_val:int, ds_in_test:int, verbose:bool)

Trains a model on the train dataset, then computes predictions on the test dataset.

Parameters

- mc: dict. Model configuration.
- S_df: pd.DataFrame. Static exogenous variables with columns ['unique_id', 'ds'] and the static variables.
- Y_df: pd.DataFrame. Target time series with columns ['unique_id', 'ds', 'y'].
- X_df: pd.DataFrame. Exogenous time series with columns ['unique_id', 'ds', 'y'].
- f_cols: list. List of exogenous variables available in the future.
- ds_in_val: int. Number of ds in validation.
- ds_in_test: int. Number of ds in test.
- verbose: bool. If True, print summaries of the dataset, model and training.

Returns

results: dict Dictionary with the results of training and prediction.

{% endraw %} {% raw %}
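model_fit_predict bundles fit and predict; a sketch with the same illustrative horizons:

# Train on the train split and predict on the test split in one call.
results = model_fit_predict(
    mc=mc, S_df=S_df, Y_df=Y_df, X_df=X_df, f_cols=[],
    ds_in_val=7*24, ds_in_test=7*24, verbose=False
)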
{% endraw %} {% raw %}

evaluate_model[source]

evaluate_model(mc:dict, loss_function_val:callable, loss_functions_test:dict, S_df:DataFrame, Y_df:DataFrame, X_df:DataFrame, f_cols:list, ds_in_val:int, ds_in_test:int, return_forecasts:bool, return_model:bool, save_trials:bool, trials:Trials, results_dir:str, step_save_progress:int=5, loss_kwargs:list=None, verbose:bool=False)

Evaluates a model on the given dataset.

Parameters

- mc: dict. Model configuration.
- loss_function_val: callable. Loss function used for validation.
- loss_functions_test: dict. Loss functions used for test (maps function name to function).
- S_df: pd.DataFrame. Static exogenous variables with columns ['unique_id', 'ds'] and the static variables.
- Y_df: pd.DataFrame. Target time series with columns ['unique_id', 'ds', 'y'].
- X_df: pd.DataFrame. Exogenous time series with columns ['unique_id', 'ds', 'y'].
- f_cols: list. List of exogenous variables available in the future.
- ds_in_val: int. Number of ds in validation.
- ds_in_test: int. Number of ds in test.
- return_forecasts: bool. If True, return forecasts on test.
- return_model: bool. If True, return the trained model.
- save_trials: bool. If True, save progress to a file.
- trials: hyperopt.Trials. Results from model evaluation.
- results_dir: str. File path to save results.
- step_save_progress: int. Save progress every n-th step.
- loss_kwargs: list. Loss function arguments.
- verbose: bool. If True, print summaries of the dataset, model and training.

Returns

results_output: dict Dictionary with results of model evaluation.

{% endraw %} {% raw %}
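A hedged sketch of evaluate_model with the mae and rmse losses imported in the examples below; passing trials=None when not resuming from a previous hyperopt Trials object, and results_dir=None when nothing is saved, are assumptions:

# Evaluate a single configuration: mae for validation, mae and rmse for test.
results_output = evaluate_model(
    mc=mc, loss_function_val=mae,
    loss_functions_test={'mae': mae, 'rmse': rmse},
    S_df=S_df, Y_df=Y_df, X_df=X_df, f_cols=[],
    ds_in_val=7*24, ds_in_test=7*24,
    return_forecasts=False, return_model=False,
    save_trials=False, trials=None,  # assumption: None when not resuming
    results_dir=None, loss_kwargs={}, verbose=False
)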
{% endraw %} {% raw %}
{% endraw %} {% raw %}

hyperopt_tunning[source]

hyperopt_tunning(space:dict, hyperopt_max_evals:int, loss_function_val:callable, loss_functions_test:dict, S_df:DataFrame, Y_df:DataFrame, X_df:DataFrame, f_cols:list, ds_in_val:int, ds_in_test:int, return_forecasts:bool, return_model:bool, save_trials:bool, results_dir:str, step_save_progress:int=5, loss_kwargs:list=None, verbose:bool=False)

Evaluates multiple models trained on the given dataset, each with hyperparameters sampled from the search space. Configurations are explored to minimize the validation loss function over the space, up to hyperopt_max_evals evaluations.

Parameters

- space: dict. Dictionary of hyperparameters that defines the search space.
- hyperopt_max_evals: int. Maximum number of evaluations.
- loss_function_val: callable. Loss function used for validation.
- loss_functions_test: dict. Loss functions used for test (maps function name to function).
- S_df: pd.DataFrame. Static exogenous variables with columns ['unique_id', 'ds'] and the static variables.
- Y_df: pd.DataFrame. Target time series with columns ['unique_id', 'ds', 'y'].
- X_df: pd.DataFrame. Exogenous time series with columns ['unique_id', 'ds', 'y'].
- f_cols: list. List of exogenous variables available in the future.
- ds_in_val: int. Number of ds in validation.
- ds_in_test: int. Number of ds in test.
- return_forecasts: bool. If True, return forecasts on test.
- return_model: bool. If True, return the trained models.
- save_trials: bool. If True, save progress to a file.
- results_dir: str. File path to save results.
- step_save_progress: int. Save progress every n-th step.
- loss_kwargs: list. Loss function arguments.
- verbose: bool. If True, print summaries of the dataset, model and training.

Returns

trials: Trials Results from model evaluation.

{% endraw %} {% raw %}
{% endraw %}

Experiment Utils Examples

This part of the notebook shows simple usage of the functions defined above. The EPF dataset is used in these experiments. Two experiments are shown in which hyperparameter tuning is run for two models, NHITS and NBEATS.

{% raw %}
import pandas as pd
import torch as t

from hyperopt import hp
from neuralforecast.losses.numpy import mae, rmse
from neuralforecast.auto import nhits_space, nbeats_space
from neuralforecast.data.datasets.epf import EPF
{% endraw %} {% raw %}
dataset = ['NP']
Y_df, X_df, S_df = EPF.load_groups(directory='data', groups=dataset)
X_df = X_df[['unique_id', 'ds', 'week_day']]
{% endraw %}

Returning the model

{% raw %}
space = nhits_space(n_time_out=24)
space['max_steps'] = hp.choice('max_steps', [1]) # Override max_steps for faster example
# The suggested spaces are partial, here we complete them with data specific information
space['n_series']   = hp.choice('n_series', [ Y_df['unique_id'].nunique() ])
space['n_x']        = hp.choice('n_x', [ 0 if X_df is None else (X_df.shape[1]-2) ])
space['n_s']        = hp.choice('n_s', [ 0 if S_df is None else (S_df.shape[1]-1) ])
space['n_x_hidden'] = hp.choice('n_x_hidden', [ 0 if X_df is None else (X_df.shape[1]-2) ])
space['n_s_hidden'] = hp.choice('n_s_hidden', [ 0 if S_df is None else (S_df.shape[1]-1) ])
# Infers freq with first time series
freq = pd.infer_freq(Y_df[Y_df['unique_id']==Y_df.unique_id.unique()[0]]['ds']) 
space['frequency']  = hp.choice('frequency', [ freq ])

model, trials = hyperopt_tunning(space=space, hyperopt_max_evals=2, loss_function_val=mae,
                                 loss_functions_test={'mae': mae, 'rmse': rmse},
                                 S_df=S_df, Y_df=Y_df, X_df=X_df, f_cols=[],
                                 ds_in_val=7*24, ds_in_test=7*24, 
                                 return_forecasts=True, return_model=True, save_trials=True, 
                                 results_dir='./results/example', loss_kwargs={}, verbose=False)
INFO:hyperopt.tpe:build_posterior_wrapper took 0.013117 seconds
INFO:hyperopt.tpe:TPE using 0 trials
/Users/cchallu/opt/anaconda3/envs/neuralforecast/lib/python3.7/site-packages/pytorch_lightning/trainer/data_loading.py:133: UserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 12 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  f"The dataloader, {name}, does not have many workers which may be a bottleneck."
/Users/cchallu/opt/anaconda3/envs/neuralforecast/lib/python3.7/site-packages/torch/nn/functional.py:3635: UserWarning: Default upsampling behavior when mode=linear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode)
/Users/cchallu/opt/anaconda3/envs/neuralforecast/lib/python3.7/site-packages/pytorch_lightning/trainer/data_loading.py:133: UserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 12 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  f"The dataloader, {name}, does not have many workers which may be a bottleneck."
/Users/cchallu/opt/anaconda3/envs/neuralforecast/lib/python3.7/site-packages/pytorch_lightning/trainer/data_loading.py:133: UserWarning: The dataloader, predict_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 12 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  f"The dataloader, {name}, does not have many workers which may be a bottleneck."
INFO:hyperopt.tpe:build_posterior_wrapper took 0.010917 seconds
INFO:hyperopt.tpe:TPE using 1/1 trials with best loss 8.598013
{% endraw %} {% raw %}
model
NHITS(
  (model): _NHITS(
    (blocks): ModuleList(
      (0): _NHITSBlock(
        (pooling_layer): MaxPool1d(kernel_size=1, stride=1, padding=0, dilation=1, ceil_mode=True)
        (static_encoder): _StaticFeaturesEncoder(
          (encoder): Sequential(
            (0): Dropout(p=0.5, inplace=False)
            (1): Linear(in_features=1, out_features=1, bias=True)
            (2): ReLU()
          )
        )
        (layers): Sequential(
          (0): Linear(in_features=169, out_features=256, bias=True)
          (1): ReLU()
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): ReLU()
          (4): Linear(in_features=256, out_features=256, bias=True)
          (5): ReLU()
          (6): Linear(in_features=256, out_features=73, bias=True)
        )
        (basis): _IdentityBasis()
      )
      (1): _NHITSBlock(
        (pooling_layer): MaxPool1d(kernel_size=1, stride=1, padding=0, dilation=1, ceil_mode=True)
        (static_encoder): _StaticFeaturesEncoder(
          (encoder): Sequential(
            (0): Dropout(p=0.5, inplace=False)
            (1): Linear(in_features=1, out_features=1, bias=True)
            (2): ReLU()
          )
        )
        (layers): Sequential(
          (0): Linear(in_features=169, out_features=256, bias=True)
          (1): ReLU()
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): ReLU()
          (4): Linear(in_features=256, out_features=256, bias=True)
          (5): ReLU()
          (6): Linear(in_features=256, out_features=73, bias=True)
        )
        (basis): _IdentityBasis()
      )
      (2): _NHITSBlock(
        (pooling_layer): MaxPool1d(kernel_size=1, stride=1, padding=0, dilation=1, ceil_mode=True)
        (static_encoder): _StaticFeaturesEncoder(
          (encoder): Sequential(
            (0): Dropout(p=0.5, inplace=False)
            (1): Linear(in_features=1, out_features=1, bias=True)
            (2): ReLU()
          )
        )
        (layers): Sequential(
          (0): Linear(in_features=169, out_features=256, bias=True)
          (1): ReLU()
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): ReLU()
          (4): Linear(in_features=256, out_features=256, bias=True)
          (5): ReLU()
          (6): Linear(in_features=256, out_features=96, bias=True)
        )
        (basis): _IdentityBasis()
      )
    )
  )
)
{% endraw %}

Without returning the model

{% raw %}
space = nbeats_space(n_time_out=24)
space['max_steps'] = hp.choice('max_steps', [1]) # Override max_steps for faster example
# The suggested spaces are partial, here we complete them with data specific information
space['n_series']   = hp.choice('n_series', [ Y_df['unique_id'].nunique() ])
space['n_x']        = hp.choice('n_x', [ 0 if X_df is None else (X_df.shape[1]-2) ])
space['n_s']        = hp.choice('n_s', [ 0 if S_df is None else (S_df.shape[1]-1) ])
space['n_x_hidden'] = hp.choice('n_x_hidden', [ 0 if X_df is None else (X_df.shape[1]-2) ])
space['n_s_hidden'] = hp.choice('n_s_hidden', [ 0 if S_df is None else (S_df.shape[1]-1) ])
# Infers freq with first time series
freq = pd.infer_freq(Y_df[Y_df['unique_id']==Y_df.unique_id.unique()[0]]['ds']) 
space['frequency']  = hp.choice('frequency', [ freq ])
trials = hyperopt_tunning(space=space, hyperopt_max_evals=2, loss_function_val=mae,
                          loss_functions_test={'mae': mae, 'rmse': rmse},
                          S_df=S_df, Y_df=Y_df, X_df=X_df, f_cols=[],
                          ds_in_val=7*24, ds_in_test=7*24, 
                          return_forecasts=True, return_model=False, save_trials=False, 
                          results_dir=None, loss_kwargs={}, verbose=False)
INFO:hyperopt.tpe:build_posterior_wrapper took 0.010783 seconds
INFO:hyperopt.tpe:TPE using 0 trials
INFO:hyperopt.tpe:build_posterior_wrapper took 0.016312 seconds
INFO:hyperopt.tpe:TPE using 1/1 trials with best loss 8.788279
{% endraw %}