---
title: Title
keywords: fastai
sidebar: home_sidebar
nb_path: "nbs/experiments__auto.ipynb"
---
The forecasting task is to predict the number of patients with influenza-like illness (ILI) using the US CDC dataset. The dataset contains 7 target variables and 966 weeks of history.
We will create point forecasts with the N-BEATS, N-HiTS and RNN models, using only autoregressive features (past values of each series) as predictors. More information on the dataset can be found in the N-HiTS paper.
#!pip install neuralforecast
#!pip install matplotlib
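import matplotlib.pyplot as plt
import pandas as pd  # used below for pd.to_datetime
import neuralforecast as nf  # used below for nf.losses.numpy
from neuralforecast.data.datasets.long_horizon import LongHorizon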
Y_df, _, _ = LongHorizon.load(directory='./', group='ILI')
Y_df.head()
n_series = len(Y_df.unique_id.unique())
n_time = len(Y_df.ds.unique()) # dataset is balanced
ts_in_test = 193
ts_in_val = 97
print('n_time', n_time)
print('n_series', n_series)
print('ts_in_test', ts_in_test)
print('ts_in_val', ts_in_val)
# Other series in the dataset include 'AGE 5-24', 'ILITOTAL', 'NUM. OF PROVIDERS', 'OT'
y_plot = Y_df[Y_df.unique_id=='% WEIGHTED ILI'].y.values
x_plot = pd.to_datetime(Y_df[Y_df.unique_id=='% WEIGHTED ILI'].ds).values
plt.plot(x_plot, y_plot)
plt.axvline(x_plot[n_time-ts_in_val-ts_in_test], color='black', linestyle='-.')
plt.axvline(x_plot[n_time-ts_in_test], color='black', linestyle='-.')
plt.ylabel('Weighted ILI [ratio]')
plt.xlabel('Date')
plt.grid()
plt.show()
plt.close()
A temporal train-validation-test split lets us estimate the model's generalization performance on future data that the model has not seen. The two vertical lines in the plot above mark the start of the validation set and the start of the test set. We use the train set to optimize the model parameters, and the validation and test sets to evaluate the accuracy of the model's predictions.
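The split boundaries reduce to simple index arithmetic. The sketch below uses the sizes from this notebook (966 weeks, 97 for validation, 193 for test); the helper name `temporal_split_bounds` is hypothetical and is not part of neuralforecast.

```python
def temporal_split_bounds(n_time, ts_in_val, ts_in_test):
    """Return (start, end) index ranges for the train, validation and test sets.

    Hypothetical helper for illustration: the last `ts_in_test` steps form the
    test set, the `ts_in_val` steps before them form the validation set, and
    everything earlier is the train set.
    """
    if ts_in_val + ts_in_test >= n_time:
        raise ValueError("validation and test sets leave no training data")
    val_start = n_time - ts_in_val - ts_in_test
    test_start = n_time - ts_in_test
    return {
        "train": (0, val_start),
        "val": (val_start, test_start),
        "test": (test_start, n_time),
    }


bounds = temporal_split_bounds(n_time=966, ts_in_val=97, ts_in_test=193)
sizes = {name: end - start for name, (start, end) in bounds.items()}
print(bounds)
print(sizes)
```

Because the sets are taken in time order, no observation from the validation or test period is available while fitting on the train set.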
In this case we set the space to None, which implicitly uses the predefined model space. The space can also be specified as a dictionary following the conventions of the Hyperopt package.
config_dict = {
    'nbeats': {
        'space': None,  # Use default
        'hyperopt_steps': 5,
        'timeout': 60*1
    },
    'nhits': {
        'space': None,  # Use default
        'hyperopt_steps': 5,
        'timeout': 60*1
    },
    'rnn': {
        'space': None,  # Use default
        'hyperopt_steps': 5,
        'timeout': 60*1
    }
}
The 966 weeks are split in time order into 676 weeks for training, 97 for validation and 193 for testing (676, 97, 193). Each forecast covers a horizon of 24 weeks.
forecast_horizon = 24  # number of weeks predicted by each forecast
# Note: `auto` must already be imported; the source does not show its import.
best_model, results = auto(config_dict=config_dict,
                           Y_df=Y_df, X_df=None, S_df=None,
                           loss_function_val=nf.losses.numpy.mae,
                           loss_functions_test={'mae': nf.losses.numpy.mae,
                                                'mse': nf.losses.numpy.mse},
                           forecast_horizon=forecast_horizon,
                           ts_in_val=ts_in_val, ts_in_test=ts_in_test,
                           return_forecasts=True, return_model=True,
                           test_auto=True,
                           verbose=False)
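# Validation loss of N-BEATS over the course of the hyperparameter search
times = results['nbeats']['optimization_times']
losses = results['nbeats']['optimization_losses']
plt.plot(times, losses)
plt.xlabel('Time [s]')
plt.ylabel('Validation loss (MAE)')
plt.show()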
Here we reshape the NumPy predictions so that they can be evaluated and plotted.
y_hat_nhits = results['nhits']['y_hat']
y_hat_nbeats = results['nbeats']['y_hat']
y_hat_rnn = results['rnn']['y_hat']
y_true = results['nbeats']['y_true']
print('\n Original Shapes')
print('1. y_hat_nhits.shape', y_hat_nhits.shape)
print('1. y_hat_nbeats.shape', y_hat_nbeats.shape)
print('1. y_hat_rnn.shape', y_hat_rnn.shape)
print('1. y_true.shape', y_true.shape)
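# Number of forecast windows in the test set: 193 - 24 + 1 = 170
n_windows = ts_in_test - forecast_horizon + 1
y_hat_nbeats = results['nbeats']['y_hat'].reshape((n_series, n_windows, forecast_horizon))
y_hat_nhits = results['nhits']['y_hat'].reshape((n_series, n_windows, forecast_horizon))
y_true = results['nbeats']['y_true'].reshape((n_series, n_windows, forecast_horizon))
# y_hat_rnn is not reshaped here, so it keeps its original shape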
print('\n Wrangled Shapes')
print('2. y_hat_nhits.shape', y_hat_nhits.shape)
print('2. y_hat_nbeats.shape', y_hat_nbeats.shape)
print('2. y_hat_rnn.shape', y_hat_rnn.shape)
print('2. y_true.shape', y_true.shape)
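The value 170 used in the reshape equals 193 - 24 + 1, the number of 24-week windows that fit in the 193-week test set when the window start advances one week at a time. That the forecasts were produced this way is an assumption consistent with the numbers, not something stated in this notebook. A minimal sketch, with the hypothetical helper `rolling_windows`:

```python
def rolling_windows(series, horizon):
    """Return every contiguous window of length `horizon`, stepping by one."""
    n_windows = len(series) - horizon + 1
    return [series[start:start + horizon] for start in range(n_windows)]


test_steps = list(range(193))  # stand-in for the test set of one series
windows = rolling_windows(test_steps, horizon=24)
print(len(windows), len(windows[0]))
```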
w_idx = 0  # index of the forecast window to plot
u_idx = 0  # index of the series to plot
plt.plot(y_true[u_idx,w_idx,:], label='True Signal')
plt.plot(y_hat_nbeats[u_idx,w_idx,:], label='N-BEATS')
plt.plot(y_hat_nhits[u_idx,w_idx,:], label='N-HiTS')
#plt.plot(y_true[:,0,2], label='True')
#plt.plot(best_nbeats[::24,:].flatten(), label='N-BEATS')
#plt.plot(best_rnn[::24,:].flatten(), label='RNN')
plt.legend()
plt.show()
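Arrays in the (n_series, n_windows, horizon) layout can also be scored directly. The sketch below computes the mean absolute error per series on synthetic data that stands in for the real forecasts; the helper `mae_per_series` is hypothetical and is not the library's loss function.

```python
import numpy as np


def mae_per_series(y_true, y_hat):
    """Mean absolute error per series, averaged over windows and horizon."""
    return np.mean(np.abs(y_true - y_hat), axis=(1, 2))


# Synthetic stand-ins with shape (n_series, n_windows, horizon)
rng = np.random.default_rng(0)
y_true_demo = rng.normal(size=(7, 170, 24))
y_hat_demo = y_true_demo + 0.5  # forecasts that are off by a constant 0.5

scores = mae_per_series(y_true_demo, y_hat_demo)
print(scores.shape)
print(np.allclose(scores, 0.5))
```

Averaging over the last two axes keeps one score per series, which makes it easy to see whether a model does worse on a particular target variable.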
print('Y_df.unique_id.unique()', Y_df.unique_id.unique())
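# First test window of the true series, to check against the plot above
ver = Y_df[Y_df.unique_id=='% WEIGHTED ILI']
plt.plot(ver.y.values[n_time-ts_in_test:n_time-ts_in_test+forecast_horizon])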
plt.ylabel('% WEIGHTED ILI')
plt.show()
best_model
# X_forecast_df = X_df[X_df['ds']<'2016-12-28']
# forecast_df = best_model.forecast(Y_df=Y_forecast_df, X_df=X_forecast_df, S_df=None, batch_size=2)
# forecast_df