Fastest and most accurate auto_arima in Python and R (for the moment...).
Models included: exponential smoothing, croston, seasonal naive, random walk with drift and tbs.
Faster than pmdarima and Prophet.
Built on numba.
Current Python alternatives for statistical models are slow and inaccurate, so we created a library that can be used to forecast in production environments or as a benchmark. StatsForecast includes an extensive battery of models that can efficiently fit thousands of time series.
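To make the idea of a benchmark model concrete, here is a minimal NumPy sketch of two of the simplest models in that family: seasonal naive and random walk with drift. The function names `seasonal_naive_sketch` and `random_walk_with_drift_sketch` are hypothetical and this is not the library's implementation, only an illustration of the textbook definitions.

```python
import numpy as np

def seasonal_naive_sketch(y, h, season_length):
    """Forecast h steps by repeating the last observed season."""
    y = np.asarray(y, dtype=float)
    last_season = y[-season_length:]
    # Tile the last season enough times to cover the horizon, then trim.
    reps = int(np.ceil(h / season_length))
    return np.tile(last_season, reps)[:h]

def random_walk_with_drift_sketch(y, h):
    """Forecast h steps by extending the last value along the average slope."""
    y = np.asarray(y, dtype=float)
    # Average one-step change over the whole history.
    drift = (y[-1] - y[0]) / (y.size - 1)
    return y[-1] + drift * np.arange(1, h + 1)

sn = seasonal_naive_sketch([1, 2, 3, 4, 5, 6], h=5, season_length=3)
rwd = random_walk_with_drift_sketch([1, 3, 5, 7], h=2)
```

Both functions are cheap to evaluate, which is why models of this kind are commonly used as baselines for more elaborate ones.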
We compared accuracy and speed against pmdarima, Rob Hyndman's forecast package and Facebook's Prophet. We used the Daily, Hourly and Weekly data from the M4 competition.
The following table summarizes the results. Our auto_arima has the lowest MASE loss on all three datasets and the lowest time on Daily and Hourly, even compared with the original implementation in R; on Weekly, the R implementation is faster (0.22 vs 0.42).
dataset | metric | nixtla | pmdarima [1] | auto_arima_r | prophet |
---|---|---|---|---|---|
M4-Daily | MASE | 3.26 | 3.35 | 4.46 | 14.26 |
M4-Daily | time | 1.41 | 27.61 | 1.81 | 514.33 |
M4-Hourly | MASE | 0.92 | --- | 1.02 | 1.78 |
M4-Hourly | time | 12.92 | --- | 23.95 | 17.27 |
M4-Weekly | MASE | 2.34 | 2.47 | 2.58 | 7.29 |
M4-Weekly | time | 0.42 | 2.92 | 0.22 | 19.82 |
[1] The model auto_arima from pmdarima had problems with Hourly data. An issue was opened in their repo.
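For readers unfamiliar with the metric, the sketch below shows one common definition of MASE: the mean absolute error of the forecast divided by the in-sample mean absolute error of a (seasonal) naive forecast. The function name `mase_sketch` is hypothetical, and the season length used for scaling in the benchmark is not stated here, so `season_length` is left as a parameter.

```python
import numpy as np

def mase_sketch(y_train, y_test, y_hat, season_length=1):
    """Mean absolute scaled error under the definition described above."""
    y_train = np.asarray(y_train, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    # Out-of-sample error of the forecast being evaluated.
    forecast_mae = np.mean(np.abs(y_test - y_hat))
    # In-sample error of the seasonal naive forecast, used as the scale.
    naive_mae = np.mean(np.abs(y_train[season_length:] - y_train[:-season_length]))
    return forecast_mae / naive_mae

score = mase_sketch(y_train=[1, 2, 3, 4, 5], y_test=[6, 7], y_hat=[5, 9])
```

A value below 1 means the forecast beats the in-sample naive benchmark on average; a value above 1 means it does worse.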
The following table summarizes the data details.
group | n_series | mean_length | std_length | min_length | max_length |
---|---|---|---|---|---|
Daily | 4,227 | 2,371 | 1,756 | 107 | 9,933 |
Hourly | 414 | 901 | 127 | 748 | 1,008 |
Weekly | 359 | 1,035 | 707 | 93 | 2,610 |
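Summary statistics like these can be computed from a long-format frame with one row per observation. The sketch below assumes a `unique_id` column identifying each series; the helper name `summarize_lengths` and the toy data are hypothetical, and whether the table above uses the sample or population standard deviation is not stated (pandas defaults to the sample version).

```python
import pandas as pd

def summarize_lengths(df, id_col="unique_id"):
    """Count series and describe how many observations each one has."""
    lengths = df.groupby(id_col).size()
    return pd.Series(
        {
            "n_series": lengths.size,
            "mean_length": lengths.mean(),
            "std_length": lengths.std(),  # sample standard deviation
            "min_length": lengths.min(),
            "max_length": lengths.max(),
        }
    )

# Toy frame: series "a" has 3 observations, series "b" has 5.
toy = pd.DataFrame({"unique_id": list("aaabbbbb"), "y": range(8)})
stats = summarize_lengths(toy)
```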
We measured the computational time against the number of time series. The following graph shows the results. As we can see, the fastest model is our auto_arima.
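A scaling measurement of this kind can be sketched with a small timing harness. Everything here is an assumption made for illustration: `time_forecaster` and `naive_forecast` are hypothetical names, the series are synthetic random walks, and the stand-in forecaster is far simpler than any of the benchmarked models.

```python
import time
import numpy as np

def time_forecaster(forecast_fn, series_list, h):
    """Return the wall-clock seconds needed to forecast every series."""
    start = time.perf_counter()
    for y in series_list:
        forecast_fn(y, h)
    return time.perf_counter() - start

def naive_forecast(y, h):
    """Stand-in forecaster: repeat the last observed value h times."""
    return np.repeat(y[-1], h)

rng = np.random.default_rng(0)
timings = {}
for n_series in (10, 100, 1000):
    # Synthetic random-walk series of length 200.
    batch = [rng.normal(size=200).cumsum() for _ in range(n_series)]
    timings[n_series] = time_forecaster(naive_forecast, batch, h=12)
```

Swapping `naive_forecast` for a function that wraps a real model gives one timing per batch size, which is the kind of data a time-versus-series plot needs.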
You can reproduce the results here.
import numpy as np
import pandas as pd
from IPython.display import display, Markdown
import matplotlib.pyplot as plt
from statsforecast import StatsForecast
from statsforecast.models import seasonal_naive, auto_arima
from statsforecast.utils import AirPassengers
horizon = 12
ap_train = AirPassengers[:-horizon]
ap_test = AirPassengers[-horizon:]
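# Long-format training frame: one row per observation, indexed by unique_id
series_train = pd.DataFrame(
    {
        'ds': np.arange(1, ap_train.size + 1),
        'y': ap_train
    },
    index=pd.Index([0] * ap_train.size, name='unique_id')
)
# Render a DataFrame as a Markdown table in the notebook
def display_df(df):
    display(Markdown(df.to_markdown()))
# Each model is paired with 12, the season length for monthly data
fcst = StatsForecast(
    series_train,
    models=[(auto_arima, 12), (seasonal_naive, 12)],
    freq='M',
    n_jobs=1
)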
forecasts = fcst.forecast(12)
display_df(forecasts)
forecasts['y_test'] = ap_test
fig, ax = plt.subplots(1, 1, figsize = (20, 7))
pd.concat([series_train, forecasts]).set_index('ds').plot(ax=ax, linewidth=2)
ax.set_title('AirPassengers Forecast', fontsize=22)
ax.set_ylabel('Monthly Passengers', fontsize=20)
ax.set_xlabel('Timestamp [t]', fontsize=20)
ax.legend(prop={'size': 15})
ax.grid()
for label in (ax.get_xticklabels() + ax.get_yticklabels()):
    label.set_fontsize(20)
series_xreg = pd.DataFrame(
    {
        'ds': pd.date_range(start='1949-01-01', periods=ap_train.size, freq='M'),
        'y': ap_train
    },
    index=pd.Index([0] * ap_train.size, name='unique_id')
)
series_xreg['trend'] = np.arange(1, ap_train.size + 1)
series_xreg['intercept'] = np.ones(ap_train.size)
series_xreg['month'] = series_xreg['ds'].dt.month
series_xreg = pd.get_dummies(series_xreg, columns=['month'], drop_first=True)
display_df(series_xreg.head())
xreg_test = pd.DataFrame(
    {
        'ds': pd.date_range(start='1960-01-01', periods=ap_test.size, freq='M')
    },
    index=pd.Index([0] * ap_test.size, name='unique_id')
)
xreg_test['trend'] = np.arange(133, ap_test.size + 133)
xreg_test['intercept'] = np.ones(ap_test.size)
xreg_test['month'] = xreg_test['ds'].dt.month
xreg_test = pd.get_dummies(xreg_test, columns=['month'], drop_first=True)
fcst = StatsForecast(
    series_xreg,
    models=[(auto_arima, 12), (seasonal_naive, 12)],
    freq='M',
    n_jobs=1
)
forecasts = fcst.forecast(12, xreg=xreg_test)
display_df(forecasts)
forecasts['y_test'] = ap_test
fig, ax = plt.subplots(1, 1, figsize = (20, 7))
pd.concat([series_xreg[['ds', 'y']], forecasts]).set_index('ds').plot(ax=ax, linewidth=2)
ax.set_title('AirPassengers Forecast using External Regressors', fontsize=22)
ax.set_ylabel('Monthly Passengers', fontsize=20)
ax.set_xlabel('Timestamp [t]', fontsize=20)
ax.legend(prop={'size': 15})
ax.grid()
for label in (ax.get_xticklabels() + ax.get_yticklabels()):
    label.set_fontsize(20)
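Beyond the plot, a frame that holds one column per model plus a `y_test` column can be scored numerically. The sketch below computes the mean absolute error of each model column. The helper `mae_per_model`, the column names `model_a` and `model_b`, and the toy values are all hypothetical, since the actual forecast column names depend on the library version.

```python
import numpy as np
import pandas as pd

def mae_per_model(forecasts, target_col="y_test", skip_cols=("ds",)):
    """Mean absolute error of every model column against the target column."""
    model_cols = [
        c for c in forecasts.columns if c != target_col and c not in skip_cols
    ]
    # Work on raw arrays so a repeated unique_id index cannot cause misalignment.
    preds = forecasts[model_cols].to_numpy(dtype=float)
    actual = forecasts[[target_col]].to_numpy(dtype=float)
    errors = np.abs(preds - actual)
    return pd.Series(errors.mean(axis=0), index=model_cols)

# Toy frame shaped like the forecasts above: repeated unique_id index.
toy = pd.DataFrame(
    {
        "ds": [1, 2, 3],
        "model_a": [10.0, 12.0, 14.0],
        "model_b": [9.0, 12.0, 17.0],
        "y_test": [11.0, 12.0, 13.0],
    },
    index=pd.Index([0, 0, 0], name="unique_id"),
)
scores = mae_per_model(toy)
```

Applied to the `forecasts` frame from either example, this would return one error value per model, making the comparison between auto_arima and seasonal_naive explicit.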
See CONTRIBUTING.md.
The auto_arima model is a translation of the R implementation included in the forecast package developed by Rob Hyndman.