---
title: PyTorch Losses
keywords: fastai
sidebar: home_sidebar
summary: "The most important training signal is the forecast error, the difference between the observed value $y_{\\tau}$ and the prediction $\\hat{y}_{\\tau}$ at time $\\tau$: $$e_{\\tau} = y_{\\tau}-\\hat{y}_{\\tau} \\qquad \\qquad \\tau \\in \\{t+1,\\dots,t+H \\}.$$ The training loss summarizes the forecast errors under different training objectives:

1. Scale-dependent errors - These metrics are on the same scale as the data.
2. Percentage errors - These metrics are unit-free, suitable for comparisons across series.
3. Scale-independent errors - These metrics measure the relative improvement over a baseline; the available metric is MASE.
4. Probabilistic errors - These weight absolute deviations asymmetrically, penalizing under- and over-estimation differently.
5. Other errors - Additionally, two loss functions related to the M4 competition winner, ESRNN."
nb_path: "nbs/losses__pytorch.ipynb"
---

1. Scale-dependent Errors

Mean Absolute Error

{% raw %}

MAELoss

MAELoss(y:Tensor, y_hat:Tensor, mask:Tensor=None)

Calculates Mean Absolute Error (MAE) between y and y_hat. MAE measures the prediction accuracy of a forecasting method by calculating the absolute deviation between the prediction and the true value at a given time and averaging these deviations over the length of the series.

$$ \mathrm{MAE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}) = \frac{1}{H} \sum^{t+H}_{\tau=t+1} |y_{\tau} - \hat{y}_{\tau}| $$
Parameters
----------
y: tensor (batch_size, output_size).
    Actual values in torch tensor.
y_hat: tensor (batch_size, output_size).
    Predicted values in torch tensor.
mask: tensor (batch_size, output_size).
    Specifies date stamps per series
    to consider in loss.

Returns
-------
mae: tensor (single value).
    Mean absolute error.
{% endraw %}
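The MAE formula and the role of the mask can be sketched in plain Python. This is a minimal illustration of the formula, not the library's tensor implementation: the helper name `mae` is hypothetical, and normalizing by the number of unmasked entries is an assumption (the library may average differently).

```python
def mae(y, y_hat, mask=None):
    """Mean absolute error over the entries whose mask value is 1."""
    if mask is None:
        mask = [1.0] * len(y)  # no mask: every time step counts
    weighted = sum(m * abs(a - b) for a, b, m in zip(y, y_hat, mask))
    # Assumption: normalize by the number of unmasked entries.
    return weighted / sum(mask)


y = [10.0, 12.0, 14.0, 16.0]
y_hat = [11.0, 12.0, 11.0, 20.0]

mae_all = mae(y, y_hat)                           # (1 + 0 + 3 + 4) / 4
mae_masked = mae(y, y_hat, [1.0, 1.0, 1.0, 0.0])  # (1 + 0 + 3) / 3
print(mae_all, mae_masked)
```

Masking out the last time step removes its error of 4 from the average, so the masked value is lower than the unmasked one.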

Mean Squared Error

{% raw %}

MSELoss

MSELoss(y:Tensor, y_hat:Tensor, mask:Tensor=None)

Calculates Mean Squared Error (MSE) between y and y_hat. MSE measures the prediction accuracy of a forecasting method by calculating the squared deviation between the prediction and the true value at a given time and averaging these deviations over the length of the series.

$$ \mathrm{MSE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}) = \frac{1}{H} \sum^{t+H}_{\tau=t+1} (y_{\tau} - \hat{y}_{\tau})^{2} $$
Parameters
----------
y: tensor (batch_size, output_size).
    Actual values in torch tensor.
y_hat: tensor (batch_size, output_size).
    Predicted values in torch tensor.
mask: tensor (batch_size, output_size).
    Specifies date stamps per series
    to consider in loss.

Returns
-------
mse: tensor (single value).
    Mean Squared Error.
{% endraw %}
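Because the deviations are squared, MSE weights a single large error more heavily than several small ones. The plain-Python sketch below (the helper `mse` is a hypothetical name, and the mask is omitted for brevity) compares two forecasts with the same mean absolute error.

```python
def mse(y, y_hat):
    """Mean squared error of two equally long sequences."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


y = [0.0, 0.0, 0.0, 0.0]
even_errors = [2.0, 2.0, 2.0, 2.0]  # four errors of size 2, MAE = 2
one_outlier = [0.0, 0.0, 0.0, 8.0]  # one error of size 8,  MAE = 2

mse_even = mse(y, even_errors)      # (4 + 4 + 4 + 4) / 4
mse_outlier = mse(y, one_outlier)   # 64 / 4
print(mse_even, mse_outlier)
```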

Root Mean Squared Error

{% raw %}

RMSELoss

RMSELoss(y:Tensor, y_hat:Tensor, mask:Tensor=None)

Calculates Root Mean Squared Error (RMSE) between y and y_hat. RMSE measures the prediction accuracy of a forecasting method by calculating the squared deviation between the prediction and the observed value at a given time, averaging these deviations over the length of the series, and taking the square root of that average. The RMSE is therefore on the same scale as the original time series, so comparing it across series is only meaningful if they share a common scale. RMSE has a direct connection to the L2 norm: it equals the L2 norm of the error vector divided by $\sqrt{H}$.

$$ \mathrm{RMSE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}) = \sqrt{\frac{1}{H} \sum^{t+H}_{\tau=t+1} (y_{\tau} - \hat{y}_{\tau})^{2}} $$
Parameters
----------
y: tensor (batch_size, output_size).
    Actual values in torch tensor.
y_hat: tensor (batch_size, output_size).
    Predicted values in torch tensor.
mask: tensor (batch_size, output_size).
    Specifies date stamps per series
    to consider in loss.

Returns
-------
rmse: tensor (single value).
    Root Mean Squared Error.
{% endraw %}
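Two properties of RMSE can be checked numerically: it is the L2 norm of the errors divided by $\sqrt{H}$, and it scales with the units of the data. The sketch below is plain Python with a hypothetical helper `rmse` and no mask.

```python
import math


def rmse(y, y_hat):
    """Root mean squared error of two equally long sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


y = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.0, 5.0, 9.0, 11.0]   # errors: 1, 0, -2, -2

value = rmse(y, y_hat)          # sqrt((1 + 0 + 4 + 4) / 4)

# L2 norm of the error vector, divided by sqrt(H).
l2_norm = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)))
from_l2 = l2_norm / math.sqrt(len(y))

# The same series expressed in units 1000 times smaller.
value_scaled = rmse([1000 * a for a in y], [1000 * b for b in y_hat])
print(value, from_l2, value_scaled)
```

The scaled series yields an RMSE 1000 times larger, which is why RMSE values are only comparable across series that share a scale.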

2. Percentage Errors

Mean Absolute Percentage Error

{% raw %}

MAPELoss

MAPELoss(y:Tensor, y_hat:Tensor, mask:Tensor=None)

Calculates Mean Absolute Percentage Error (MAPE) between y and y_hat. MAPE measures the relative prediction accuracy of a forecasting method by calculating the percentage deviation between the prediction and the observed value at a given time and averaging these deviations over the length of the series. The closer an observed value is to zero, the higher the penalty MAPE assigns to the corresponding error.

$$ \mathrm{MAPE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}) = \frac{1}{H} \sum^{t+H}_{\tau=t+1} \frac{|y_{\tau}-\hat{y}_{\tau}|}{|y_{\tau}|} $$
Parameters
----------
y: tensor (batch_size, output_size).
    Actual values in torch tensor.
y_hat: tensor (batch_size, output_size).
    Predicted values in torch tensor.
mask: tensor (batch_size, output_size).
    Specifies date stamps per series
    to consider in loss.

Returns
-------
mape: tensor (single value).
    Mean absolute percentage error.
{% endraw %}
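The penalty for observations near zero is easy to see with two time steps that have the same absolute error. The sketch below is plain Python; the helper `mape_terms` is a hypothetical name and it assumes no observed value is exactly zero.

```python
def mape_terms(y, y_hat):
    """Per-step absolute percentage errors |y - y_hat| / |y|."""
    return [abs(a - b) / abs(a) for a, b in zip(y, y_hat)]


y = [100.0, 1.0]
y_hat = [101.0, 2.0]            # both forecasts are off by exactly 1

terms = mape_terms(y, y_hat)    # 1 / 100 and 1 / 1
mape = sum(terms) / len(terms)
print(terms, mape)
```

The same absolute error contributes 100 times more when the observed value is 1 than when it is 100.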

Symmetric Mean Absolute Percentage Error

{% raw %}

SMAPELoss

SMAPELoss(y:Tensor, y_hat:Tensor, mask:Tensor=None)

Calculates Symmetric Mean Absolute Percentage Error (SMAPE) between y and y_hat. SMAPE measures the relative prediction accuracy of a forecasting method by calculating the deviation between the prediction and the observed value, scaled by the sum of their absolute values at a given time, and averaging these deviations over the length of the series. Because the denominator is at least as large as the numerator, SMAPE is bounded: each term in the formula below lies between 0 and 1, which corresponds to the 0% to 200% range under the common convention that multiplies the ratio by 200. This is desirable compared to plain MAPE, which is undefined when the target is zero.

$$ \mathrm{sMAPE}_{2}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}) = \frac{1}{H} \sum^{t+H}_{\tau=t+1} \frac{|y_{\tau}-\hat{y}_{\tau}|}{|y_{\tau}|+|\hat{y}_{\tau}|} $$
Parameters
----------
y: tensor (batch_size, output_size).
    Actual values in torch tensor.
y_hat: tensor (batch_size, output_size).
    Predicted values in torch tensor.
mask: tensor (batch_size, output_size).
    Specifies date stamps per series
    to consider in loss.

Returns
-------
smape: tensor (single value).
    Symmetric mean absolute percentage error.
{% endraw %}
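The boundedness of SMAPE, including for a zero target, can be illustrated with the formula exactly as written above (no factor of 2). In this plain-Python sketch the helper `smape_terms` is a hypothetical name, and treating a 0/0 term as zero error is an assumption.

```python
def smape_terms(y, y_hat):
    """Per-step terms |y - y_hat| / (|y| + |y_hat|)."""
    terms = []
    for a, b in zip(y, y_hat):
        scale = abs(a) + abs(b)
        # Assumption: a 0/0 term (both values zero) counts as zero error.
        terms.append(abs(a - b) / scale if scale > 0 else 0.0)
    return terms


y = [0.0, 10.0, 10.0]           # the first target is zero
y_hat = [5.0, 10.0, 30.0]

terms = smape_terms(y, y_hat)   # 5 / 5, 0 / 20, 20 / 40
smape = sum(terms) / len(terms)
print(terms, smape)
```

The zero target produces the maximum term of 1 instead of a division by zero, which is where MAPE would fail.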

3. Scale-independent Errors

Mean Absolute Scaled Error

{% raw %}

MASELoss

MASELoss(y:Tensor, y_hat:Tensor, y_insample:Tensor, seasonality:int, mask:Tensor=None)

Calculates the Mean Absolute Scaled Error (MASE) between y and y_hat. MASE measures the relative prediction accuracy of a forecasting method by comparing the mean absolute error of its predictions against the mean absolute error of the seasonal naive model. MASE is one component of the Overall Weighted Average (OWA) used in the M4 Competition.

$$ \mathrm{MASE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}, \mathbf{\hat{y}}^{season}_{\tau}) = \frac{1}{H} \sum^{t+H}_{\tau=t+1} \frac{|y_{\tau}-\hat{y}_{\tau}|}{\mathrm{MAE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}^{season}_{\tau})} $$
Parameters
----------
y: tensor (batch_size, output_size).
    Actual values in torch tensor.
y_hat: tensor (batch_size, output_size).
    Predicted values in torch tensor.
y_insample: tensor (batch_size, input_size).
    Actual insample values, used for the Seasonal Naive predictions.
seasonality: int.
    Main frequency of the time series;
    Hourly 24, Daily 7, Weekly 52,
    Monthly 12, Quarterly 4, Yearly 1.
mask: tensor (batch_size, output_size).
    Specifies date stamps per series
    to consider in loss.

Returns
-------
mase: tensor (single value).
    Mean absolute scaled error.

References
----------
[1] https://robjhyndman.com/papers/mase.pdf
{% endraw %}
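The scaling step can be sketched in plain Python. The helper `mase` is a hypothetical name, and the sketch assumes the denominator is the in-sample mean absolute error of the seasonal naive forecast (each value predicted by the value one season earlier), following the definition in reference [1].

```python
def mase(y, y_hat, y_insample, seasonality):
    """Forecast MAE divided by the in-sample seasonal naive MAE."""
    # In-sample errors of the seasonal naive forecast y[t - seasonality].
    naive_errors = [
        abs(y_insample[i] - y_insample[i - seasonality])
        for i in range(seasonality, len(y_insample))
    ]
    scale = sum(naive_errors) / len(naive_errors)
    forecast_mae = sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)
    return forecast_mae / scale


y_insample = [10.0, 20.0, 12.0, 22.0, 14.0, 24.0]  # period-2 pattern
y = [16.0, 26.0]
y_hat = [15.0, 27.0]

score = mase(y, y_hat, y_insample, seasonality=2)   # 1.0 / 2.0
print(score)
```

A value below 1 means the forecast errors are smaller than the in-sample errors of the seasonal naive baseline.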

4. Probabilistic Errors

Quantile Loss

{% raw %}

QuantileLoss

QuantileLoss(y:Tensor, y_hat:Tensor, mask:Tensor=None, q:float=0.5)

Computes the quantile loss (QL), also known as the pinball loss, between y and y_hat. QL measures the deviation of a quantile forecast. By weighting the absolute deviation asymmetrically, the loss penalizes under-estimation and over-estimation differently. A common value for q is 0.5, which targets the median and weights both directions equally.

$$ \mathrm{QL}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}^{(q)}_{\tau}) = \frac{1}{H} \sum^{t+H}_{\tau=t+1} \Big( (1-q)\,( \hat{y}^{(q)}_{\tau} - y_{\tau} )_{+} + q\,( y_{\tau} - \hat{y}^{(q)}_{\tau} )_{+} \Big) $$
Parameters
----------
y: tensor (batch_size, output_size).
    Actual values in torch tensor.
y_hat: tensor (batch_size, output_size).
    Predicted values in torch tensor.
mask: tensor (batch_size, output_size).
    Specifies date stamps per series
    to consider in loss.
q: float, between 0 and 1.
    The slope of the quantile loss. In the context of
    quantile regression, q determines the conditional
    quantile level.

Returns
-------
quantile_loss: tensor (single value).
    Average quantile loss.
{% endraw %}
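The asymmetry of the quantile loss is clearest on a single observation. The plain-Python sketch below follows the QL formula above term by term; the helper name `quantile_loss` is hypothetical and the mask is omitted.

```python
def quantile_loss(y, y_hat, q=0.5):
    """Average pinball loss of y_hat as a forecast of the q-quantile."""
    total = 0.0
    for a, b in zip(y, y_hat):
        over = max(b - a, 0.0)    # forecast above the observation
        under = max(a - b, 0.0)   # forecast below the observation
        total += (1 - q) * over + q * under
    return total / len(y)


# q = 0.9: under-estimation costs nine times more than over-estimation.
under_cost = quantile_loss([100.0], [90.0], q=0.9)    # 0.9 * 10
over_cost = quantile_loss([100.0], [110.0], q=0.9)    # 0.1 * 10

# q = 0.5: both directions cost half the absolute error.
median_cost = quantile_loss([100.0], [90.0], q=0.5)   # 0.5 * 10
print(under_cost, over_cost, median_cost)
```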

Multi-Quantile Loss

{% raw %}

MQLoss

MQLoss(y:Tensor, y_hat:Tensor, quantiles:Tensor, mask:Tensor=None)

Calculates the Multi-Quantile loss (MQL) between y and y_hat. MQL is the average of the quantile losses over a given set of quantiles, each based on the absolute difference between the predicted quantile and the observed value.

$$ \mathrm{MQL}(\mathbf{y}_{\tau}, [\mathbf{\hat{y}}^{(q_{1})}_{\tau}, ... ,\hat{y}^{(q_{n})}_{\tau}]) = \frac{1}{n} \sum_{q_{i}} \mathrm{QL}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}^{(q_{i})}_{\tau}) $$

In the limit, MQL measures the accuracy of a full predictive distribution $\mathbf{\hat{F}}_{\tau}$ through the continuous ranked probability score (CRPS). This is achieved by numerical integration: the quantile levels are discretized and the CRPS integral is approximated with a left Riemann sum, averaging over uniformly spaced quantiles.

$$ \mathrm{CRPS}(y_{\tau}, \mathbf{\hat{F}}_{\tau}) = \int^{1}_{0} \mathrm{QL}(y_{\tau}, \hat{y}^{(q)}_{\tau}) dq $$
Parameters
----------
y: tensor (batch_size, output_size).
    Actual values in torch tensor.
y_hat: tensor (batch_size, output_size).
    Predicted values in torch tensor.
mask: tensor (batch_size, output_size).
    Specifies date stamps per series to consider in loss.
quantiles: tensor(n_quantiles).
    Quantiles to estimate from the distribution of y.

Returns
-------
mqloss: tensor(n_quantiles).
    Average multi-quantile loss.

References
----------
[1] https://www.jstor.org/stable/2629907
{% endraw %}
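The averaging over quantiles can be sketched in plain Python as follows. The helper names are hypothetical, the quantile forecasts are passed as one list per quantile level (an assumption about layout made only for this illustration), and the mask is omitted.

```python
def quantile_loss(y, y_hat, q):
    """Average pinball loss of y_hat as a forecast of the q-quantile."""
    total = sum((1 - q) * max(b - a, 0.0) + q * max(a - b, 0.0)
                for a, b in zip(y, y_hat))
    return total / len(y)


def mql(y, y_hats, quantiles):
    """Mean of the quantile losses, one forecast list per quantile."""
    losses = [quantile_loss(y, y_hat, q) for y_hat, q in zip(y_hats, quantiles)]
    return sum(losses) / len(losses)


quantiles = [0.1, 0.5, 0.9]
y = [10.0, 20.0]
y_hats = [
    [8.0, 15.0],    # forecasts of the 0.1 quantile, QL = 0.35
    [10.0, 21.0],   # forecasts of the 0.5 quantile, QL = 0.25
    [13.0, 24.0],   # forecasts of the 0.9 quantile, QL = 0.35
]

value = mql(y, y_hats, quantiles)   # (0.35 + 0.25 + 0.35) / 3
print(value)
```

With uniformly spaced quantile levels, this same average is the Riemann-sum approximation of the CRPS integral described above.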

Weighted Multi-Quantile Loss

{% raw %}

wMQLoss

wMQLoss(y:Tensor, y_hat:Tensor, quantiles:Tensor, mask:Tensor=None)

Calculates the Weighted Multi-Quantile loss (WMQL) between y and y_hat. WMQL is the average of the quantile losses over a given set of quantiles, normalized by the sum of the absolute observed values.

$$ \mathrm{WMQL}(\mathbf{y}_{\tau}, [\mathbf{\hat{y}}^{(q_{1})}_{\tau}, ... ,\hat{y}^{(q_{n})}_{\tau}]) = \frac{1}{n} \sum_{q_{i}} \frac{\mathrm{QL}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}^{(q_{i})}_{\tau})} {\sum^{t+H}_{\tau=t+1} |y_{\tau}|} $$
Parameters
----------
y: tensor (batch_size, output_size).
    Actual values in torch tensor.
y_hat: tensor (batch_size, output_size).
    Predicted values in torch tensor.
mask: tensor (batch_size, output_size).
    Specifies date stamps per series to consider in loss.
quantiles: tensor(n_quantiles).
    Quantiles to estimate from the distribution of y.

Returns
-------
wmqloss: tensor(n_quantiles).
    Weighted average multi-quantile loss.
{% endraw %}
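Dividing by the sum of absolute observed values makes the loss insensitive to the units of the series. The plain-Python sketch below follows the WMQL formula as written above; the helper names are hypothetical, the forecasts are passed as one list per quantile level, and the mask is omitted.

```python
def quantile_loss(y, y_hat, q):
    """Average pinball loss of y_hat as a forecast of the q-quantile."""
    total = sum((1 - q) * max(b - a, 0.0) + q * max(a - b, 0.0)
                for a, b in zip(y, y_hat))
    return total / len(y)


def wmql(y, y_hats, quantiles):
    """Mean quantile loss divided by the sum of |y|."""
    losses = [quantile_loss(y, y_hat, q) for y_hat, q in zip(y_hats, quantiles)]
    return (sum(losses) / len(losses)) / sum(abs(a) for a in y)


quantiles = [0.1, 0.5, 0.9]
y = [10.0, 20.0]
y_hats = [[8.0, 15.0], [10.0, 21.0], [13.0, 24.0]]

base = wmql(y, y_hats, quantiles)

# The same data expressed in units 100 times smaller.
y_big = [100 * a for a in y]
y_hats_big = [[100 * b for b in row] for row in y_hats]
rescaled = wmql(y_big, y_hats_big, quantiles)
print(base, rescaled)
```

Both calls return the same value up to floating-point rounding, whereas the unweighted multi-quantile loss would grow by a factor of 100.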

5. Other Errors

ES-RNN PyTorch Loss

The M4 competition winner, the exponential smoothing recurrent neural network (ES-RNN), combines the Holt-Winters method for the seasonal and level components with a dilated recurrent neural network (DRNN). It is defined by:

  • $\text{Level:} \quad l_{\tau} = \text{median}(y_{\tau})$
  • $\text{Residual:} \quad z_{\tau} = y_{\tau} - l_{\tau}$
  • $\text{NN Forecast:} \quad \hat{z}_{\tau} = \text{DRNN}(z_{\tau}, x_{\tau}, s_{\tau})$
  • $\text{Level Forecast:} \quad \hat{l}_{\tau} = \text{Naive}(l_{\tau})$
  • $\text{Forecast:} \quad \hat{y}_{\tau+H} = \hat{l}_{\tau+H} + \hat{z}_{\tau+H}$
{% raw %}

LevelVariabilityLoss

LevelVariabilityLoss(levels:Tensor, level_variability_penalty:float)

Computes the variability penalty for the level of the ES-RNN. The levels of the ES-RNN are based on the Holt-Winters model.

$$ \mathrm{Penalty} = \lambda \, (\hat{l}_{\tau+1}-\hat{l}_{\tau})^{2} $$
Parameters
----------
levels: tensor with shape (batch, n_time).
    Levels obtained from exponential smoothing component of ESRNN.
level_variability_penalty: float.
    Controls the strength of the penalty
    on the wiggliness of the level vector, which induces
    smoothness in the output.

Returns
-------
level_var_loss: tensor (single value).
    Wiggliness loss for the level vector.
{% endraw %} {% raw %}

SmylLoss

SmylLoss(y:Tensor, y_hat:Tensor, levels:Tensor, mask:Tensor, tau:float, level_variability_penalty:float=0.0)

Computes the Smyl Loss, which combines level variability regularization with the Quantile loss.

Parameters
----------
y: tensor (batch_size, output_size).
    Actual values in torch tensor.
y_hat: tensor (batch_size, output_size).
    Predicted values in torch tensor.
levels: tensor with shape (batch, n_time).
    Levels obtained from exponential smoothing component of ESRNN.
mask: tensor (batch_size, output_size).
    Specifies date stamps per series to consider in loss.
tau: float, between 0 and 1.
    The slope of the quantile loss. In the context of
    quantile regression, tau determines the conditional
    quantile level.
level_variability_penalty: float.
    Controls the strength of the penalty
    on the wiggliness of the level vector, which induces
    smoothness in the output.

Returns
-------
smyl_loss: tensor (single value).
    Smyl loss.
{% endraw %}
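The two ES-RNN losses can be sketched together in plain Python. This is an illustration under stated assumptions, not the library's implementation: the helper names are hypothetical, the penalty averages the squared level differences from the formula above over time (the library may aggregate or transform the levels differently), and the Smyl loss is taken as the quantile loss plus the penalty.

```python
def quantile_loss(y, y_hat, q):
    """Average pinball loss of y_hat as a forecast of the q-quantile."""
    total = sum((1 - q) * max(b - a, 0.0) + q * max(a - b, 0.0)
                for a, b in zip(y, y_hat))
    return total / len(y)


def level_variability(levels, penalty):
    """penalty * mean of squared consecutive level differences (assumed)."""
    diffs = [(nxt - cur) ** 2 for cur, nxt in zip(levels[:-1], levels[1:])]
    return penalty * sum(diffs) / len(diffs)


def smyl(y, y_hat, levels, tau, penalty=0.0):
    """Quantile loss plus level variability penalty (assumed additive)."""
    return quantile_loss(y, y_hat, tau) + level_variability(levels, penalty)


levels = [10.0, 11.0, 13.0, 14.0]   # squared differences: 1, 4, 1
y = [14.0, 16.0]
y_hat = [12.0, 17.0]

wiggle = level_variability(levels, penalty=0.5)       # 0.5 * (6 / 3)
loss = smyl(y, y_hat, levels, tau=0.5, penalty=0.5)   # 0.75 + 1.0
print(wiggle, loss)
```

With `penalty=0.0` the sketch reduces to the plain quantile loss, which matches the default value of `level_variability_penalty` in the signature above.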