Regression Metrics Deep Dive¤

Level: Beginner
Time: ~10 minutes
Prerequisites: Quickstart
Format: Python + Jupyter

Overview¤

Calibrax provides regression metrics that span common error measures, robust alternatives, percentage-based losses, and probabilistic ensemble scoring. This example first computes the metrics that take predictions and targets of the same shape on clean data, then scores ensemble forecasts with CRPS, and then shows how a single outlier affects MSE, MAE, and Huber loss differently. It also evaluates quantile loss at several quantile levels and contrasts the asymmetric MAPE with its symmetric variant, SMAPE.

Understanding when each metric is appropriate is essential for model evaluation. Squared-error metrics amplify large deviations; robust alternatives like Huber and log-cosh grow only linearly for large errors while remaining differentiable.

What You'll Learn¤

  1. Compute same-shape regression metrics on a single dataset
  2. Score ensemble forecasts with CRPS
  3. Compare outlier sensitivity across MSE, MAE, and Huber loss
  4. Use quantile loss to penalize under- vs over-prediction asymmetrically
  5. Distinguish SMAPE (symmetric) from MAPE (asymmetric) percentage errors
  6. Choose log-cosh as a twice-differentiable alternative to MAE

Files¤

Quick Start¤

source activate.sh && uv run python examples/metrics/02_regression_deep_dive.py

Key Concepts¤

Regression Metrics¤

| Metric | Description | Outlier Sensitivity |
| --- | --- | --- |
| mse | Mean Squared Error | High -- squares amplify large errors |
| mae | Mean Absolute Error | Low -- linear in error magnitude |
| rmse | Root Mean Squared Error | High -- same as MSE, in original units |
| r_squared | Coefficient of determination | High -- based on squared residuals |
| mape | Mean Absolute Percentage Error | Moderate -- relative to target magnitude |
| smape | Symmetric MAPE | Moderate -- symmetric under prediction/target swap |
| relative_error | Mean relative error | Moderate |
| explained_variance | Fraction of target variance not left in the residuals | High |
| max_error | Worst-case absolute error | Extreme -- driven by single worst point |
| huber_loss | Quadratic near zero, linear far away | Configurable via delta |
| quantile_loss | Asymmetric loss for quantile regression | Low |
| log_cosh_loss | Smooth approximation to MAE | Low |
| crps | Continuous ranked probability score for ensemble forecasts | Depends on ensemble spread |

import jax.numpy as jnp  # the snippets on this page use the jnp alias

from calibrax.metrics.functional.regression import (
    crps, explained_variance, huber_loss, log_cosh_loss, mae, mape,
    max_error, mse, quantile_loss, r_squared, relative_error, rmse, smape,
)

CRPS takes predictions with an explicit ensemble-member dimension. In this snippet each of the two targets has three ensemble members:

ensemble_predictions = jnp.array([[0.8, 1.0, 1.2], [1.8, 2.0, 2.2]])
ensemble_targets = jnp.array([1.0, 2.0])
crps(ensemble_predictions, ensemble_targets)
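
To make the score concrete, here is a minimal plain-Python sketch of one common empirical CRPS estimator, E|X - y| - 0.5 * E|X - X'|, applied to the same numbers. The helper `ensemble_crps` is hypothetical and written only for illustration; Calibrax's `crps` may use a different estimator (for example a "fair" variant that divides the spread term by M(M-1)) or a different reduction, so treat the values as a hand check of the formula rather than as the library's output.

```python
def ensemble_crps(members, target):
    """Empirical CRPS for one target: E|X - y| - 0.5 * E|X - X'|."""
    m = len(members)
    # Mean absolute distance between each member and the observation.
    accuracy = sum(abs(x - target) for x in members) / m
    # Mean absolute distance over all ordered member pairs (including i == j).
    spread = sum(abs(a - b) for a in members for b in members) / (m * m)
    return accuracy - 0.5 * spread


# Same numbers as the snippet above: one row of members per target.
ensemble_predictions = [[0.8, 1.0, 1.2], [1.8, 2.0, 2.2]]
ensemble_targets = [1.0, 2.0]

per_target = [
    ensemble_crps(row, y)
    for row, y in zip(ensemble_predictions, ensemble_targets)
]
mean_crps = sum(per_target) / len(per_target)

# With a single member the spread term vanishes and the score is |x - y|.
point_forecast_crps = ensemble_crps([1.5], 1.0)

print(per_target, mean_crps, point_forecast_crps)
```

Under this estimator a one-member ensemble reduces to the absolute error, which is why CRPS is often described as a generalisation of MAE to probabilistic forecasts.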

Outlier Sensitivity¤

When data contains outliers, squared-error metrics (MSE, RMSE) can be dominated by a single bad prediction. The example injects one outlier and compares the effect:

targets = jnp.array([1.0, 2.0, 3.0, 4.0, 5.0])  # assumed here: clean predictions sit 0.1 above the targets
preds_clean = jnp.array([1.1, 2.1, 3.1, 4.1, 5.1])
preds_outlier = jnp.array([1.1, 2.1, 3.1, 4.1, 15.0])  # outlier at index 4

# MSE jumps dramatically; MAE and Huber grow only linearly in the outlier's error
mse(preds_outlier, targets)    # large increase
mae(preds_outlier, targets)    # moderate increase
huber_loss(preds_outlier, targets, delta=1.0)  # moderate increase: linear beyond delta

Huber loss transitions from quadratic (for errors smaller than delta) to linear (for errors larger than delta), providing a tunable trade-off.
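
The comparison can be worked through by hand with a small plain-Python sketch. The `ref_*` helpers are hypothetical reference implementations of the textbook definitions (mean reduction, Huber as 0.5 * e^2 inside delta and delta * (|e| - 0.5 * delta) outside), and the targets `[1, 2, 3, 4, 5]` are an assumption chosen so that the clean predictions are off by 0.1; Calibrax's own functions may differ in reduction or scaling.

```python
def ref_mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def ref_mae(preds, targets):
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


def ref_huber(preds, targets, delta=1.0):
    total = 0.0
    for p, t in zip(preds, targets):
        err = abs(p - t)
        if err <= delta:
            total += 0.5 * err ** 2               # quadratic region
        else:
            total += delta * (err - 0.5 * delta)  # linear region
    return total / len(targets)


targets = [1.0, 2.0, 3.0, 4.0, 5.0]  # assumed targets for this sketch
preds_clean = [1.1, 2.1, 3.1, 4.1, 5.1]
preds_outlier = [1.1, 2.1, 3.1, 4.1, 15.0]  # error of 10 at index 4

results = {}
for name, fn in [("mse", ref_mse), ("mae", ref_mae), ("huber", ref_huber)]:
    results[name] = (fn(preds_clean, targets), fn(preds_outlier, targets))
    print(f"{name:>5}: clean={results[name][0]:.4f}  outlier={results[name][1]:.4f}")
```

With these numbers MSE rises from about 0.01 to about 20.0, MAE from 0.1 to about 2.08, and Huber from 0.005 to about 1.90. Huber does not cap the outlier's contribution: beyond delta it grows linearly, like MAE, while staying quadratic (and smooth) for small errors.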

Quantile Loss¤

Quantile loss penalizes under-prediction and over-prediction asymmetrically. At quantile q, under-prediction is penalized by factor q and over-prediction by factor 1-q.

# q=0.9: heavily penalizes under-prediction (useful for safety margins)
quantile_loss(predictions, targets, quantile=0.9)

# q=0.1: heavily penalizes over-prediction
quantile_loss(predictions, targets, quantile=0.1)

# q=0.5: symmetric; equals MAE up to a constant factor (0.5 under the standard pinball definition)
quantile_loss(predictions, targets, quantile=0.5)
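
The asymmetry is easy to verify with the standard pinball-loss definition. The sketch below is plain Python with hypothetical helper names (`pinball`, `ref_quantile_loss`); it assumes a mean reduction and does not claim to match Calibrax's implementation detail for detail.

```python
def pinball(pred, target, q):
    """Pinball (quantile) loss for a single prediction."""
    diff = target - pred  # positive when the model under-predicts
    return q * diff if diff >= 0 else (q - 1.0) * diff


def ref_quantile_loss(preds, targets, q):
    return sum(pinball(p, t, q) for p, t in zip(preds, targets)) / len(targets)


# Under-prediction by 2 (target 10, prediction 8) vs over-prediction by 2.
under_q90 = pinball(8.0, 10.0, 0.9)   # weighted by q
over_q90 = pinball(12.0, 10.0, 0.9)   # weighted by 1 - q
under_q10 = pinball(8.0, 10.0, 0.1)
over_q10 = pinball(12.0, 10.0, 0.1)

# At q = 0.5 both sides are weighted by 0.5, so the loss is half the MAE.
sample_preds = [8.0, 12.0, 9.0]
sample_targets = [10.0, 10.0, 10.0]
median_loss = ref_quantile_loss(sample_preds, sample_targets, 0.5)
sample_mae = sum(abs(p - t) for p, t in zip(sample_preds, sample_targets)) / 3

print(under_q90, over_q90, under_q10, over_q10, median_loss, sample_mae)
```

At q=0.9 the under-prediction costs about 1.8 and the over-prediction about 0.2; at q=0.1 the two swap.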

SMAPE vs MAPE¤

MAPE is asymmetric: swapping predictions and targets changes the result, because each error is divided by the target magnitude only. SMAPE normalizes by the average of the prediction and target magnitudes, producing a symmetric measure.

mape(predictions, targets)   # changes if you swap arguments
smape(predictions, targets)  # same value regardless of argument order
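
A two-point example shows the effect of swapping the arguments. The `ref_mape` and `ref_smape` helpers below are hypothetical plain-Python versions of the textbook formulas, returning fractions rather than percentages; Calibrax may scale by 100 or add a small epsilon to the denominator, so only the symmetry property, not the exact values, should be expected to carry over.

```python
def ref_mape(preds, targets):
    # Each error is divided by the magnitude of the target only.
    return sum(abs(t - p) / abs(t) for p, t in zip(preds, targets)) / len(targets)


def ref_smape(preds, targets):
    # Each error is divided by the mean magnitude of prediction and target.
    return sum(
        abs(t - p) / ((abs(t) + abs(p)) / 2.0) for p, t in zip(preds, targets)
    ) / len(targets)


a = [100.0, 200.0]
b = [150.0, 180.0]

mape_ab = ref_mape(b, a)    # b as predictions, a as targets
mape_ba = ref_mape(a, b)    # arguments swapped
smape_ab = ref_smape(b, a)
smape_ba = ref_smape(a, b)

print(mape_ab, mape_ba)    # differ
print(smape_ab, smape_ba)  # identical
```

Here MAPE is 0.30 in one direction and about 0.22 in the other, while SMAPE is about 0.25 both ways.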

Log-Cosh: Smooth MAE Alternative¤

Log-cosh behaves like 0.5 * MSE for small errors and like MAE for large errors. Unlike MAE, it is twice-differentiable everywhere, which makes it well-suited for gradient-based optimisation.

# For small errors: log_cosh ≈ 0.5 * error^2
# For large errors: log_cosh ≈ |error| - log(2)
log_cosh_loss(predictions, targets)
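
Both limiting behaviours can be checked numerically with the standard library. The helper `log_cosh` below is a hypothetical per-element sketch that uses the identity log(cosh(a)) = a + log(1 + exp(-2a)) - log(2), which avoids overflow for large errors; it is not taken from Calibrax.

```python
import math


def log_cosh(err):
    """Numerically stable log(cosh(err)) for a single error value."""
    a = abs(err)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


small = 0.1
large = 10.0

small_value = log_cosh(small)
small_approx = 0.5 * small ** 2        # quadratic regime
large_value = log_cosh(large)
large_approx = large - math.log(2.0)   # linear regime

print(small_value, small_approx)
print(large_value, large_approx)
```

The two pairs agree closely, and `log_cosh(0.0)` is exactly zero, so the loss has no kink at the origin the way the absolute value does.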

Example Code¤

The script starts by computing same-shape metrics on clean data:

targets = jnp.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
predictions = jnp.array([1.1, 1.9, 3.2, 3.8, 5.1, 5.9, 7.3, 7.8])

metrics = {
    "MSE": mse(predictions, targets),
    "MAE": mae(predictions, targets),
    "RMSE": rmse(predictions, targets),
    "R-squared": r_squared(predictions, targets),
    "MAPE": mape(predictions, targets),
    "SMAPE": smape(predictions, targets),
    "Relative Error": relative_error(predictions, targets),
    "Explained Variance": explained_variance(predictions, targets),
    "Max Error": max_error(predictions, targets),
    "Huber Loss (delta=1.0)": huber_loss(predictions, targets, delta=1.0),
    "Quantile Loss (q=0.5)": quantile_loss(predictions, targets, quantile=0.5),
    "Log-Cosh Loss": log_cosh_loss(predictions, targets),
}
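
As a sanity check on the clean data, a few of these metrics can be recomputed by hand in plain Python. This sketch assumes the textbook definitions (mean reduction, and R-squared as 1 - SS_res / SS_tot); the variable names are illustrative and the values are a reference for comparison, not output captured from Calibrax.

```python
import math

targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
predictions = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9, 7.3, 7.8]

errors = [p - t for p, t in zip(predictions, targets)]
n = len(errors)

hand_mse = sum(e ** 2 for e in errors) / n
hand_mae = sum(abs(e) for e in errors) / n
hand_rmse = math.sqrt(hand_mse)
hand_max_error = max(abs(e) for e in errors)

# R-squared: one minus residual sum of squares over total sum of squares.
target_mean = sum(targets) / n
ss_res = sum(e ** 2 for e in errors)
ss_tot = sum((t - target_mean) ** 2 for t in targets)
hand_r_squared = 1.0 - ss_res / ss_tot

print(f"MSE={hand_mse:.5f} MAE={hand_mae:.4f} RMSE={hand_rmse:.4f}")
print(f"max_error={hand_max_error:.1f} R-squared={hand_r_squared:.4f}")
```

Under these definitions the clean data gives an MSE of about 0.03125, an MAE of about 0.1625, an RMSE of about 0.1768, a max error of about 0.3, and an R-squared of about 0.9940.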

Next Steps¤