Regression Metrics Deep Dive¤


Level	Beginner
Time	~10 minutes
Prerequisites	Quickstart
Format	Python + Jupyter

Overview¤

Calibrax provides regression metrics that span common error measures, robust alternatives, percentage-based errors, and probabilistic ensemble scoring. This example computes the metrics that take predictions and targets of the same shape on clean data, shows CRPS for ensemble forecasts, then demonstrates how a single outlier affects MSE, MAE, and Huber loss differently. It also evaluates quantile loss at several quantile levels and compares symmetric and asymmetric percentage errors (SMAPE vs MAPE).

Understanding when each metric is appropriate is essential for model evaluation. Squared-error metrics amplify large deviations; robust alternatives like Huber and log-cosh grow only linearly for large errors, which limits the influence of outliers, while remaining differentiable at zero.

What You'll Learn¤

Compute same-shape regression metrics on a single dataset
Score ensemble forecasts with CRPS
Compare outlier sensitivity across MSE, MAE, and Huber loss
Use quantile loss to penalize under- vs over-prediction asymmetrically
Distinguish SMAPE (symmetric) from MAPE (asymmetric) percentage errors
Choose log-cosh as a twice-differentiable alternative to MAE

Files¤

Python Script: examples/metrics/02_regression_deep_dive.py
Jupyter Notebook: examples/metrics/02_regression_deep_dive.ipynb

Quick Start¤

source activate.sh && uv run python examples/metrics/02_regression_deep_dive.py

Key Concepts¤

Regression Metrics¤

Metric	Description	Outlier Sensitivity
`mse`	Mean Squared Error	High -- squares amplify large errors
`mae`	Mean Absolute Error	Low -- linear in error magnitude
`rmse`	Root Mean Squared Error	High -- same as MSE, in original units
`r_squared`	Coefficient of determination	High -- based on MSE
`mape`	Mean Absolute Percentage Error	Moderate -- relative to target magnitude
`smape`	Symmetric MAPE	Moderate -- symmetric under prediction/target swap
`relative_error`	Mean relative error	Moderate
`explained_variance`	Fraction of target variance explained: 1 - Var(residuals) / Var(targets)	High
`max_error`	Worst-case absolute error	Extreme -- driven by single worst point
`huber_loss`	Quadratic near zero, linear far away	Configurable via `delta`
`quantile_loss`	Asymmetric loss for quantile regression	Low
`log_cosh_loss`	Smooth approximation to MAE	Low
`crps`	Continuous ranked probability score for ensemble forecasts	Depends on ensemble spread

import jax.numpy as jnp  # the snippets below build their arrays with jnp

from calibrax.metrics.functional.regression import (
    crps, explained_variance, huber_loss, log_cosh_loss, mae, mape,
    max_error, mse, quantile_loss, r_squared, relative_error, rmse, smape,
)

CRPS uses an explicit ensemble-member dimension:

ensemble_predictions = jnp.array([[0.8, 1.0, 1.2], [1.8, 2.0, 2.2]])
ensemble_targets = jnp.array([1.0, 2.0])
crps(ensemble_predictions, ensemble_targets)
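For intuition about what CRPS rewards, the plain-Python sketch below uses the standard empirical ensemble estimator, E|X - y| - 0.5 * E|X - X'|, for a single target. It is written independently of Calibrax: the helper name `crps_ensemble` is hypothetical, and whether `crps` uses exactly this estimator, or treats the last axis as the member axis, is an assumption to check against the API reference.

```python
from itertools import product


def crps_ensemble(members, target):
    """Empirical CRPS for one target: E|X - y| - 0.5 * E|X - X'|."""
    n = len(members)
    # Accuracy term: mean absolute distance from each member to the observation.
    accuracy = sum(abs(x - target) for x in members) / n
    # Spread term: mean absolute distance over all ordered pairs of members.
    spread = sum(abs(a - b) for a, b in product(members, repeat=2)) / (n * n)
    return accuracy - 0.5 * spread


# Assumed reading of the data above: three ensemble members for the target 1.0.
narrow = crps_ensemble([0.8, 1.0, 1.2], 1.0)

# A single-member "ensemble" has no spread, so CRPS reduces to the absolute error.
point = crps_ensemble([1.3], 1.0)
```

Under this estimator a lower score is better, and a deterministic forecast is scored exactly like MAE, which is why CRPS is often described as a probabilistic generalisation of absolute error.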

Outlier Sensitivity¤

When data contains outliers, squared-error metrics (MSE, RMSE) can be dominated by a single bad prediction. The example injects one outlier and compares the effect:

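# Targets assumed here so the snippet is self-contained (preds_clean is targets + 0.1)
targets = jnp.array([1.0, 2.0, 3.0, 4.0, 5.0])
preds_clean = jnp.array([1.1, 2.1, 3.1, 4.1, 5.1])
preds_outlier = jnp.array([1.1, 2.1, 3.1, 4.1, 15.0])  # outlier at index 4

# MSE squares the outlier's error; MAE counts it linearly; Huber also counts it
# linearly once the error exceeds delta (the contribution is reduced, not capped)
mse(preds_outlier, targets)    # large increase
mae(preds_outlier, targets)    # moderate increase
huber_loss(preds_outlier, targets, delta=1.0)  # linear, not quadratic, increase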

Huber loss transitions from quadratic (for errors smaller than delta) to linear (for errors larger than delta), providing a tunable trade-off.
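The quadratic-to-linear transition can be checked by hand. The sketch below implements the textbook Huber formula in plain Python and applies it to errors that mirror the outlier example, assuming targets of 1.0 to 5.0. The helper names `huber` and `mean` are hypothetical, and any difference in scaling between this formula and Calibrax's `huber_loss` is not covered here.

```python
def huber(error, delta=1.0):
    """Textbook Huber loss for a single error value."""
    a = abs(error)
    if a <= delta:
        return 0.5 * a * a              # quadratic zone: |error| <= delta
    return delta * (a - 0.5 * delta)    # linear zone: |error| > delta


def mean(values):
    return sum(values) / len(values)


# Four small errors plus one outlier error of 10.0.
errors = [0.1, 0.1, 0.1, 0.1, 10.0]

mse_value = mean([e * e for e in errors])        # dominated by 10.0 ** 2
mae_value = mean([abs(e) for e in errors])       # outlier counted linearly
huber_value = mean([huber(e) for e in errors])   # linear in the outlier as well

# The loss keeps growing with the error: it is linear, not bounded.
far_outlier = huber(100.0)
```

With these numbers the outlier pushes the mean squared error to roughly 20, while the mean absolute error and the mean Huber loss both stay near 2. A smaller `delta` moves more errors into the linear zone and so behaves more like MAE.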

Quantile Loss¤

Quantile loss (also called pinball loss) penalizes under-prediction and over-prediction asymmetrically. At quantile q, each unit of under-prediction (prediction below target) is weighted by q and each unit of over-prediction by 1-q.

# q=0.9: heavily penalizes under-prediction (useful for safety margins)
quantile_loss(predictions, targets, quantile=0.9)

# q=0.1: heavily penalizes over-prediction
quantile_loss(predictions, targets, quantile=0.1)

# q=0.5: symmetric; proportional to MAE (0.5 * MAE under the standard pinball definition)
quantile_loss(predictions, targets, quantile=0.5)
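The weighting rule is easiest to see on a single prediction. This plain-Python sketch implements the standard pinball loss; the helper name `pinball` is hypothetical, and it is assumed rather than confirmed that `quantile_loss` averages this same per-element quantity.

```python
def pinball(prediction, target, quantile):
    """Pinball (quantile) loss for a single prediction."""
    diff = target - prediction
    # diff > 0 is under-prediction (weight q); diff < 0 is over-prediction (weight 1 - q).
    return quantile * diff if diff >= 0 else (quantile - 1.0) * diff


under = pinball(8.0, 10.0, quantile=0.9)    # under-predict by 2 at q=0.9
over = pinball(12.0, 10.0, quantile=0.9)    # over-predict by 2 at q=0.9
median = pinball(8.0, 10.0, quantile=0.5)   # half of the absolute error of 2
```

At q=0.9 the same miss of 2 units costs nine times more when it is an under-prediction, which is what pushes a model trained with this loss towards the 90th percentile.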

SMAPE vs MAPE¤

MAPE is asymmetric: it divides each absolute error by the target magnitude only, so swapping predictions and targets changes the result. SMAPE normalizes by the average of the prediction and target magnitudes, producing a symmetric measure.

mape(predictions, targets)   # changes if you swap arguments
smape(predictions, targets)  # same value regardless of argument order
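A single pair of numbers is enough to show the asymmetry. The sketch below uses common textbook definitions in plain Python and returns fractions rather than percentages; the helper names `ape` and `sape` are hypothetical, and the exact scaling convention used by Calibrax's `mape` and `smape` is an assumption to verify.

```python
def ape(prediction, target):
    """Absolute percentage error as a fraction: |target - prediction| / |target|."""
    return abs(target - prediction) / abs(target)


def sape(prediction, target):
    """Symmetric variant: the error divided by the mean of the two magnitudes."""
    return abs(target - prediction) / ((abs(target) + abs(prediction)) / 2.0)


# Forecast 150 against an actual of 100, then swap the roles.
ape_forward, ape_swapped = ape(150.0, 100.0), ape(100.0, 150.0)
sape_forward, sape_swapped = sape(150.0, 100.0), sape(100.0, 150.0)
```

Here the unsymmetric error is 0.5 in one direction and about 0.33 in the other, while the symmetric variant is 0.4 both ways. Both definitions divide by magnitudes, so targets at or near zero still need separate handling.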

Log-Cosh: Smooth MAE Alternative¤

Log-cosh behaves like half the squared error for small errors and like the absolute error (minus the constant log 2) for large errors. Unlike MAE, it is twice-differentiable everywhere, which makes it well-suited for gradient-based optimisation.

# For small errors: log_cosh ≈ 0.5 * error^2
# For large errors: log_cosh ≈ |error| - log(2)
log_cosh_loss(predictions, targets)
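Both approximations can be verified numerically. The sketch below evaluates log(cosh(x)) through an algebraically equivalent form that avoids overflow for large inputs; it is a plain-Python illustration with a hypothetical helper name `log_cosh`, not a description of how `log_cosh_loss` is implemented.

```python
import math


def log_cosh(error):
    """Numerically stable log(cosh(error))."""
    a = abs(error)
    # cosh(a) = exp(a) * (1 + exp(-2a)) / 2, so
    # log(cosh(a)) = a + log(1 + exp(-2a)) - log(2), which never overflows.
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


small = log_cosh(0.01)     # close to 0.5 * 0.01 ** 2
large = log_cosh(10.0)     # close to 10.0 - log(2)
huge = log_cosh(1000.0)    # stays finite, whereas math.cosh(1000.0) overflows
```

The stable form matters in practice: evaluating `cosh` directly overflows in double precision for errors of a few hundred, even though the loss itself is only about as large as the error.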

Example Code¤

The script starts by computing same-shape metrics on clean data:

targets = jnp.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
predictions = jnp.array([1.1, 1.9, 3.2, 3.8, 5.1, 5.9, 7.3, 7.8])

metrics = {
    "MSE": mse(predictions, targets),
    "MAE": mae(predictions, targets),
    "RMSE": rmse(predictions, targets),
    "R-squared": r_squared(predictions, targets),
    "MAPE": mape(predictions, targets),
    "SMAPE": smape(predictions, targets),
    "Relative Error": relative_error(predictions, targets),
    "Explained Variance": explained_variance(predictions, targets),
    "Max Error": max_error(predictions, targets),
    "Huber Loss (delta=1.0)": huber_loss(predictions, targets, delta=1.0),
    "Quantile Loss (q=0.5)": quantile_loss(predictions, targets, quantile=0.5),
    "Log-Cosh Loss": log_cosh_loss(predictions, targets),
}
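As a cross-check on the script's output, several of these values can be recomputed from textbook definitions. The sketch below does so in plain Python on the same data; the `ref_` names are hypothetical, and the definitions (population variance, R-squared as 1 - SS_res / SS_tot) are standard conventions assumed to match Calibrax rather than taken from its source.

```python
import math

targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
predictions = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9, 7.3, 7.8]
n = len(targets)

residuals = [t - p for p, t in zip(predictions, targets)]
mean_target = sum(targets) / n

ref_mse = sum(r * r for r in residuals) / n
ref_mae = sum(abs(r) for r in residuals) / n
ref_rmse = math.sqrt(ref_mse)
ref_max_error = max(abs(r) for r in residuals)

# R-squared: 1 - SS_res / SS_tot.
ss_tot = sum((t - mean_target) ** 2 for t in targets)
ref_r_squared = 1.0 - sum(r * r for r in residuals) / ss_tot

# Explained variance: like R-squared, but the mean residual (bias) is removed first.
mean_residual = sum(residuals) / n
var_residual = sum((r - mean_residual) ** 2 for r in residuals) / n
ref_explained_variance = 1.0 - var_residual / (ss_tot / n)
```

Because the predictions carry a small systematic bias, the explained variance comes out slightly higher than R-squared; the two coincide only when the residuals average to zero.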

Next Steps¤

Classification Metrics -- binary classification, calibration, and segmentation
API Reference: calibrax.metrics.functional.regression -- full regression function signatures
Quickstart -- if you skipped the basics