calibrax.metrics.wrappers

Decorator-pattern wrappers that enhance any metric function with additional behavior. BootstrapMetric adds confidence interval estimation, ClasswiseWrapper provides per-class breakdown, MetricTracker tracks historical values with best-value detection, and MinMaxTracker maintains running min/max/current state.

Metric wrappers: enhance any metric with confidence intervals, per-class breakdown, tracking.

Wrappers follow the decorator pattern: each one wraps an existing metric function and adds behavior without modifying the original.

  • BootstrapMetric: Bootstrap confidence interval estimation.
  • ClasswiseWrapper: Per-class metric breakdown.
  • MetricTracker: Historical tracking with best-value detection.
  • MinMaxTracker: Running min/max/current tracking.
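The usage examples on this page pass a metric called mse without defining it. As a minimal sketch, assuming plain Python sequences as inputs, the block below shows one pure function with the (predictions, targets) -> float signature the wrappers expect, plus a toy wrapper that illustrates the decorator pattern. CallCountingWrapper is a hypothetical name used only for illustration and is not part of calibrax.

```python
from collections.abc import Callable, Sequence


def mse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error as a pure (predictions, targets) -> float function."""
    if len(predictions) != len(targets) or len(predictions) == 0:
        raise ValueError("predictions and targets must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


class CallCountingWrapper:
    """Hypothetical wrapper (not part of calibrax): counts calls to a metric."""

    def __init__(self, metric_fn: Callable[..., float]) -> None:
        self._metric_fn = metric_fn  # the original function is stored, never modified
        self.calls = 0

    def compute(self, predictions: Sequence[float], targets: Sequence[float]) -> float:
        self.calls += 1  # the added behavior
        return self._metric_fn(predictions, targets)  # delegate to the wrapped metric


wrapped = CallCountingWrapper(mse)
value = wrapped.compute([1.0, 2.0], [1.0, 4.0])  # squared errors are 0 and 4
```

Because the wrapper only stores and calls metric_fn, the same mse function can be handed to several wrappers at once.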

BootstrapMetric(metric_fn: Callable[..., float], *, num_bootstraps: int = 1000, confidence: float = 0.95, seed: int = 0)

Wrap any metric function with bootstrap confidence interval estimation.

A measurement without uncertainty is incomplete. This wrapper provides bootstrap-based confidence intervals for any metric.

Usage

    bootstrap = BootstrapMetric(mse, num_bootstraps=1000, confidence=0.95)
    result = bootstrap.compute(predictions, targets)
    # {"value": 0.01, "lower": 0.008, "upper": 0.012, "samples": (...)}

Attributes:

| Name | Type | Description |
|---|---|---|
| metric_fn | Callable[..., float] | The wrapped metric function. |
| num_bootstraps | int | Number of bootstrap resamples. |
| confidence | float | Confidence level for interval. |

Initialize bootstrap wrapper.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| metric_fn | Callable[..., float] | Pure function with signature (predictions, targets) -> float. | required |
| num_bootstraps | int | Number of bootstrap resamples. | 1000 |
| confidence | float | Confidence level (0 < confidence < 1). | 0.95 |
| seed | int | Random seed for reproducibility. | 0 |

Raises:

| Type | Description |
|---|---|
| ValueError | If confidence is not in (0, 1). |
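To make the roles of num_bootstraps, confidence, and seed concrete, here is a minimal sketch of a percentile bootstrap. It assumes that (prediction, target) pairs are resampled together with replacement and that the interval comes from percentiles of the resampled metric values. This page does not state which bootstrap variant or random number generator calibrax uses, so bootstrap_ci and its internals are hypothetical. Only the four result keys mirror the documented output of compute.

```python
import random
from collections.abc import Callable, Sequence
from typing import Any


def mse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error, used here as the example metric."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


def bootstrap_ci(
    metric_fn: Callable[..., float],
    predictions: Sequence[float],
    targets: Sequence[float],
    *,
    num_bootstraps: int = 1000,
    confidence: float = 0.95,
    seed: int = 0,
) -> dict[str, Any]:
    """Percentile-bootstrap sketch (an assumption, not calibrax's implementation)."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    n = len(predictions)
    rng = random.Random(seed)  # local generator, so equal seeds give equal results
    samples = []
    for _ in range(num_bootstraps):
        # Draw n indices with replacement and keep each pair together.
        idx = [rng.randrange(n) for _ in range(n)]
        samples.append(metric_fn([predictions[i] for i in idx], [targets[i] for i in idx]))
    ordered = sorted(samples)
    alpha = (1.0 - confidence) / 2.0  # probability mass cut from each tail
    lower = ordered[int(alpha * (num_bootstraps - 1))]
    upper = ordered[int((1.0 - alpha) * (num_bootstraps - 1))]
    return {
        "value": metric_fn(predictions, targets),  # point estimate on the full data
        "lower": lower,
        "upper": upper,
        "samples": tuple(samples),
    }


result = bootstrap_ci(
    mse,
    [0.1, 0.4, 0.35, 0.8, 0.6],
    [0.0, 0.5, 0.3, 1.0, 0.5],
    num_bootstraps=200,
)
```

A larger num_bootstraps gives a smoother estimate of the interval endpoints at proportionally higher cost, since the metric is evaluated once per resample.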

metric_fn: Callable[..., float] property

Get the wrapped metric function.

num_bootstraps: int property

Get the number of bootstrap resamples.

confidence: float property

Get the confidence level.

compute(predictions: Any, targets: Any) -> dict[str, Any]

Compute metric with bootstrap confidence interval.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| predictions | Any | Predicted values. | required |
| targets | Any | Ground truth values. | required |

Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Dict with "value" (point estimate), "lower" (CI lower bound), "upper" (CI upper bound), and "samples" (all bootstrap values). |

ClasswiseWrapper(metric_fn: Callable[..., float], *, class_names: list[str] | None = None)

Wrap any metric to compute it separately for each class.

Provides per-class breakdown of any (predictions, targets) -> float metric. Useful for identifying which classes a model performs poorly on.

Usage

    classwise = ClasswiseWrapper(mse, class_names=["cat", "dog", "bird"])
    result = classwise.compute(predictions, targets, labels)
    # {"cat": 0.01, "dog": 0.03, "bird": 0.02, "mean": 0.02}

Initialize classwise wrapper.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| metric_fn | Callable[..., float] | Pure function with signature (predictions, targets) -> float. | required |
| class_names | list[str] \| None | Optional human-readable class names. If None, uses integer indices as keys. | None |

compute(predictions: Any, targets: Any, labels: Any) -> dict[str, float]

Compute metric per class.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| predictions | Any | Predicted values. | required |
| targets | Any | Ground truth values. | required |
| labels | Any | Class labels for grouping (integer array). | required |

Returns:

| Type | Description |
|---|---|
| dict[str, float] | Dict mapping class names to metric values, plus "mean" key. |
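The per-class breakdown can be pictured as grouping samples by label, applying the metric to each group, and then averaging the per-class results. The sketch below rests on two assumptions that this page does not confirm: "mean" is the unweighted mean over classes, and integer indices are converted to strings when class_names is None so the keys match the dict[str, float] return type. classwise_metric is a hypothetical helper, not calibrax code.

```python
from collections.abc import Callable, Sequence
from typing import Optional


def mse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error, used here as the example metric."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


def classwise_metric(
    metric_fn: Callable[..., float],
    predictions: Sequence[float],
    targets: Sequence[float],
    labels: Sequence[int],
    class_names: Optional[Sequence[str]] = None,
) -> dict[str, float]:
    """Per-class sketch (an assumption, not calibrax's implementation)."""
    if not labels:
        raise ValueError("labels must not be empty")
    # Collect the predictions and targets that belong to each integer label.
    groups: dict[int, tuple[list[float], list[float]]] = {}
    for pred, target, label in zip(predictions, targets, labels):
        group_preds, group_targets = groups.setdefault(label, ([], []))
        group_preds.append(pred)
        group_targets.append(target)
    result: dict[str, float] = {}
    for label in sorted(groups):
        # Assumption: fall back to the stringified index when no names are given.
        key = class_names[label] if class_names is not None else str(label)
        result[key] = metric_fn(*groups[label])
    # Assumption: "mean" is the unweighted average of the per-class values.
    result["mean"] = sum(result.values()) / len(result)
    return result


result = classwise_metric(
    mse,
    [1.0, 2.0, 3.0, 5.0],
    [1.0, 1.0, 3.0, 3.0],
    [0, 0, 1, 1],
    class_names=["cat", "dog"],
)
# result == {"cat": 0.5, "dog": 2.0, "mean": 1.25}
```

An unweighted mean treats rare and common classes equally, which is one reason a per-class view can expose weaknesses that a pooled metric hides.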

MetricTracker(metric_fn: Callable[..., float], *, direction: str = 'lower')

Track a metric's history across multiple evaluation epochs.

Maintains a history of metric values with automatic best-value detection based on direction (higher/lower is better).

Usage

    tracker = MetricTracker(mse, direction="lower")
    tracker.increment(predictions_1, targets_1)
    tracker.increment(predictions_2, targets_2)
    print(tracker.best())      # Lowest MSE seen
    print(tracker.history)     # (0.05, 0.03)
    print(tracker.best_epoch)  # 1 (0-indexed)

Initialize metric tracker.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| metric_fn | Callable[..., float] | Pure function with signature (predictions, targets) -> float. | required |
| direction | str | "lower" or "higher"; determines what "best" means. | 'lower' |

Raises:

| Type | Description |
|---|---|
| ValueError | If direction is not "lower" or "higher". |

best_epoch: int property

Return the epoch index of the best metric value.

Raises:

| Type | Description |
|---|---|
| ValueError | If no values have been tracked. |

history: tuple[float, ...] property

Return all tracked values as an immutable tuple.

increment(predictions: Any, targets: Any) -> float

Compute metric and add to history.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| predictions | Any | Predicted values. | required |
| targets | Any | Ground truth values. | required |

Returns:

| Type | Description |
|---|---|
| float | The computed metric value. |

best() -> float

Return the best metric value seen so far.

Returns:

| Type | Description |
|---|---|
| float | Best value (min for "lower", max for "higher"). |

Raises:

| Type | Description |
|---|---|
| ValueError | If no values have been tracked. |

reset() -> None

Clear all tracked history.
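The documented behavior of increment, history, best, best_epoch, and reset can be approximated with a list of values and a direction flag. The sketch below is a hypothetical stand-in named TrackerSketch, not calibrax's implementation. In particular, resolving ties to the earliest epoch is an assumption, since this page does not say how ties are handled.

```python
from collections.abc import Callable, Sequence
from typing import Any


def mse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error, used here as the example metric."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


class TrackerSketch:
    """Hypothetical stand-in for MetricTracker (not calibrax's implementation)."""

    def __init__(self, metric_fn: Callable[..., float], *, direction: str = "lower") -> None:
        if direction not in ("lower", "higher"):
            raise ValueError('direction must be "lower" or "higher"')
        self._metric_fn = metric_fn
        self._direction = direction
        self._values: list[float] = []

    @property
    def history(self) -> tuple[float, ...]:
        # Return a tuple so callers cannot mutate the stored history.
        return tuple(self._values)

    def increment(self, predictions: Any, targets: Any) -> float:
        value = float(self._metric_fn(predictions, targets))
        self._values.append(value)  # one entry per evaluation epoch
        return value

    def best(self) -> float:
        if not self._values:
            raise ValueError("no values have been tracked")
        return min(self._values) if self._direction == "lower" else max(self._values)

    @property
    def best_epoch(self) -> int:
        # list.index returns the first match, so ties go to the earliest epoch.
        return self._values.index(self.best())

    def reset(self) -> None:
        self._values.clear()


tracker = TrackerSketch(mse, direction="lower")
for epoch_predictions in ([1.0], [0.5], [2.0]):
    tracker.increment(epoch_predictions, [0.0])
# tracker.history == (1.0, 0.25, 4.0); best is 0.25 at epoch index 1
```

With direction="higher" the same history would report 4.0 at epoch index 2, which suits metrics such as accuracy where larger is better.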

MinMaxTracker(metric_fn: Callable[..., float])

Track running min, max, and current value for any metric.

Useful for monitoring metric ranges during training without storing full history.

Usage

    tracker = MinMaxTracker(mse)
    tracker.update(predictions, targets)
    print(tracker.current)  # Latest value
    print(tracker.min)      # Lowest seen
    print(tracker.max)      # Highest seen

Initialize min/max tracker.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| metric_fn | Callable[..., float] | Pure function with signature (predictions, targets) -> float. | required |

current: float | None property

Return the most recently computed value.

min: float | None property

Return the minimum value seen.

max: float | None property

Return the maximum value seen.

update(predictions: Any, targets: Any) -> float

Compute metric and update min/max tracking.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| predictions | Any | Predicted values. | required |
| targets | Any | Ground truth values. | required |

Returns:

| Type | Description |
|---|---|
| float | The computed metric value. |

reset() -> None

Reset all tracking state.
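Tracking a range without a full history needs only three values that are updated in place. The sketch below is a hypothetical stand-in named MinMaxSketch, not calibrax's implementation. It uses plain attributes where the documented class exposes read-only properties, and it assumes all three values are None until the first update, in line with the float | None types above.

```python
from collections.abc import Callable, Sequence
from typing import Any, Optional


def mse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error, used here as the example metric."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


class MinMaxSketch:
    """Hypothetical stand-in for MinMaxTracker (not calibrax's implementation)."""

    def __init__(self, metric_fn: Callable[..., float]) -> None:
        self._metric_fn = metric_fn
        self.current: Optional[float] = None
        self.min: Optional[float] = None
        self.max: Optional[float] = None

    def update(self, predictions: Any, targets: Any) -> float:
        value = float(self._metric_fn(predictions, targets))
        self.current = value
        # The first update seeds both extremes; later updates only widen the range.
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)
        return value

    def reset(self) -> None:
        self.current = None
        self.min = None
        self.max = None


tracker = MinMaxSketch(mse)
for step_predictions in ([2.0], [0.5], [1.0]):
    tracker.update(step_predictions, [0.0])
# tracker.current == 1.0, tracker.min == 0.25, tracker.max == 4.0
```

Memory use stays constant however many updates are made, which is the trade-off against a tracker that keeps the full history.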