calibrax.metrics.wrappers¤

Decorator-pattern wrappers that enhance any metric function with additional behavior. BootstrapMetric adds confidence interval estimation, ClasswiseWrapper provides per-class breakdown, MetricTracker tracks historical values with best-value detection, and MinMaxTracker maintains running min/max/current state.

Each wrapper takes an existing metric function and adds behavior (confidence intervals, per-class breakdown, or tracking) without modifying the original.

BootstrapMetric: Bootstrap confidence interval estimation.
ClasswiseWrapper: Per-class metric breakdown.
MetricTracker: Historical tracking with best-value detection.
MinMaxTracker: Running min/max/current tracking.
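
The usage examples on this page pass a metric called `mse` that is not defined here. As a minimal sketch of what a compatible metric and the decorator pattern look like, the block below defines a pure `(predictions, targets) -> float` function and a hypothetical `CallCounter` wrapper. `CallCounter` is not part of calibrax; it only illustrates how a wrapper can add behavior around a metric it does not modify.

```python
from collections.abc import Callable, Sequence


def mse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error: a pure (predictions, targets) -> float metric."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


class CallCounter:
    """Hypothetical wrapper (not a calibrax class): counts evaluations."""

    def __init__(self, metric_fn: Callable[..., float]) -> None:
        self._metric_fn = metric_fn  # the wrapped metric is stored, never altered
        self.calls = 0

    def compute(self, predictions: Sequence[float], targets: Sequence[float]) -> float:
        self.calls += 1  # the added behavior
        return self._metric_fn(predictions, targets)  # delegate to the original


counted = CallCounter(mse)
value = counted.compute([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(value, counted.calls)
```

Because the wrapper only calls `metric_fn`, any function with the same signature can be swapped in without changing the wrapper.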

`BootstrapMetric(metric_fn: Callable[..., float], *, num_bootstraps: int = 1000, confidence: float = 0.95, seed: int = 0)` ¤

Wrap any metric function with bootstrap confidence interval estimation.

A measurement without uncertainty is incomplete. This wrapper provides bootstrap-based confidence intervals for any metric.

Usage

    bootstrap = BootstrapMetric(mse, num_bootstraps=1000, confidence=0.95)
    result = bootstrap.compute(predictions, targets)
    # {"value": 0.01, "lower": 0.008, "upper": 0.012, "samples": (...)}

Attributes:

Name	Type	Description
`metric_fn`	`Callable[..., float]`	The wrapped metric function.
`num_bootstraps`	`int`	Number of bootstrap resamples.
`confidence`	`float`	Confidence level for interval.

Initialize bootstrap wrapper.

Parameters:

Name	Type	Description	Default
`metric_fn`	`Callable[..., float]`	Pure function with signature (predictions, targets) -> float.	required
`num_bootstraps`	`int`	Number of bootstrap resamples.	`1000`
`confidence`	`float`	Confidence level (0 < confidence < 1).	`0.95`
`seed`	`int`	Random seed for reproducibility.	`0`

Raises:

Type	Description
`ValueError`	If confidence is not in (0, 1).

`metric_fn: Callable[..., float]` `property` ¤

Get the wrapped metric function.

`num_bootstraps: int` `property` ¤

Get the number of bootstrap resamples.

`confidence: float` `property` ¤

Get the confidence level.

`compute(predictions: Any, targets: Any) -> dict[str, Any]` ¤

Compute metric with bootstrap confidence interval.

Parameters:

Name	Type	Description	Default
`predictions`	`Any`	Predicted values.	required
`targets`	`Any`	Ground truth values.	required

Returns:

Type	Description
`dict[str, Any]`	Dict with "value" (point estimate), "lower" (CI lower bound), "upper" (CI upper bound), and "samples" (all bootstrap values).
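
The page does not say which bootstrap procedure `compute` uses. The sketch below shows one common choice, the percentile bootstrap, using only the standard library: resample (prediction, target) pairs with replacement, recompute the metric each time, and read the interval off the sorted resampled values. The function `bootstrap_ci`, the nearest-rank percentile rule, and the pairwise resampling scheme are all assumptions made for illustration, not the library's implementation.

```python
import math
import random
from collections.abc import Callable, Sequence
from typing import Any


def mse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def bootstrap_ci(
    metric_fn: Callable[..., float],
    predictions: Sequence[float],
    targets: Sequence[float],
    *,
    num_bootstraps: int = 1000,
    confidence: float = 0.95,
    seed: int = 0,
) -> dict[str, Any]:
    """Percentile-bootstrap sketch (assumed method, not calibrax's code)."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    rng = random.Random(seed)  # seeded so repeated calls give the same interval
    n = len(targets)

    samples = []
    for _ in range(num_bootstraps):
        # Draw n indices with replacement; keep predictions and targets paired.
        idx = [rng.randrange(n) for _ in range(n)]
        samples.append(
            metric_fn([predictions[i] for i in idx], [targets[i] for i in idx])
        )

    # Nearest-rank percentiles, rounded outward so the interval is not too narrow.
    ordered = sorted(samples)
    alpha = 1.0 - confidence
    lo = math.floor((alpha / 2) * (num_bootstraps - 1))
    hi = math.ceil((1.0 - alpha / 2) * (num_bootstraps - 1))
    return {
        "value": metric_fn(predictions, targets),  # point estimate on all data
        "lower": ordered[lo],
        "upper": ordered[hi],
        "samples": tuple(samples),
    }


data_rng = random.Random(42)
targets = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
predictions = [t + data_rng.gauss(0.0, 0.1) for t in targets]
result = bootstrap_ci(mse, predictions, targets, num_bootstraps=500)
print(result["value"], result["lower"], result["upper"])
```

The returned dict mirrors the keys documented above, so the sketch can stand in for `compute` when reasoning about how `num_bootstraps`, `confidence`, and `seed` affect the result.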

`ClasswiseWrapper(metric_fn: Callable[..., float], *, class_names: list[str] | None = None)` ¤

Wrap any metric to compute it separately for each class.

Provides per-class breakdown of any (predictions, targets) -> float metric. Useful for identifying which classes a model performs poorly on.

Usage

    classwise = ClasswiseWrapper(mse, class_names=["cat", "dog", "bird"])
    result = classwise.compute(predictions, targets, labels)
    # {"cat": 0.01, "dog": 0.03, "bird": 0.02, "mean": 0.02}

Initialize classwise wrapper.

Parameters:

Name	Type	Description	Default
`metric_fn`	`Callable[..., float]`	Pure function with signature (predictions, targets) -> float.	required
`class_names`	`list[str] | None`	Optional human-readable class names. If None, uses integer indices as keys.	`None`

`compute(predictions: Any, targets: Any, labels: Any) -> dict[str, float]` ¤

Compute metric per class.

Parameters:

Name	Type	Description	Default
`predictions`	`Any`	Predicted values.	required
`targets`	`Any`	Ground truth values.	required
`labels`	`Any`	Class labels for grouping (integer array).	required

Returns:

Type	Description
`dict[str, float]`	Dict mapping class names to metric values, plus "mean" key.
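
To make the grouping step concrete, here is a minimal sketch of a per-class breakdown in plain Python. The helper `classwise` is hypothetical. It assumes that `labels` index into `class_names`, that keys are integers when no names are given, and that "mean" is the unweighted average of the per-class values (which matches the usage example above but is not confirmed by the source).

```python
from collections.abc import Callable, Sequence
from typing import Optional


def mse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def classwise(
    metric_fn: Callable[..., float],
    predictions: Sequence[float],
    targets: Sequence[float],
    labels: Sequence[int],
    class_names: Optional[Sequence[str]] = None,
) -> dict:
    """Per-class metric sketch (assumed behavior, not calibrax's code)."""
    # Collect the sample indices that belong to each class label.
    groups: dict[int, list[int]] = {}
    for i, label in enumerate(labels):
        groups.setdefault(int(label), []).append(i)

    result: dict = {}
    for label in sorted(groups):
        idx = groups[label]
        key = class_names[label] if class_names is not None else label
        # Evaluate the unchanged metric on this class's subset only.
        result[key] = metric_fn(
            [predictions[i] for i in idx], [targets[i] for i in idx]
        )

    # Unweighted mean over classes; computed before the "mean" key is added.
    result["mean"] = sum(result.values()) / len(result)
    return result


result = classwise(
    mse,
    predictions=[1.0, 2.0, 3.0, 4.0],
    targets=[1.0, 1.0, 3.0, 2.0],
    labels=[0, 1, 0, 1],
    class_names=["cat", "dog"],
)
print(result)  # {'cat': 0.0, 'dog': 2.5, 'mean': 1.25}
```

An unweighted mean treats rare and common classes equally, so it can differ from the metric computed on the full dataset when class sizes are imbalanced.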

`MetricTracker(metric_fn: Callable[..., float], *, direction: str = 'lower')` ¤

Track a metric's history across multiple evaluation epochs.

Maintains a history of metric values with automatic best-value detection based on direction (higher/lower is better).

Usage

    tracker = MetricTracker(mse, direction="lower")
    tracker.increment(predictions_1, targets_1)
    tracker.increment(predictions_2, targets_2)
    print(tracker.best())      # Lowest MSE seen
    print(tracker.history)     # (0.05, 0.03)
    print(tracker.best_epoch)  # 1 (0-indexed)

Initialize metric tracker.

Parameters:

Name	Type	Description	Default
`metric_fn`	`Callable[..., float]`	Pure function with signature (predictions, targets) -> float.	required
`direction`	`str`	Either "lower" or "higher"; determines whether the minimum or the maximum counts as best.	`'lower'`

Raises:

Type	Description
`ValueError`	If direction is not "lower" or "higher".

`best_epoch: int` `property` ¤

Return the epoch index of the best metric value.

Raises:

Type	Description
`ValueError`	If no values have been tracked.

`history: tuple[float, ...]` `property` ¤

Return all tracked values as an immutable tuple.

`increment(predictions: Any, targets: Any) -> float` ¤

Compute metric and add to history.

Parameters:

Name	Type	Description	Default
`predictions`	`Any`	Predicted values.	required
`targets`	`Any`	Ground truth values.	required

Returns:

Type	Description
`float`	The computed metric value.

`best() -> float` ¤

Return the best metric value seen so far.

Returns:

Type	Description
`float`	Best value (min for "lower", max for "higher").

Raises:

Type	Description
`ValueError`	If no values have been tracked.

`reset() -> None` ¤

Clear all tracked history.
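
The interaction between `increment`, `history`, `best`, and `best_epoch` can be sketched with a small stand-in class. `HistoryTracker` below is hypothetical and only mirrors the documented interface. Returning the earliest epoch on ties is an assumption, since the source does not specify tie-breaking.

```python
from collections.abc import Callable, Sequence


def mse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


class HistoryTracker:
    """Stand-in mirroring the documented MetricTracker interface (a sketch)."""

    def __init__(self, metric_fn: Callable[..., float], *, direction: str = "lower") -> None:
        if direction not in ("lower", "higher"):
            raise ValueError('direction must be "lower" or "higher"')
        self._metric_fn = metric_fn
        self._direction = direction
        self._history: list[float] = []

    @property
    def history(self) -> tuple[float, ...]:
        return tuple(self._history)  # a copy, so callers cannot mutate state

    @property
    def best_epoch(self) -> int:
        if not self._history:
            raise ValueError("no values have been tracked")
        pick = min if self._direction == "lower" else max
        # list.index returns the first match: ties go to the earliest epoch.
        return self._history.index(pick(self._history))

    def increment(self, predictions: Sequence[float], targets: Sequence[float]) -> float:
        value = float(self._metric_fn(predictions, targets))
        self._history.append(value)
        return value

    def best(self) -> float:
        return self._history[self.best_epoch]

    def reset(self) -> None:
        self._history.clear()


tracker = HistoryTracker(mse, direction="lower")
tracker.increment([0.0, 0.0], [1.0, 1.0])  # epoch 0: MSE 1.0
tracker.increment([0.5, 0.5], [1.0, 1.0])  # epoch 1: MSE 0.25
tracker.increment([0.0, 1.0], [1.0, 1.0])  # epoch 2: MSE 0.5
print(tracker.history, tracker.best(), tracker.best_epoch)
```

With `direction="higher"` the same history would report epoch 0 as best, which is why the direction has to match the metric being tracked.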

`MinMaxTracker(metric_fn: Callable[..., float])` ¤

Track running min, max, and current value for any metric.

Useful for monitoring metric ranges during training without storing full history.

Usage

    tracker = MinMaxTracker(mse)
    tracker.update(predictions, targets)
    print(tracker.current)  # Latest value
    print(tracker.min)      # Lowest seen
    print(tracker.max)      # Highest seen

Initialize min/max tracker.

Parameters:

Name	Type	Description	Default
`metric_fn`	`Callable[..., float]`	Pure function with signature (predictions, targets) -> float.	required

`current: float | None` `property` ¤

Return the most recently computed value.

`min: float | None` `property` ¤

Return the minimum value seen.

`max: float | None` `property` ¤

Return the maximum value seen.

`update(predictions: Any, targets: Any) -> float` ¤

Compute metric and update min/max tracking.

Parameters:

Name	Type	Description	Default
`predictions`	`Any`	Predicted values.	required
`targets`	`Any`	Ground truth values.	required

Returns:

Type	Description
`float`	The computed metric value.

`reset() -> None` ¤

Reset all tracking state.
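
As an illustration of tracking a range without storing history, the sketch below keeps only three values and updates them in constant time per call. `RunningMinMax` is a hypothetical stand-in. It uses plain attributes where the documented class exposes read-only properties, and it assumes all three values are `None` before the first update and after `reset`.

```python
from collections.abc import Callable, Sequence
from typing import Optional


def mse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


class RunningMinMax:
    """Constant-memory stand-in for the documented MinMaxTracker (a sketch)."""

    def __init__(self, metric_fn: Callable[..., float]) -> None:
        self._metric_fn = metric_fn
        self.current: Optional[float] = None
        self.min: Optional[float] = None
        self.max: Optional[float] = None

    def update(self, predictions: Sequence[float], targets: Sequence[float]) -> float:
        value = float(self._metric_fn(predictions, targets))
        self.current = value
        # The first update seeds both bounds; later updates only widen them.
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)
        return value

    def reset(self) -> None:
        self.current = None
        self.min = None
        self.max = None


tracker = RunningMinMax(mse)
tracker.update([0.5, 0.5], [1.0, 1.0])  # MSE 0.25
tracker.update([0.0, 0.0], [1.0, 1.0])  # MSE 1.0
tracker.update([0.0, 1.0], [1.0, 1.0])  # MSE 0.5
print(tracker.current, tracker.min, tracker.max)
```

Unlike a history-based tracker, this design cannot report when the extreme values occurred; that is the trade-off for constant memory.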
