calibrax.metrics.wrappers
Decorator-pattern wrappers that enhance any metric function with additional
behavior. BootstrapMetric adds confidence interval estimation,
ClasswiseWrapper provides per-class breakdown, MetricTracker tracks
historical values with best-value detection, and MinMaxTracker maintains
running min/max/current state.
Metric wrappers: enhance any metric with confidence intervals, per-class breakdown, tracking.
Wrappers follow the decorator pattern: each one wraps an existing metric function and adds behavior without modifying the original.
- BootstrapMetric: Bootstrap confidence interval estimation.
- ClasswiseWrapper: Per-class metric breakdown.
- MetricTracker: Historical tracking with best-value detection.
- MinMaxTracker: Running min/max/current tracking.
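Every wrapper on this page expects a pure metric function with the signature (predictions, targets) -> float, and the usage examples pass one named mse. As a minimal sketch of such a function, assuming plain Python sequences of floats (this stand-in is illustrative and is not taken from calibrax itself):

```python
from collections.abc import Sequence


def mse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error as a pure (predictions, targets) -> float function."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if len(predictions) == 0:
        raise ValueError("at least one sample is required")
    # Average the squared element-wise differences.
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)


# Errors are 0, 0 and 2, so the result is 4 / 3.
example_value = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Because the function keeps no state and does not mutate its inputs, it can be called repeatedly on resampled or filtered data, which is what the wrappers rely on.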
BootstrapMetric(metric_fn: Callable[..., float], *, num_bootstraps: int = 1000, confidence: float = 0.95, seed: int = 0)
Wrap any metric function with bootstrap confidence interval estimation.
A measurement without uncertainty is incomplete. This wrapper provides bootstrap-based confidence intervals for any metric.
Usage

    bootstrap = BootstrapMetric(mse, num_bootstraps=1000, confidence=0.95)
    result = bootstrap.compute(predictions, targets)
    # {"value": 0.01, "lower": 0.008, "upper": 0.012, "samples": (...)}
Attributes:
| Name | Type | Description |
|---|---|---|
| metric_fn | Callable[..., float] | The wrapped metric function. |
| num_bootstraps | int | Number of bootstrap resamples. |
| confidence | float | Confidence level for interval. |
Initialize bootstrap wrapper.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| metric_fn | Callable[..., float] | Pure function with signature (predictions, targets) -> float. | required |
| num_bootstraps | int | Number of bootstrap resamples. | 1000 |
| confidence | float | Confidence level (0 < confidence < 1). | 0.95 |
| seed | int | Random seed for reproducibility. | 0 |
Raises:
| Type | Description |
|---|---|
| ValueError | If confidence is not in (0, 1). |
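To make the role of num_bootstraps, confidence, and seed concrete, the following is a minimal sketch of a percentile bootstrap written with the standard library only. It is an assumption about the general technique, not the calibrax implementation: the function name bootstrap_ci, the truncating percentile rule, and the helper mean_absolute_error are all hypothetical.

```python
import random
from collections.abc import Callable, Sequence


def bootstrap_ci(
    metric_fn: Callable[[Sequence[float], Sequence[float]], float],
    predictions: Sequence[float],
    targets: Sequence[float],
    *,
    num_bootstraps: int = 1000,
    confidence: float = 0.95,
    seed: int = 0,
) -> dict:
    """Percentile-bootstrap sketch (hypothetical, not the library's code)."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    rng = random.Random(seed)  # a fixed seed makes the resamples reproducible
    n = len(predictions)
    samples = []
    for _ in range(num_bootstraps):
        # Resample (prediction, target) pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        samples.append(
            metric_fn([predictions[i] for i in idx], [targets[i] for i in idx])
        )
    ordered = sorted(samples)
    alpha = 1.0 - confidence
    # Truncating percentile positions; other interpolation rules are possible.
    lower_idx = int((alpha / 2.0) * (num_bootstraps - 1))
    upper_idx = int((1.0 - alpha / 2.0) * (num_bootstraps - 1))
    return {
        "value": metric_fn(predictions, targets),  # point estimate on all data
        "lower": ordered[lower_idx],
        "upper": ordered[upper_idx],
        "samples": tuple(samples),
    }


def mean_absolute_error(predictions, targets):
    """Hypothetical metric used only for this illustration."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(predictions)


preds = [0.1 * i for i in range(50)]
targs = [0.1 * i + (0.2 if i % 2 else -0.1) for i in range(50)]
result = bootstrap_ci(mean_absolute_error, preds, targs, num_bootstraps=200, seed=0)
```

Resampling predictions and targets with the same indices keeps each pair intact, so the spread of the resampled metric values reflects sampling variability rather than mismatched pairs.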
metric_fn: Callable[..., float] (property)
Get the wrapped metric function.
num_bootstraps: int (property)
Get the number of bootstrap resamples.
confidence: float (property)
Get the confidence level.
compute(predictions: Any, targets: Any) -> dict[str, Any]
Compute metric with bootstrap confidence interval.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| predictions | Any | Predicted values. | required |
| targets | Any | Ground truth values. | required |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | Dict with "value" (point estimate), "lower" (CI lower bound), "upper" (CI upper bound), "samples" (all bootstrap values). |
ClasswiseWrapper(metric_fn: Callable[..., float], *, class_names: list[str] | None = None)
Wrap any metric to compute it separately for each class.
Provides per-class breakdown of any (predictions, targets) -> float metric. Useful for identifying which classes a model performs poorly on.
Usage

    classwise = ClasswiseWrapper(mse, class_names=["cat", "dog", "bird"])
    result = classwise.compute(predictions, targets, labels)
    # {"cat": 0.01, "dog": 0.03, "bird": 0.02, "mean": 0.02}
Initialize classwise wrapper.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| metric_fn | Callable[..., float] | Pure function with signature (predictions, targets) -> float. | required |
| class_names | list[str] \| None | Optional human-readable class names. If None, uses integer indices as keys. | None |
compute(predictions: Any, targets: Any, labels: Any) -> dict[str, float]
Compute metric per class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| predictions | Any | Predicted values. | required |
| targets | Any | Ground truth values. | required |
| labels | Any | Class labels for grouping (integer array). | required |
Returns:
| Type | Description |
|---|---|
| dict[str, float] | Dict mapping class names to metric values, plus "mean" key. |
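The grouping step behind a per-class breakdown can be sketched as follows. This is a hypothetical stand-alone function, not the calibrax code: the name classwise_metric is invented, and it assumes that "mean" is the unweighted average of the per-class values, which the source does not state explicitly.

```python
from collections import defaultdict


def classwise_metric(metric_fn, predictions, targets, labels, class_names=None):
    """Per-class sketch (hypothetical); 'mean' is assumed to be unweighted."""
    # Collect the predictions and targets that belong to each integer label.
    groups = defaultdict(lambda: ([], []))
    for p, t, label in zip(predictions, targets, labels):
        groups[label][0].append(p)
        groups[label][1].append(t)
    if not groups:
        raise ValueError("at least one sample is required")
    result = {}
    for label in sorted(groups):
        # Use the human-readable name when given, else the integer index.
        key = class_names[label] if class_names is not None else label
        class_preds, class_targets = groups[label]
        result[key] = metric_fn(class_preds, class_targets)
    # Computed before the "mean" key is added, so it averages classes only.
    result["mean"] = sum(result.values()) / len(result)
    return result


def mse(predictions, targets):
    """Stand-in metric for the illustration."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


per_class = classwise_metric(
    mse,
    predictions=[1.0, 2.0, 3.0, 4.0],
    targets=[1.0, 1.0, 3.0, 6.0],
    labels=[0, 0, 1, 1],
    class_names=["cat", "dog"],
)
# per_class == {"cat": 0.5, "dog": 2.0, "mean": 1.25}
```

An unweighted mean treats rare and common classes equally; a sample-weighted mean would instead match the metric computed over all data for metrics that are averages.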
MetricTracker(metric_fn: Callable[..., float], *, direction: str = 'lower')
Track a metric's history across multiple evaluation epochs.
Maintains a history of metric values with automatic best-value detection based on direction (higher/lower is better).
Usage

    tracker = MetricTracker(mse, direction="lower")
    tracker.increment(predictions_1, targets_1)
    tracker.increment(predictions_2, targets_2)
    print(tracker.best())       # Lowest MSE seen
    print(tracker.history)      # (0.05, 0.03)
    print(tracker.best_epoch)   # 1 (0-indexed)
Initialize metric tracker.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| metric_fn | Callable[..., float] | Pure function with signature (predictions, targets) -> float. | required |
| direction | str | "lower" or "higher"; determines which value counts as "best". | 'lower' |
Raises:
| Type | Description |
|---|---|
| ValueError | If direction is not "lower" or "higher". |
best_epoch: int (property)
Return the epoch index of the best metric value.
Raises:
| Type | Description |
|---|---|
| ValueError | If no values have been tracked. |
history: tuple[float, ...] (property)
Return all tracked values as an immutable tuple.
increment(predictions: Any, targets: Any) -> float
Compute metric and add to history.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| predictions | Any | Predicted values. | required |
| targets | Any | Ground truth values. | required |
Returns:
| Type | Description |
|---|---|
| float | The computed metric value. |
best() -> float
Return the best metric value seen so far.
Returns:
| Type | Description |
|---|---|
| float | Best value (min for "lower", max for "higher"). |
Raises:
| Type | Description |
|---|---|
| ValueError | If no values have been tracked. |
reset() -> None
Clear all tracked history.
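How history, best(), best_epoch, and reset() fit together can be illustrated with a small stand-in class. This is a sketch under assumptions, not the calibrax source: the class name HistoryTracker is hypothetical, and resolving ties in favor of the earliest epoch is a choice the documentation above does not specify.

```python
class HistoryTracker:
    """Hypothetical stand-in mirroring the interface documented above."""

    def __init__(self, metric_fn, *, direction="lower"):
        if direction not in ("lower", "higher"):
            raise ValueError('direction must be "lower" or "higher"')
        self._metric_fn = metric_fn
        self._direction = direction
        self._history = []

    @property
    def history(self):
        # Expose an immutable copy so callers cannot edit the record.
        return tuple(self._history)

    @property
    def best_epoch(self):
        if not self._history:
            raise ValueError("no values have been tracked")
        pick = min if self._direction == "lower" else max
        # min/max return the first matching index, so ties go to the earliest epoch.
        return pick(range(len(self._history)), key=self._history.__getitem__)

    def increment(self, predictions, targets):
        value = self._metric_fn(predictions, targets)
        self._history.append(value)
        return value

    def best(self):
        return self._history[self.best_epoch]

    def reset(self):
        self._history.clear()


def mean_absolute_error(predictions, targets):
    """Hypothetical metric used only for this illustration."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(predictions)


tracker = HistoryTracker(mean_absolute_error, direction="lower")
tracker.increment([1.0, 2.0], [1.5, 2.5])    # 0.5
tracker.increment([1.0, 2.0], [1.25, 2.25])  # 0.25
tracker.increment([1.0, 2.0], [2.0, 3.0])    # 1.0
best_value, best_index = tracker.best(), tracker.best_epoch
```

Here the second call produces the lowest error, so best() is 0.25 and best_epoch is 1, matching the 0-indexed convention in the usage example.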
MinMaxTracker(metric_fn: Callable[..., float])
Track running min, max, and current value for any metric.
Useful for monitoring metric ranges during training without storing full history.
Usage

    tracker = MinMaxTracker(mse)
    tracker.update(predictions, targets)
    print(tracker.current)  # Latest value
    print(tracker.min)      # Lowest seen
    print(tracker.max)      # Highest seen
Initialize min/max tracker.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| metric_fn | Callable[..., float] | Pure function with signature (predictions, targets) -> float. | required |
current: float | None (property)
Return the most recently computed value.
min: float | None (property)
Return the minimum value seen.
max: float | None (property)
Return the maximum value seen.
update(predictions: Any, targets: Any) -> float
Compute metric and update min/max tracking.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| predictions | Any | Predicted values. | required |
| targets | Any | Ground truth values. | required |
Returns:
| Type | Description |
|---|---|
| float | The computed metric value. |
reset() -> None
Reset all tracking state.
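The constant-memory bookkeeping described for MinMaxTracker can be sketched with three fields that are updated on every call. The class below is a hypothetical stand-in, not the calibrax implementation; in particular it uses plain attributes where the documented class exposes read-only properties.

```python
class RunningMinMax:
    """Hypothetical stand-in: keeps only current, min and max (no history)."""

    def __init__(self, metric_fn):
        self._metric_fn = metric_fn
        self.reset()

    def reset(self):
        # None signals that no value has been observed yet.
        self.current = None
        self.min = None
        self.max = None

    def update(self, predictions, targets):
        value = self._metric_fn(predictions, targets)
        self.current = value
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)
        return value


def mean_absolute_error(predictions, targets):
    """Hypothetical metric used only for this illustration."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(predictions)


running = RunningMinMax(mean_absolute_error)
for shifted_targets in ([1.5, 2.5], [1.25, 2.25], [2.0, 3.0]):
    running.update([1.0, 2.0], shifted_targets)
state = (running.current, running.min, running.max)
```

After the three updates the metric values were 0.5, 0.25, and 1.0, so state holds the latest value 1.0, the minimum 0.25, and the maximum 1.0, without any list of past values being stored.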