calibrax.metrics.composition¤

Composition framework for grouping and combining metrics. MetricCollection groups multiple metrics for batch computation, WeightedMetric produces a single weighted score, MetricSuite organizes metrics by domain, and ThresholdMetric wraps a metric with a pass/fail threshold for CI gates.

Metric composition: collections, weighted combinations, suites, thresholds.

Provides higher-level abstractions for grouping and combining metrics:

  • MetricCollection: Group multiple metrics, compute all in one call.
  • WeightedMetric: Weighted combination of metric values into a single score.
  • MetricSuite: Named groups of metrics with domain awareness.
  • ThresholdMetric: Wrap a metric with a pass/fail threshold for CI gates.

MetricCollection(metrics: dict[str, Callable[..., float]]) ¤

Group multiple metrics, compute all in one call.

Supports Tier 0 pure functions via callable references.

Usage

    collection = MetricCollection({
        "mse": mse,
        "mae": mae,
    })
    results = collection.compute_functional(predictions, targets)
    # {"mse": 0.01, "mae": 0.05}

Attributes:

Name Type Description
metrics dict[str, Callable[..., float]]

Dictionary mapping metric names to callables.

Initialize with a dictionary of named metric functions.

Parameters:

Name Type Description Default
metrics dict[str, Callable[..., float]]

Mapping of metric names to callable functions.

required

names: list[str] property ¤

Return all metric names in the collection.

compute_functional(predictions: Any, targets: Any, **kwargs: Any) -> dict[str, float] ¤

Compute all functional metrics.

Calls each callable metric with (predictions, targets, **kwargs).

Parameters:

Name Type Description Default
predictions Any

Predicted values.

required
targets Any

Ground truth values.

required
**kwargs Any

Additional keyword arguments passed to each function.

{}

Returns:

Type Description
dict[str, float]

Dictionary mapping metric names to computed float values.
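The calling convention described above can be illustrated without the library itself. The following is a minimal sketch, assuming plain-Python stand-ins for `mse` and `mae` and a hypothetical helper `compute_all_functional` that mirrors the documented behavior (each metric is called with `(predictions, targets, **kwargs)`). It is not calibrax's implementation.

```python
from typing import Any, Callable


def mse(predictions: list[float], targets: list[float]) -> float:
    """Mean squared error (stand-in, not the calibrax function)."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def mae(predictions: list[float], targets: list[float]) -> float:
    """Mean absolute error (stand-in, not the calibrax function)."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


def compute_all_functional(
    metrics: dict[str, Callable[..., float]],
    predictions: Any,
    targets: Any,
    **kwargs: Any,
) -> dict[str, float]:
    """Hypothetical helper: call every metric with the same arguments."""
    return {
        name: float(fn(predictions, targets, **kwargs))
        for name, fn in metrics.items()
    }


results = compute_all_functional(
    {"mse": mse, "mae": mae},
    [1.0, 2.0, 5.0],  # predictions
    [1.0, 2.0, 2.0],  # targets
)
print(results)  # {'mse': 3.0, 'mae': 1.0}
```

Because the same `**kwargs` are forwarded to every metric, each callable in the mapping would need to accept any keyword that is passed.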

add(name: str, metric: Callable[..., float]) -> None ¤

Add a metric to the collection.

Parameters:

Name Type Description Default
name str

Name for the metric.

required
metric Callable[..., float]

Callable metric function.

required

remove(name: str) -> None ¤

Remove a metric by name.

Parameters:

Name Type Description Default
name str

Name of the metric to remove.

required

Raises:

Type Description
KeyError

If metric name not found.

from_registry(*, domain: str | None = None, tier: MetricTier = MetricTier.PURE_FUNCTION) -> MetricCollection classmethod ¤

Create a collection from all registered metrics matching filters.

Parameters:

Name Type Description Default
domain str | None

Filter by domain (None = all domains).

None
tier MetricTier

Filter by tier (default: PURE_FUNCTION).

PURE_FUNCTION

Returns:

Type Description
MetricCollection

MetricCollection with matching metrics.

WeightedMetric(weights: dict[str, float]) ¤

Weighted combination of metric values into a single score.

Usage

    weighted = WeightedMetric({"mse": 0.7, "mae": 0.3})
    score = weighted.compute({"mse": 0.01, "mae": 0.05})
    # 0.7 * 0.01 + 0.3 * 0.05 = 0.022

Attributes:

Name Type Description
weights dict[str, float]

Dictionary mapping metric names to float weights.

Initialize with metric weights.

Parameters:

Name Type Description Default
weights dict[str, float]

Metric name to weight mapping. Weights need not sum to 1.

required

Raises:

Type Description
ValueError

If weights dict is empty.

weights: dict[str, float] property ¤

Return the weights dictionary.

normalized_weights: dict[str, float] property ¤

Return weights normalized to sum to 1.0.
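Normalizing to a sum of 1.0 amounts to dividing each weight by the total. The sketch below shows that arithmetic with a hypothetical `normalize` function. How the library treats a zero total is not stated here, so the sketch does not handle that case.

```python
def normalize(weights: dict[str, float]) -> dict[str, float]:
    """Hypothetical helper: scale weights so they sum to 1.0."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Weights need not sum to 1 on input; here they sum to 10.
normalized = normalize({"mse": 7.0, "mae": 3.0})
```

With these inputs the relative importance is unchanged: `mse` still carries 70% of the total and `mae` 30%.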

compute(metric_values: dict[str, float]) -> float ¤

Compute weighted sum of metric values.

Parameters:

Name Type Description Default
metric_values dict[str, float]

Dictionary of metric name to value.

required

Returns:

Type Description
float

Weighted sum as a Python float.

Raises:

Type Description
KeyError

If a required metric is missing from metric_values.
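The weighted sum and the `KeyError` condition can be sketched in a few lines. This assumes the raw (unnormalized) weights are used and that extra entries in `metric_values` are ignored. Neither detail is confirmed by the text above, and `weighted_sum` is a hypothetical name.

```python
def weighted_sum(weights: dict[str, float], metric_values: dict[str, float]) -> float:
    """Hypothetical helper: sum of weight * value over the weighted metrics."""
    # metric_values[name] raises KeyError when a weighted metric is missing.
    return float(sum(w * metric_values[name] for name, w in weights.items()))


score = weighted_sum({"mse": 0.7, "mae": 0.3}, {"mse": 0.01, "mae": 0.05})
# 0.7 * 0.01 + 0.3 * 0.05, approximately 0.022

try:
    weighted_sum({"rmse": 1.0}, {"mse": 0.01})
    missing_detected = False
except KeyError:
    missing_detected = True
```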

MetricSuite() ¤

Named groups of metrics with tier/domain awareness.

Organizes metrics into named groups for structured evaluation. Can auto-populate from the MetricRegistry.

Usage

    suite = MetricSuite()
    suite.add_group("regression", ["mse", "mae", "rmse"])
    suite.add_group("classification", ["accuracy", "f1_score"])
    results = suite.compute_all(predictions, targets)
    # {"regression": {"mse": ..., "mae": ..., "rmse": ...},
    #  "classification": {"accuracy": ..., "f1_score": ...}}

Attributes:

Name Type Description
groups dict[str, list[str]]

Dictionary mapping group names to metric name lists.

Initialize an empty metric suite.

add_group(group_name: str, metric_names: list[str]) -> None ¤

Add a named group of metrics.

Parameters:

Name Type Description Default
group_name str

Name for the group.

required
metric_names list[str]

List of metric names (must be registered in MetricRegistry).

required

Raises:

Type Description
KeyError

If any metric name is not in the registry.

compute_all(predictions: Any, targets: Any) -> dict[str, dict[str, float]] ¤

Compute all metrics in all groups.

Parameters:

Name Type Description Default
predictions Any

Predicted values.

required
targets Any

Ground truth values.

required

Returns:

Type Description
dict[str, dict[str, float]]

Nested dict: {group_name: {metric_name: value}}.
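The nested result shape can be illustrated with a small stand-in. In this sketch, `REGISTRY` is a hypothetical plain dictionary playing the role of the metric registry, and `compute_groups` is a hypothetical helper. The real lookup mechanism is not shown in this reference.

```python
from typing import Any, Callable

# Hypothetical stand-in for the metric registry: name -> callable.
REGISTRY: dict[str, Callable[..., float]] = {
    "mae": lambda p, t: sum(abs(a - b) for a, b in zip(p, t)) / len(t),
    "max_error": lambda p, t: max(abs(a - b) for a, b in zip(p, t)),
}


def compute_groups(
    groups: dict[str, list[str]], predictions: Any, targets: Any
) -> dict[str, dict[str, float]]:
    """Hypothetical helper: evaluate each group's metrics by name."""
    return {
        group: {
            name: float(REGISTRY[name](predictions, targets)) for name in names
        }
        for group, names in groups.items()
    }


results = compute_groups(
    {"average": ["mae"], "worst_case": ["max_error"]},
    [1.0, 2.0, 5.0],  # predictions
    [1.0, 2.0, 2.0],  # targets
)
print(results)  # {'average': {'mae': 1.0}, 'worst_case': {'max_error': 3.0}}
```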

list_groups() -> list[str] ¤

Return all group names.

from_registry_domains() -> MetricSuite classmethod ¤

Create a suite grouped by domain from the registry.

Returns:

Type Description
MetricSuite

MetricSuite with one group per domain containing all Tier 0 metrics in that domain.

ThresholdMetric(metric_name: str, *, min_value: float | None = None, max_value: float | None = None) ¤

Wrap a metric with a pass/fail threshold.

Usage

    threshold = ThresholdMetric("mse", max_value=0.01)
    result = threshold.evaluate(predictions, targets)
    # {"value": 0.005, "passed": True, "threshold": 0.01, "metric_name": "mse"}

Attributes:

Name Type Description
metric_name str

Name of the metric to evaluate.

min_value float | None

Minimum acceptable value (for HIGHER metrics, where larger values are better).

max_value float | None

Maximum acceptable value (for LOWER metrics, where smaller values are better).

Initialize threshold metric.

Parameters:

Name Type Description Default
metric_name str

Registered metric name.

required
min_value float | None

Minimum acceptable value (metric must be >= this).

None
max_value float | None

Maximum acceptable value (metric must be <= this).

None

Raises:

Type Description
ValueError

If neither min_value nor max_value is provided.

KeyError

If metric_name is not in the registry.
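The bound semantics above (value `>=` `min_value`, value `<=` `max_value`, at least one bound required) can be sketched as a standalone check. `within_bounds` is a hypothetical function, and requiring both bounds to hold when both are given is an assumption rather than documented behavior.

```python
from typing import Optional


def within_bounds(
    value: float,
    *,
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
) -> bool:
    """Hypothetical helper: True when value satisfies every bound supplied."""
    if min_value is None and max_value is None:
        raise ValueError("Provide min_value, max_value, or both.")
    if min_value is not None and value < min_value:
        return False  # below the minimum (bound itself is inclusive)
    if max_value is not None and value > max_value:
        return False  # above the maximum (bound itself is inclusive)
    return True


print(within_bounds(0.005, max_value=0.01))  # True: error is under the cap
print(within_bounds(0.90, min_value=0.95))   # False: score is under the floor
```

In a CI gate, a `False` result would typically be turned into a failing exit status by the calling script.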

metric_name: str property ¤

Get the metric name.

min_value: float | None property ¤

Get the minimum threshold value.

max_value: float | None property ¤

Get the maximum threshold value.

evaluate(predictions: Any, targets: Any) -> dict[str, Any] ¤

Compute the metric and check against threshold.

Parameters:

Name Type Description Default
predictions Any

Predicted values.

required
targets Any

Ground truth values.

required

Returns:

Type Description
dict[str, Any]

Dict with "value" (float), "passed" (bool), "threshold" (float), "metric_name" (str).