calibrax.metrics.composition

Composition framework for grouping and combining metrics. MetricCollection groups multiple metrics for batch computation, WeightedMetric produces a single weighted score, MetricSuite organizes metrics by domain, and ThresholdMetric wraps a metric with a pass/fail threshold for CI gates.

Metric composition: collections, weighted combinations, suites, thresholds.

Provides higher-level abstractions for grouping and combining metrics:

MetricCollection: Group multiple metrics, compute all in one call.
WeightedMetric: Weighted combination of metric values into a single score.
MetricSuite: Named groups of metrics with domain awareness.
ThresholdMetric: Wrap a metric with a pass/fail threshold for CI gates.

`MetricCollection(metrics: dict[str, Callable[..., float]])`

Group multiple metrics, compute all in one call.

Supports Tier 0 pure functions via callable references.

Usage

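```python
from calibrax.metrics.composition import MetricCollection

# mse and mae are metric callables of the form f(predictions, targets) -> float
collection = MetricCollection({"mse": mse, "mae": mae})
results = collection.compute_functional(predictions, targets)
# {"mse": 0.01, "mae": 0.05}
```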

Attributes:

Name	Type	Description
`metrics`	`dict[str, Callable[..., float]]`	Dictionary mapping metric names to callables.

Initialize with a dictionary of named metric functions.

Parameters:

Name	Type	Description	Default
`metrics`	`dict[str, Callable[..., float]]`	Mapping of metric names to callable functions.	required

`names: list[str]` `property`

Return all metric names in the collection.

`compute_functional(predictions: Any, targets: Any, **kwargs: Any) -> dict[str, float]`

Compute all functional metrics.

Calls each metric as `metric(predictions, targets, **kwargs)`.

Parameters:

Name	Type	Description	Default
`predictions`	`Any`	Predicted values.	required
`targets`	`Any`	Ground truth values.	required
`**kwargs`	`Any`	Additional keyword arguments passed to each function.	`{}`

Returns:

Type	Description
`dict[str, float]`	Dictionary mapping metric names to computed float values.
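The loop that `compute_functional` describes can be sketched in plain Python. This is a standalone illustration, not calibrax's implementation: `mse`, `mae`, and the module-level `compute_functional` function below are hypothetical stand-ins written for the example, and the `float(...)` conversion is an assumption based on the documented `dict[str, float]` return type.

```python
from typing import Any, Callable


def mse(predictions: list[float], targets: list[float]) -> float:
    """Mean squared error over paired values."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def mae(predictions: list[float], targets: list[float]) -> float:
    """Mean absolute error over paired values."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


def compute_functional(
    metrics: dict[str, Callable[..., float]],
    predictions: Any,
    targets: Any,
    **kwargs: Any,
) -> dict[str, float]:
    # Every metric receives the same positional arguments and the same kwargs;
    # results are keyed by the name each metric was stored under.
    return {name: float(fn(predictions, targets, **kwargs)) for name, fn in metrics.items()}


results = compute_functional({"mse": mse, "mae": mae}, [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
print(results)  # {'mse': 1.0, 'mae': 0.5}
```

Because the same `**kwargs` are forwarded to every function, a keyword argument that only one metric accepts would raise `TypeError` in any metric whose signature does not take it.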

`add(name: str, metric: Callable[..., float]) -> None`

Add a metric to the collection.

Parameters:

Name	Type	Description	Default
`name`	`str`	Name for the metric.	required
`metric`	`Callable[..., float]`	Callable metric function.	required

`remove(name: str) -> None`

Remove a metric by name.

Parameters:

Name	Type	Description	Default
`name`	`str`	Name of the metric to remove.	required

Raises:

Type	Description
`KeyError`	If the metric name is not found.
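The `names`, `add`, and `remove` contract can be illustrated with a small hypothetical class, `MiniCollection`, which is not part of calibrax. Two behaviors are assumptions of this sketch rather than documented facts: the constructor copies the input dict, and `add` silently replaces an existing metric with the same name.

```python
from typing import Callable


class MiniCollection:
    """Hypothetical stand-in mirroring the documented names/add/remove surface."""

    def __init__(self, metrics: dict[str, Callable[..., float]]) -> None:
        # Copy so later add/remove calls do not mutate the caller's dict (assumption).
        self._metrics = dict(metrics)

    @property
    def names(self) -> list[str]:
        return list(self._metrics)

    def add(self, name: str, metric: Callable[..., float]) -> None:
        # Overwrites any metric already stored under this name (assumption).
        self._metrics[name] = metric

    def remove(self, name: str) -> None:
        # dict.pop without a default raises KeyError for unknown names,
        # matching the documented Raises entry.
        self._metrics.pop(name)


collection = MiniCollection({"mse": lambda p, t: 0.0})
collection.add("mae", lambda p, t: 0.0)
collection.remove("mse")
print(collection.names)  # ['mae']

try:
    collection.remove("rmse")
except KeyError as missing:
    print("not in collection:", missing)
```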

`from_registry(*, domain: str | None = None, tier: MetricTier = MetricTier.PURE_FUNCTION) -> MetricCollection` `classmethod`

Create a collection from all registered metrics matching filters.

Parameters:

Name	Type	Description	Default
`domain`	`str | None`	Filter by domain (None = all domains).	`None`
`tier`	`MetricTier`	Filter by tier (default: PURE_FUNCTION).	`PURE_FUNCTION`

Returns:

Type	Description
`MetricCollection`	MetricCollection with matching metrics.

`WeightedMetric(weights: dict[str, float])`

Weighted combination of metric values into a single score.

Usage

```python
from calibrax.metrics.composition import WeightedMetric

weighted = WeightedMetric({"mse": 0.7, "mae": 0.3})
score = weighted.compute({"mse": 0.01, "mae": 0.05})
# 0.7 * 0.01 + 0.3 * 0.05 = 0.022
```

Attributes:

Name	Type	Description
`weights`	`dict[str, float]`	Dictionary mapping metric names to float weights.

Initialize with metric weights.

Parameters:

Name	Type	Description	Default
`weights`	`dict[str, float]`	Metric name to weight mapping. Weights need not sum to 1.	required

Raises:

Type	Description
`ValueError`	If weights dict is empty.

`weights: dict[str, float]` `property`

Return the weights dictionary.

`normalized_weights: dict[str, float]` `property`

Return weights normalized to sum to 1.0.
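Since weights need not sum to 1, `normalized_weights` rescales each weight by the total. A minimal sketch of that arithmetic with a hypothetical `normalize` helper; how the library treats a zero total is not documented, so the sketch simply assumes the total is nonzero.

```python
def normalize(weights: dict[str, float]) -> dict[str, float]:
    total = sum(weights.values())
    # Assumes total != 0; zero-total handling is not documented.
    return {name: w / total for name, w in weights.items()}


print(normalize({"mse": 2.0, "mae": 1.0, "rmse": 1.0}))
# {'mse': 0.5, 'mae': 0.25, 'rmse': 0.25}
```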

`compute(metric_values: dict[str, float]) -> float`

Compute weighted sum of metric values.

Parameters:

Name	Type	Description	Default
`metric_values`	`dict[str, float]`	Dictionary of metric name to value.	required

Returns:

Type	Description
`float`	Weighted sum as a Python float.

Raises:

Type	Description
`KeyError`	If a weighted metric is missing from `metric_values`.
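The weighted sum in `compute`, including the documented `KeyError` for a missing metric, can be sketched as a plain function. `weighted_score` is a hypothetical name. The sketch uses the raw weights as given (the usage example's weights already sum to 1, so it does not reveal whether `compute` would normalize them) and ignores extra keys in `metric_values`; both are assumptions about the library's behavior.

```python
def weighted_score(weights: dict[str, float], metric_values: dict[str, float]) -> float:
    # metric_values[name] raises KeyError when a weighted metric is missing;
    # keys in metric_values that carry no weight are ignored in this sketch.
    return float(sum(w * metric_values[name] for name, w in weights.items()))


score = weighted_score({"mse": 0.7, "mae": 0.3}, {"mse": 0.01, "mae": 0.05})
print(round(score, 6))  # 0.022

try:
    weighted_score({"mse": 0.5, "rmse": 0.5}, {"mse": 0.01})
except KeyError as missing:
    print("missing metric:", missing)
```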

`MetricSuite()`

Named groups of metrics with tier/domain awareness.

Organizes metrics into named groups for structured evaluation. Can auto-populate from the MetricRegistry.

Usage

```python
from calibrax.metrics.composition import MetricSuite

suite = MetricSuite()
suite.add_group("regression", ["mse", "mae", "rmse"])
suite.add_group("classification", ["accuracy", "f1_score"])
results = suite.compute_all(predictions, targets)
# {"regression": {"mse": ..., "mae": ..., "rmse": ...},
#  "classification": {"accuracy": ..., "f1_score": ...}}
```

Attributes:

Name	Type	Description
`groups`	`dict[str, list[str]]`	Dictionary mapping group names to metric name lists.

Initialize an empty metric suite.

`add_group(group_name: str, metric_names: list[str]) -> None`

Add a named group of metrics.

Parameters:

Name	Type	Description	Default
`group_name`	`str`	Name for the group.	required
`metric_names`	`list[str]`	List of metric names (must be registered in MetricRegistry).	required

Raises:

Type	Description
`KeyError`	If any metric name is not in the registry.

`compute_all(predictions: Any, targets: Any) -> dict[str, dict[str, float]]`

Compute all metrics in all groups.

Parameters:

Name	Type	Description	Default
`predictions`	`Any`	Predicted values.	required
`targets`	`Any`	Ground truth values.	required

Returns:

Type	Description
`dict[str, dict[str, float]]`	Nested dict: {group_name: {metric_name: value}}.
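`add_group` validates names against the registry up front, and `compute_all` returns one inner dictionary per group. Both steps are sketched below against a hypothetical `METRICS` dict standing in for `MetricRegistry`; `max_error` is an invented example metric, and the module-level helpers are not calibrax APIs.

```python
from typing import Any, Callable

# Hypothetical stand-in for MetricRegistry: metric name -> pure metric function.
METRICS: dict[str, Callable[[Any, Any], float]] = {
    "mae": lambda p, t: sum(abs(a - b) for a, b in zip(p, t)) / len(t),
    "max_error": lambda p, t: max(abs(a - b) for a, b in zip(p, t)),
}


def add_group(groups: dict[str, list[str]], group_name: str, metric_names: list[str]) -> None:
    # Reject the whole group if any name is unknown, mirroring the documented KeyError.
    unknown = [name for name in metric_names if name not in METRICS]
    if unknown:
        raise KeyError(f"not registered: {unknown}")
    groups[group_name] = list(metric_names)


def compute_all(
    groups: dict[str, list[str]], predictions: Any, targets: Any
) -> dict[str, dict[str, float]]:
    # One inner dict per group; a metric listed in two groups is computed twice here.
    return {
        group: {name: float(METRICS[name](predictions, targets)) for name in names}
        for group, names in groups.items()
    }


groups: dict[str, list[str]] = {}
add_group(groups, "errors", ["mae", "max_error"])
print(compute_all(groups, [1.0, 2.0, 4.0, 4.0], [1.0, 2.0, 3.0, 6.0]))
# {'errors': {'mae': 0.75, 'max_error': 2.0}}
```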

`list_groups() -> list[str]`

Return all group names.

`from_registry_domains() -> MetricSuite` `classmethod`

Create a suite grouped by domain from the registry.

Returns:

Type	Description
`MetricSuite`	MetricSuite with one group per domain containing all Tier 0 metrics in that domain.
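`from_registry_domains` builds one group per domain from the Tier 0 metrics. The grouping step can be sketched over hypothetical registry rows; the `(name, domain, tier)` tuple layout, the use of `0` for Tier 0, and the `groups_by_domain` helper are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical registry rows: (metric name, domain, tier number); tier 0 = pure function.
ROWS = [
    ("mse", "regression", 0),
    ("mae", "regression", 0),
    ("accuracy", "classification", 0),
    ("f1_score", "classification", 0),
    ("running_mse", "regression", 1),
]


def groups_by_domain(rows: list[tuple[str, str, int]]) -> dict[str, list[str]]:
    groups: dict[str, list[str]] = defaultdict(list)
    for name, domain, tier in rows:
        if tier == 0:  # only Tier 0 metrics are grouped, per the Returns entry
            groups[domain].append(name)
    return dict(groups)


print(groups_by_domain(ROWS))
# {'regression': ['mse', 'mae'], 'classification': ['accuracy', 'f1_score']}
```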

`ThresholdMetric(metric_name: str, *, min_value: float | None = None, max_value: float | None = None)`

Wrap a metric with a pass/fail threshold.

Usage

```python
from calibrax.metrics.composition import ThresholdMetric

threshold = ThresholdMetric("mse", max_value=0.01)
result = threshold.evaluate(predictions, targets)
# {"value": 0.005, "passed": True, "threshold": 0.01, "metric_name": "mse"}
```

Attributes:

Name	Type	Description
`metric_name`	`str`	Name of the metric to evaluate.
`min_value`	`float | None`	Minimum acceptable value (for higher-is-better metrics).
`max_value`	`float | None`	Maximum acceptable value (for lower-is-better metrics).

Initialize threshold metric.

Parameters:

Name	Type	Description	Default
`metric_name`	`str`	Registered metric name.	required
`min_value`	`float | None`	Minimum acceptable value (metric must be >= this).	`None`
`max_value`	`float | None`	Maximum acceptable value (metric must be <= this).	`None`

Raises:

Type	Description
`ValueError`	If neither min_value nor max_value is provided.
`KeyError`	If metric_name is not in the registry.

`metric_name: str` `property`

Get the metric name.

`min_value: float | None` `property`

Get the minimum threshold value.

`max_value: float | None` `property`

Get the maximum threshold value.

`evaluate(predictions: Any, targets: Any) -> dict[str, Any]`

Compute the metric and check against threshold.

Parameters:

Name	Type	Description	Default
`predictions`	`Any`	Predicted values.	required
`targets`	`Any`	Ground truth values.	required

Returns:

Type	Description
`dict[str, Any]`	Dict with "value" (float), "passed" (bool), "threshold" (float), and "metric_name" (str).
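For a CI gate, the documented result dicts can be reduced to a process exit code. This is a sketch under assumptions: `gate` is a hypothetical helper, and the two result dicts are hand-written in the documented shape (their values are invented, not produced by calibrax).

```python
from typing import Any


def gate(results: list[dict[str, Any]]) -> int:
    """Return a process exit code: 0 if every threshold passed, 1 otherwise."""
    failures = [r for r in results if not r["passed"]]
    for r in failures:
        print(f"FAIL {r['metric_name']}: value={r['value']} threshold={r['threshold']}")
    return 1 if failures else 0


# Dicts shaped like the documented evaluate() return value (values made up).
results = [
    {"value": 0.005, "passed": True, "threshold": 0.01, "metric_name": "mse"},
    {"value": 0.81, "passed": False, "threshold": 0.9, "metric_name": "accuracy"},
]

exit_code = gate(results)
```

A CI script would typically pass `exit_code` to `sys.exit` so that any failed threshold fails the job.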
