calibrax.analysis¤

Analysis tools for benchmark data: direction-aware regression detection, single- and multi-metric ranking, cross-configuration comparison reports, power-law scaling fits, and Pareto front computation.

Regression Detection¤

`calibrax.analysis.regression` ¤

Regression detection for benchmark runs.

Compares a current run against a baseline to flag metrics that degraded beyond a specified threshold.

`detect_regressions(run, baseline, threshold=0.05)` ¤

Flag metrics that degraded beyond threshold.

Uses MetricDef.direction: 'higher' metrics regress when they decrease, 'lower' metrics regress when they increase. 'info' metrics are skipped.

Parameters:

Name	Type	Description	Default
`run`	`Run`	Current benchmark run.	required
`baseline`	`Run`	Baseline run to compare against.	required
`threshold`	`float`	Relative change threshold (e.g. 0.05 = 5%).	`0.05`

Returns:

Type	Description
`list[Regression]`	List of detected regressions.
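
The direction rule above can be sketched in plain Python. This is a minimal illustration rather than calibrax's implementation: `RegressionSketch`, `find_regressions`, and the dictionary inputs are hypothetical stand-ins for `Regression`, `detect_regressions`, and `Run`, and it assumes the relative change is measured against the baseline value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegressionSketch:
    """Hypothetical stand-in for a Regression record."""

    metric: str
    baseline: float
    current: float
    relative_change: float


def find_regressions(current, baseline, directions, threshold=0.05):
    """Flag metrics whose relative change moved the wrong way by more than threshold."""
    flagged = []
    for metric, base_value in baseline.items():
        direction = directions.get(metric, "info")
        # Skip informational metrics, metrics missing from the current run,
        # and zero baselines (relative change is undefined there).
        if direction == "info" or metric not in current or base_value == 0:
            continue
        change = (current[metric] - base_value) / abs(base_value)
        if direction == "higher":
            degraded = change < -threshold  # higher-is-better got smaller
        else:
            degraded = change > threshold  # lower-is-better got larger
        if degraded:
            flagged.append(RegressionSketch(metric, base_value, current[metric], change))
    return flagged


baseline = {"throughput": 100.0, "latency_ms": 20.0, "commit_count": 10.0}
current = {"throughput": 90.0, "latency_ms": 20.5, "commit_count": 50.0}
directions = {"throughput": "higher", "latency_ms": "lower", "commit_count": "info"}

regressions = find_regressions(current, baseline, directions)
```

Here only `throughput` is flagged: it dropped 10%, while `latency_ms` rose 2.5% (inside the 5% threshold) and `commit_count` is informational.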

Ranking¤

`calibrax.analysis.ranking` ¤

Ranking and aggregate scoring for benchmark runs.

Ranks entries by metric value and computes weighted aggregate scores across multiple metrics.

`rank_table(run, metric, group_by_tag='framework')` ¤

Rank entries by metric value, grouped by a tag.

Uses MetricDef.direction for determining best-is-highest vs best-is-lowest.

Parameters:

Name	Type	Description	Default
`run`	`Run`	Benchmark run with points and metric_defs.	required
`metric`	`str`	Metric name to rank by.	required
`group_by_tag`	`str`	Tag key used to group points (default "framework").	`'framework'`

Returns:

Type	Description
`list[RankEntry]`	Sorted list of RankEntry, rank 1 = best.
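
As a rough illustration of direction-aware ranking, the sketch below sorts one value per group label. `rank_by_metric` and its tuple output are hypothetical; how calibrax reduces several points within a group to a single value is not specified here, so the sketch assumes that reduction has already happened.

```python
def rank_by_metric(values, direction="higher"):
    """Return (rank, label, value) tuples with rank 1 as the best entry."""
    ordered = sorted(
        values.items(),
        key=lambda item: item[1],
        reverse=(direction == "higher"),  # largest first when higher is better
    )
    return [(rank, label, value) for rank, (label, value) in enumerate(ordered, start=1)]


latency_ms = {"jax": 12.5, "torch": 14.0, "tf": 13.1}
ranking = rank_by_metric(latency_ms, direction="lower")
```

With a lower-is-better metric, the smallest latency receives rank 1.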

`aggregate_score(run, weights)` ¤

Weighted aggregate score across metrics.

Normalizes each metric to the [0, 1] range (best = 1.0, worst = 0.0) according to MetricDef.direction, then computes a weighted sum of the normalized values.

Parameters:

Name	Type	Description	Default
`run`	`Run`	Benchmark run with points and metric_defs.	required
`weights`	`dict[str, float]`	Mapping of {metric_name: weight}. Weights are normalized to sum to 1.0.	required

Returns:

Type	Description
`dict[str, float]`	{framework_label: aggregate_score} where score is in [0, 1].
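
The normalize-then-weight scheme might look like the following sketch. The function name `aggregate`, the nested-dictionary input, and the handling of ties (every label scores 1.0 when all values are equal) are assumptions made for illustration, not confirmed calibrax behavior.

```python
def aggregate(values_by_metric, directions, weights):
    """Min-max normalize each metric to [0, 1], then sum with normalized weights."""
    total_weight = sum(weights.values())
    scores = {}
    for metric, weight in weights.items():
        values = values_by_metric[metric]
        lo, hi = min(values.values()), max(values.values())
        for label, value in values.items():
            if hi == lo:
                norm = 1.0  # assumption: all-tied metrics count as best for everyone
            elif directions[metric] == "higher":
                norm = (value - lo) / (hi - lo)
            else:
                norm = (hi - value) / (hi - lo)
            scores[label] = scores.get(label, 0.0) + (weight / total_weight) * norm
    return scores


values_by_metric = {
    "throughput": {"jax": 200.0, "torch": 100.0},
    "latency_ms": {"jax": 10.0, "torch": 5.0},
}
directions = {"throughput": "higher", "latency_ms": "lower"}

# Weights 3:1 are normalized to 0.75 and 0.25 before use.
scores = aggregate(values_by_metric, directions, {"throughput": 3.0, "latency_ms": 1.0})
```

In this example `jax` wins throughput (weight 0.75) and `torch` wins latency (weight 0.25), so the scores are 0.75 and 0.25 respectively.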

Comparison¤

`calibrax.analysis.comparison` ¤

Multi-configuration benchmark comparison.

Compares benchmark runs across different configurations (frameworks, hardware, etc.) using MetricDef-aware direction logic and aggregate scoring.

`MetricComparison(*, metric_name, values, rankings, best_label, improvement_factors)` `dataclass` ¤

Comparison results for a single metric across configurations.

Attributes:

Name	Type	Description
`metric_name`	`str`	Name of the compared metric.
`values`	`dict[str, float]`	Mapping of configuration label to metric value.
`rankings`	`tuple[RankEntry, ...]`	Ranked entries for this metric.
`best_label`	`str`	Label of the best-performing configuration.
`improvement_factors`	`dict[str, float]`	How much better the best is vs each config.

`to_dict()` ¤

Serialize to a JSON-compatible dictionary.

`ComparisonReport(*, name, labels_compared, metric_comparisons, winner_by_metric, overall_winner)` `dataclass` ¤

Full comparison across multiple metrics and configurations.

Attributes:

Name	Type	Description
`name`	`str`	Name of this comparison.
`labels_compared`	`tuple[str, ...]`	Configuration labels included.
`metric_comparisons`	`tuple[MetricComparison, ...]`	Per-metric comparison results.
`winner_by_metric`	`dict[str, str]`	Best label for each metric.
`overall_winner`	`str`	Best label by aggregate score.

`to_dict()` ¤

Serialize to a JSON-compatible dictionary.

`from_dict(data)` `classmethod` ¤

Deserialize from a dictionary.

Parameters:

Name	Type	Description	Default
`data`	`dict[str, Any]`	Dictionary with comparison report fields.	required

Returns:

Type	Description
`ComparisonReport`	Reconstructed ComparisonReport instance.

`compare_configurations(runs, metrics=None, *, group_by_tag='framework')` ¤

Compare benchmark runs across different configurations.

Builds a merged Run from all provided runs, using configuration labels as framework tags, then leverages rank_table and aggregate_score.

Parameters:

Name	Type	Description	Default
`runs`	`dict[str, Run]`	Mapping of configuration label to benchmark Run.	required
`metrics`	`Sequence[str] \| None`	Subset of metric names to compare. Defaults to all metrics found across all runs.	`None`
`group_by_tag`	`str`	Tag key used for grouping (default "framework").	`'framework'`

Returns:

Type	Description
`ComparisonReport`	ComparisonReport with per-metric comparisons and overall winner.

Raises:

Type	Description
`ValueError`	If fewer than 2 configurations are provided.
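
The per-metric winner selection can be approximated without the library. In this sketch, `winners_by_metric` and the plain-dictionary inputs are hypothetical, and metrics without a known direction are assumed to be higher-is-better; the real function works on `Run` objects and also produces rankings and an aggregate winner.

```python
def winners_by_metric(configs, directions):
    """Pick the best configuration label for each metric found in any config."""
    if len(configs) < 2:
        raise ValueError("need at least 2 configurations to compare")
    metrics = sorted({name for values in configs.values() for name in values})
    winners = {}
    for metric in metrics:
        # Only configurations that actually report this metric compete for it.
        candidates = {
            label: values[metric] for label, values in configs.items() if metric in values
        }
        pick = max if directions.get(metric, "higher") == "higher" else min
        winners[metric] = pick(candidates, key=candidates.get)
    return winners


configs = {
    "a100": {"throughput": 950.0, "latency_ms": 4.2},
    "h100": {"throughput": 1800.0, "latency_ms": 4.5},
}
directions = {"throughput": "higher", "latency_ms": "lower"}
winners = winners_by_metric(configs, directions)
```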

Scaling Laws¤

`calibrax.analysis.scaling` ¤

Scaling law fitting via log-linear regression.

Fits power-law relationships (value = a * size^b) using pure Python log-linear regression. No external dependencies required.

`scaling_fit(sizes, values)` ¤

Fit power-law: value = a * size^b using log-linear regression.

Takes log of both sides: log(value) = log(a) + b * log(size), then fits a linear regression. Pure Python (no scipy/numpy needed).

Parameters:

Name	Type	Description	Default
`sizes`	`list[float]`	Input sizes (e.g., batch sizes, dataset sizes).	required
`values`	`list[float]`	Measured values (e.g., throughput, latency).	required

Returns:

Type	Description
`ScalingLaw`	ScalingLaw with coefficient (a), exponent (b), r_squared, and complexity classification string.

Raises:

Type	Description
`ValueError`	If inputs are empty or have different lengths.
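
The log-linear fit described above reduces to ordinary least squares on the points (log size, log value). The sketch below shows that computation with the standard library only. `fit_power_law` is a hypothetical name that returns a plain tuple instead of a `ScalingLaw`; it omits the complexity classification, and the guard for identical sizes is an added assumption.

```python
import math


def fit_power_law(sizes, values):
    """Fit value = a * size**b by least squares in log-log space."""
    if not sizes or len(sizes) != len(values):
        raise ValueError("sizes and values must be non-empty and of equal length")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # Assumption: a slope cannot be estimated from a single distinct size.
        raise ValueError("sizes must contain at least two distinct values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx  # slope of the log-log line is the exponent
    log_a = mean_y - b * mean_x  # intercept is log(a)
    ss_res = sum((y - (log_a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return math.exp(log_a), b, r_squared


sizes = [1.0, 2.0, 4.0, 8.0]
values = [3.0 * s**2 for s in sizes]  # synthetic data following 3 * size^2
a, b, r_squared = fit_power_law(sizes, values)
```

On this noise-free synthetic data the fit recovers a coefficient close to 3 and an exponent close to 2. Because logarithms are taken, all sizes and values must be strictly positive.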

Pareto Front¤

`calibrax.analysis.pareto` ¤

Pareto front identification for multi-objective benchmark analysis.

Identifies Pareto-optimal points for two metrics, respecting MetricDef.direction for dominance checks.

`pareto_front(points, x_metric, y_metric, *, metric_defs=None)` ¤

Identify Pareto-optimal points for two metrics.

A point is Pareto-optimal if no other point is strictly better on both metrics. Uses MetricDef.direction to determine "better".

Parameters:

Name	Type	Description	Default
`points`	`list[Point]`	List of benchmark points to analyze.	required
`x_metric`	`str`	First metric name.	required
`y_metric`	`str`	Second metric name.	required
`metric_defs`	`dict[str, MetricDef] \| None`	Optional metric definitions for direction. If not provided, defaults to higher-is-better for both metrics.	`None`

Returns:

Type	Description
`list[Point]`	List of Pareto-optimal points (subset of input, same order).
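
The dominance check can be sketched as below, following the rule as stated: a point is dropped only when another point is strictly better on both metrics. `pareto_sketch` and the dictionary-based points are hypothetical stand-ins for `pareto_front` and `Point`, and directions are passed as plain strings rather than `MetricDef` objects.

```python
def pareto_sketch(points, x_metric, y_metric, directions=None):
    """Keep points that no other point beats strictly on both metrics."""
    directions = directions or {}

    def better(a, b, metric):
        # Default to higher-is-better when no direction is given.
        if directions.get(metric, "higher") == "higher":
            return a[metric] > b[metric]
        return a[metric] < b[metric]

    front = []
    for candidate in points:
        dominated = any(
            better(other, candidate, x_metric) and better(other, candidate, y_metric)
            for other in points
        )
        if not dominated:
            front.append(candidate)  # input order is preserved
    return front


points = [
    {"name": "small", "accuracy": 0.80, "latency_ms": 5.0},
    {"name": "medium", "accuracy": 0.90, "latency_ms": 9.0},
    {"name": "slow", "accuracy": 0.85, "latency_ms": 12.0},
    {"name": "large", "accuracy": 0.95, "latency_ms": 20.0},
]
front = pareto_sketch(
    points, "accuracy", "latency_ms", {"accuracy": "higher", "latency_ms": "lower"}
)
```

Only `slow` is removed, because `medium` is both more accurate and faster. This pairwise scan is quadratic in the number of points, which is usually acceptable for benchmark-sized inputs.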

Change Point Detection¤

Optional Dependency

Requires `ruptures`: `uv pip install "calibrax[changepoint]"`

`calibrax.analysis.changepoint` ¤

Change point detection for benchmark time series.

Uses the ruptures library to detect significant changes in metric trends, enabling automated identification of performance regressions or improvements over time. Requires the optional ruptures dependency (uv pip install "calibrax[changepoint]").

`ChangePoint(*, index, timestamp=None, run_id=None, magnitude=0.0)` `dataclass` ¤

A detected change point in a benchmark trend series.

Attributes:

Name	Type	Description
`index`	`int`	Index in the trend series where the change was detected.
`timestamp`	`datetime \| None`	Timestamp of the change point, if available.
`run_id`	`str \| None`	Run ID at the change point, if available.
`magnitude`	`float`	Absolute difference in mean values before/after the change.
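
One plausible reading of `magnitude` is sketched below. `change_magnitude` is a hypothetical helper, and two details are assumptions: the value at `index` is counted in the "after" segment, and the means cover everything before and after the index rather than only the adjacent segments.

```python
from statistics import mean


def change_magnitude(values, index):
    """Absolute difference between the mean before index and the mean from index on."""
    if not 0 < index < len(values):
        raise ValueError("index must leave at least one value on each side")
    return abs(mean(values[index:]) - mean(values[:index]))


series = [10.0, 10.2, 9.8, 15.0, 15.2, 14.8]
magnitude = change_magnitude(series, 3)
```

For this series the level shifts from about 10 to about 15, so the magnitude is roughly 5.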

`to_dict()` ¤

Serialize to a JSON-compatible dictionary.

`from_dict(data)` `classmethod` ¤

Deserialize from a dictionary.

Parameters:

Name	Type	Description	Default
`data`	`dict[str, Any]`	Dictionary with change point fields.	required

Returns:

Type	Description
`ChangePoint`	Reconstructed ChangePoint instance.

`detect_change_points(trend, *, method='pelt', min_size=3, penalty=None)` ¤

Detect change points in a benchmark trend series.

Uses the ruptures library for change point detection with configurable algorithms.

Parameters:

Name	Type	Description	Default
`trend`	`TrendSeries`	TrendSeries containing the metric values over time.	required
`method`	`str`	Detection method ("pelt", "binseg", or "window").	`'pelt'`
`min_size`	`int`	Minimum segment size between change points.	`3`
`penalty`	`float \| None`	Penalty value for PELT/BinSeg. Auto-calibrated if None.	`None`

Returns:

Type	Description
`list[ChangePoint]`	List of detected ChangePoint instances, ordered by index.

Raises:

Type	Description
`ImportError`	If ruptures is not installed.
`ValueError`	If the trend has fewer points than min_size.
