Regression Detection
Calibrax detects performance regressions by comparing a new run against a stored baseline, respecting each metric's direction to determine what counts as a degradation.
How It Works
detect_regressions() compares matching metrics between two runs. For each
metric present in both the current run and the baseline:
- It looks up the MetricDef.direction from the run's metric_defs
- For HIGHER metrics, a regression occurs when the value drops below baseline * (1 - threshold)
- For LOWER metrics, a regression occurs when the value rises above baseline * (1 + threshold)
- INFO metrics are always skipped
from calibrax.analysis.regression import detect_regressions
from calibrax.core.models import (
    MetricDef, MetricDirection, Metric, Point, Run,
)

# Two runs with the same metric definitions
metric_defs = {
    "throughput": MetricDef(
        name="throughput", unit="samples/sec", direction=MetricDirection.HIGHER
    ),
    "latency": MetricDef(
        name="latency", unit="ms", direction=MetricDirection.LOWER
    ),
}
baseline = Run(
    points=(Point(
        name="forward_pass", scenario="training",
        metrics={"throughput": Metric(value=1000.0), "latency": Metric(value=1.0)},
    ),),
    metric_defs=metric_defs,
)
current = Run(
    points=(Point(
        name="forward_pass", scenario="training",
        metrics={"throughput": Metric(value=920.0), "latency": Metric(value=1.12)},
    ),),
    metric_defs=metric_defs,
)

regressions = detect_regressions(current, baseline, threshold=0.05)
for r in regressions:
    print(f"{r.metric} ({r.direction.value}): "
          f"{r.baseline_value} -> {r.current_value} ({r.delta_pct:+.1f}%)")
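
The comparison rule described in the list above can be sketched in plain Python, independent of Calibrax. This is a minimal illustration, not the library's implementation: Direction and is_regression are hypothetical stand-ins, and the sketch assumes positive baseline values.

```python
from enum import Enum


class Direction(Enum):
    """Illustrative stand-in for MetricDirection (not the Calibrax class)."""
    HIGHER = "higher"
    LOWER = "lower"
    INFO = "info"


def is_regression(current: float, baseline: float,
                  direction: Direction, threshold: float) -> bool:
    """Apply the direction-aware threshold rule (hypothetical helper)."""
    if direction is Direction.HIGHER:
        # Higher is better: flag when the value falls below the lower bound.
        return current < baseline * (1 - threshold)
    if direction is Direction.LOWER:
        # Lower is better: flag when the value rises above the upper bound.
        return current > baseline * (1 + threshold)
    # INFO metrics are never flagged.
    return False


# Values from the example above, with threshold=0.05.
throughput_flag = is_regression(920.0, 1000.0, Direction.HIGHER, 0.05)
latency_flag = is_regression(1.12, 1.0, Direction.LOWER, 0.05)
# A 4% drop stays inside the 5% tolerance.
small_drop_flag = is_regression(960.0, 1000.0, Direction.HIGHER, 0.05)
print(throughput_flag, latency_flag, small_drop_flag)
```

With a 5% threshold the bounds are 950 samples/sec for throughput and 1.05 ms for latency, so both metrics in the example fall outside the tolerance, while a drop to 960 samples/sec would not.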
Using Baselines from a Store
In practice, baselines are managed through the Store:
from calibrax.storage.store import Store
from calibrax.analysis.regression import detect_regressions
from calibrax.core.models import MetricDef, MetricDirection, Metric, Point, Run

store = Store("/tmp/calibrax-regression-demo")
baseline_run = Run(
    points=(Point(name="fwd", scenario="train",
                  metrics={"throughput": Metric(value=1000.0)}),),
    metric_defs={"throughput": MetricDef(
        name="throughput", unit="samples/sec", direction=MetricDirection.HIGHER)},
)
new_run = Run(
    points=(Point(name="fwd", scenario="train",
                  metrics={"throughput": Metric(value=900.0)}),),
    metric_defs={"throughput": MetricDef(
        name="throughput", unit="samples/sec", direction=MetricDirection.HIGHER)},
)

# Set a baseline
store.save(baseline_run)
store.set_baseline(baseline_run.id)

# Later, detect regressions against the stored baseline
baseline = store.get_baseline()
if baseline is not None:
    regressions = detect_regressions(new_run, baseline, threshold=0.05)
Interpreting Regression Objects
Each Regression object contains:
| Field | Description |
|---|---|
| metric | Metric name (e.g., "throughput") |
| point_name | Name of the measurement point |
| baseline_value | Value from the baseline run |
| current_value | Value from the current run |
| delta_pct | Percentage change (negative for drops, positive for increases) |
| direction | MetricDirection value indicating how this metric is compared |
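
As a worked example of the delta_pct sign convention, the sketch below assumes the field is the relative change from the baseline expressed in percent, (current - baseline) / baseline * 100. That formula is an assumption inferred from the table's description, and pct_change is a hypothetical helper, not a Calibrax function.

```python
def pct_change(current: float, baseline: float) -> float:
    """Relative change from baseline in percent (assumed formula)."""
    return (current - baseline) / baseline * 100.0


# Values from the throughput/latency example earlier on this page.
throughput_delta = pct_change(920.0, 1000.0)  # HIGHER metric that dropped
latency_delta = pct_change(1.12, 1.0)         # LOWER metric that rose

print(f"throughput: {throughput_delta:+.1f}%")
print(f"latency: {latency_delta:+.1f}%")
```

Under this assumption, the sign alone does not say whether a change is bad: a negative delta is a regression for a HIGHER metric, while a positive delta is a regression for a LOWER metric. That is why the direction field is worth reading alongside delta_pct.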
Scaling Law Fitting
scaling_fit() fits a power law y = a * x^b to a series of measurements,
useful for predicting how performance scales with input size:
from calibrax.analysis.scaling import scaling_fit
sizes = [100, 500, 1000, 5000, 10000]
times = [0.01, 0.05, 0.10, 0.52, 1.05]
law = scaling_fit(sizes, times)
print(f"Coefficient: {law.coefficient:.4f}")
print(f"Exponent: {law.exponent:.2f}")
print(f"R-squared: {law.r_squared:.4f}")
print(f"Complexity: {law.complexity}") # e.g., "O(n)"
The complexity field maps common exponents to Big-O notation:
O(1), O(sqrt(n)), O(n), O(n^1.5), O(n^2), O(n^3).
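
A common way to fit y = a * x^b is ordinary least squares on log-transformed data, since log(y) = log(a) + b * log(x) is linear. The sketch below illustrates that technique in plain Python. Whether scaling_fit() uses this method internally is not stated here, and fit_power_law, LABELS, and nearest_label (including the nearest-exponent rule) are hypothetical names introduced for illustration.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of log(y) = log(a) + b*log(x); returns (a, b, r_squared)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    b = sxy / sxx                 # slope in log-log space = exponent
    log_a = mean_y - b * mean_x   # intercept in log-log space = log(coefficient)
    # R-squared is measured in log space here.
    ss_res = sum((ly - (log_a + b * lx)) ** 2 for lx, ly in zip(log_x, log_y))
    ss_tot = sum((ly - mean_y) ** 2 for ly in log_y)
    return math.exp(log_a), b, 1.0 - ss_res / ss_tot


# Assumed mapping: pick the label whose exponent is closest to the fitted one.
LABELS = {0.0: "O(1)", 0.5: "O(sqrt(n))", 1.0: "O(n)",
          1.5: "O(n^1.5)", 2.0: "O(n^2)", 3.0: "O(n^3)"}


def nearest_label(exponent):
    return LABELS[min(LABELS, key=lambda e: abs(e - exponent))]


sizes = [100, 500, 1000, 5000, 10000]
times = [0.01, 0.05, 0.10, 0.52, 1.05]
a, b, r2 = fit_power_law(sizes, times)
print(f"a={a:.6f} b={b:.3f} R^2={r2:.4f} label={nearest_label(b)}")
```

For the sample data the fitted exponent lands close to 1, so the nearest label is O(n). Fitting in log space weights relative errors rather than absolute ones, which tends to suit measurements spanning several orders of magnitude.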
Next Steps
- CI Integration: Gate CI pipelines on regression detection results
- Comparing Configurations: Rank multiple configurations and find Pareto-optimal tradeoffs