Overview
Calibrax is an extensible benchmarking framework designed for the JAX scientific ML ecosystem. It provides a complete toolkit for profiling workloads, analyzing results with statistical rigor, detecting regressions, and exporting publication-ready reports.
Design Principles
Composition over inheritance — BenchmarkResult uses composed objects
(timing, resources, metrics) rather than flat monolithic fields. Each concern
is modeled by a dedicated dataclass that can be used independently.
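As a rough illustration of the composed layout described above, the sketch below builds a result object from separate timing and resource dataclasses. Only the attribute names `timing`, `resources`, and `metrics` come from this page; every field inside them (`samples_s`, `peak_memory_mb`, `name`) is a hypothetical placeholder, not Calibrax's actual definition.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Timing:
    """Wall-clock samples for one run (hypothetical fields)."""
    samples_s: tuple[float, ...]

    @property
    def mean_s(self) -> float:
        return sum(self.samples_s) / len(self.samples_s)


@dataclass(frozen=True)
class Resources:
    """Resource usage for one run (hypothetical fields)."""
    peak_memory_mb: float


@dataclass(frozen=True)
class BenchmarkResult:
    """Illustrative stand-in: each concern lives in its own composed object."""
    name: str
    timing: Timing
    resources: Resources
    metrics: dict[str, float] = field(default_factory=dict)


result = BenchmarkResult(
    name="mlp_forward",
    timing=Timing(samples_s=(0.010, 0.012, 0.011)),
    resources=Resources(peak_memory_mb=512.0),
    metrics={"accuracy": 0.93},
)

# Each part is usable on its own, without the enclosing result.
standalone = Timing(samples_s=(1.0, 3.0))
```

Because `Timing` does not depend on `BenchmarkResult`, code that only needs timing data can accept a `Timing` directly.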
Protocol-driven — Universal protocols (BenchmarkProtocol, DatasetProtocol,
MetricProtocol) use structural subtyping so any class with the right methods
satisfies the contract — no base class required.
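Structural subtyping can be sketched with Python's `typing.Protocol`. The protocol below, `MetricLike`, and its `compute` method are hypothetical names chosen for illustration; they are not the signatures of Calibrax's `MetricProtocol`.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class MetricLike(Protocol):
    """Hypothetical protocol: anything with a matching compute() qualifies."""

    def compute(self, predictions: list[float], targets: list[float]) -> float:
        ...


class MeanAbsoluteError:
    # No base class: having the right method is enough.
    def compute(self, predictions: list[float], targets: list[float]) -> float:
        return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


def evaluate(metric: MetricLike, predictions: list[float], targets: list[float]) -> float:
    return metric.compute(predictions, targets)


mae = MeanAbsoluteError()
score = evaluate(mae, [1.0, 2.0, 4.0], [1.0, 3.0, 2.0])
```

Note that `isinstance` checks against a `runtime_checkable` protocol only verify that the method exists, not that its signature matches; static type checkers cover the signature.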
Direction-aware metrics — Every metric declares whether higher or lower is
better via MetricDirection. All analysis functions use this direction to
determine comparison semantics, eliminating a common source of regression
detection bugs.
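The sketch below shows why carrying the direction with the metric matters: the same comparison helper handles accuracy and latency without the caller flipping signs. `MetricDirection` is named on this page, but the member names, the `is_regression` helper, and the 5% relative tolerance are assumptions made for this example.

```python
from enum import Enum


class MetricDirection(Enum):
    # Member names are assumed for illustration.
    HIGHER_IS_BETTER = "higher"
    LOWER_IS_BETTER = "lower"


def is_regression(
    baseline: float,
    current: float,
    direction: MetricDirection,
    tolerance: float = 0.05,
) -> bool:
    """Return True if `current` is worse than `baseline` by more than
    `tolerance`, measured relative to the baseline."""
    if baseline == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    if direction is MetricDirection.HIGHER_IS_BETTER:
        worsening = (baseline - current) / abs(baseline)
    else:
        worsening = (current - baseline) / abs(baseline)
    return worsening > tolerance


accuracy_dropped = is_regression(0.90, 0.80, MetricDirection.HIGHER_IS_BETTER)
latency_rose = is_regression(10.0, 12.0, MetricDirection.LOWER_IS_BETTER)
latency_fell = is_regression(10.0, 8.0, MetricDirection.LOWER_IS_BETTER)
```

A helper that ignored direction would flag the latency improvement (10.0 to 8.0) as a regression, or miss the accuracy drop, depending on which sign convention it hard-coded.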
JAX-native — Statistical computations use JAX where possible. Profiling tools
support GPU synchronization via sync_fn callbacks. The NNXBenchmarkAdapter
inherits from nnx.Module for JIT/vmap/grad compatibility.
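JAX dispatches device work asynchronously, so a timer that stops as soon as a call returns can underreport the real cost. A minimal sketch of the `sync_fn` idea follows; the `time_call` helper and the convention that `sync_fn` receives the function's output are assumptions for this example, and Calibrax's own callback signature may differ.

```python
import time
from typing import Any, Callable, Optional


def time_call(
    fn: Callable[..., Any],
    *args: Any,
    sync_fn: Optional[Callable[[Any], Any]] = None,
    repeats: int = 5,
) -> list[float]:
    """Time `fn(*args)` `repeats` times, calling `sync_fn(output)` before
    stopping the clock so pending asynchronous work is included."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        out = fn(*args)
        if sync_fn is not None:
            sync_fn(out)  # block until the result is actually ready
        samples.append(time.perf_counter() - start)
    return samples


# Pure-Python demo: the sync callback just records that it was invoked.
sync_calls = []
samples = time_call(sum, [1, 2, 3], sync_fn=sync_calls.append, repeats=3)

# With JAX installed, the same helper could be used as, for example:
#   time_call(jitted_fn, x, sync_fn=jax.block_until_ready)
```

Running a warm-up call before timing is also advisable with jitted functions, since the first call includes compilation.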
Zero lock-in — Clean abstractions and protocols allow domain-specific extensions without importing Calibrax internals.
Module Map
flowchart TD
    Core[core] --> Profiling[profiling]
    Core --> Statistics[statistics]
    Core --> Analysis[analysis]
    Core --> Validation[validation]
    Core --> Storage[storage]
    Core --> Metrics[metrics]
    Analysis --> CI[ci]
    Storage --> CI
    Storage --> Exporters[exporters]
    Core --> Monitoring[monitoring]
    Profiling --> Monitoring
    Storage --> CLI[cli]
    CI --> CLI
    style Core fill:#e3f2fd
    style Profiling fill:#fff3e0
    style Statistics fill:#fff3e0
    style Analysis fill:#fff3e0
    style Validation fill:#fff3e0
    style Storage fill:#fff3e0
    style Metrics fill:#fff3e0
    style Monitoring fill:#fff3e0
    style Exporters fill:#c8e6c9
    style CI fill:#c8e6c9
    style CLI fill:#c8e6c9
Modules
| Module | Purpose |
|---|---|
| calibrax.core | Data models, protocols, adapters, result container, registry |
| calibrax.profiling | Timing, resource monitoring, GPU memory, energy, FLOPS |
| calibrax.statistics | Bootstrap CI, hypothesis tests, effect sizes, outlier detection |
| calibrax.analysis | Regression detection, comparison, ranking, scaling, Pareto |
| calibrax.validation | Convergence, accuracy assessment, validation framework |
| calibrax.monitoring | Alert management, production monitoring |
| calibrax.storage | JSON-per-run store, baseline repository |
| calibrax.exporters | W&B, MLflow, publication-ready LaTeX/HTML/CSV/matplotlib output |
| calibrax.metrics | 4-tier metric system with 111 registered Tier 0 metrics across 17 domains plus Tier 1-3 APIs and losses |
| calibrax.ci | CI regression gate, git bisect automation |
| calibrax.cli | Command-line interface |
Reading Paths
Benchmarking a JAX model for the first time: Core Concepts → Profiling → Storage → Regressions
Setting up CI regression checks: Storage → Regressions → CI Integration
Comparing framework configurations: Profiling → Comparison → Exporters
Publishing benchmark results: Statistics → Comparison → Exporters