Overview

Calibrax is an extensible benchmarking framework designed for the JAX scientific ML ecosystem. It provides tools for profiling workloads, analyzing results with statistical methods such as bootstrap confidence intervals and hypothesis tests, detecting regressions, and exporting publication-ready reports.

Design Principles

Composition over inheritance: BenchmarkResult uses composed objects (timing, resources, metrics) rather than flat, monolithic fields. Each concern is modeled by a dedicated dataclass that can be used independently.
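A minimal sketch of the composition idea, using only the standard library. The class names TimingInfo and ResourceUsage and all field names are hypothetical. Only the timing/resources/metrics split comes from the description above, and Calibrax's actual BenchmarkResult may be structured differently.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TimingInfo:
    """Hypothetical timing record for one benchmark run."""
    wall_clock_sec: float
    num_iterations: int

    @property
    def per_iteration_sec(self) -> float:
        # Average time per iteration, derived rather than stored.
        return self.wall_clock_sec / self.num_iterations


@dataclass(frozen=True)
class ResourceUsage:
    """Hypothetical resource record (peak memory in megabytes)."""
    peak_memory_mb: float


@dataclass
class BenchmarkResult:
    """Sketch: composes independent concern objects instead of flat fields."""
    name: str
    timing: TimingInfo
    resources: ResourceUsage
    metrics: dict[str, float] = field(default_factory=dict)


result = BenchmarkResult(
    name="mlp_forward",
    timing=TimingInfo(wall_clock_sec=2.0, num_iterations=8),
    resources=ResourceUsage(peak_memory_mb=512.0),
    metrics={"accuracy": 0.93},
)
print(result.timing.per_iteration_sec)  # 0.25
```

Because TimingInfo stands alone, it can be constructed and tested without building a full result object.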

Protocol-driven: Universal protocols (BenchmarkProtocol, DatasetProtocol, MetricProtocol) use structural subtyping, so any class that implements the required methods satisfies the contract without inheriting from a base class.
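Structural subtyping in Python is usually expressed with typing.Protocol. This overview does not list the members of MetricProtocol, so the sketch below uses a hypothetical stand-in, MetricLike, with an assumed name attribute and compute method. It only illustrates the mechanism: a class satisfies the protocol by shape alone.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class MetricLike(Protocol):
    """Hypothetical protocol: any object with these members satisfies it."""
    name: str

    def compute(self, predictions: list[float], targets: list[float]) -> float: ...


class MeanAbsoluteError:
    """Does NOT subclass MetricLike, yet matches it structurally."""
    name = "mae"

    def compute(self, predictions: list[float], targets: list[float]) -> float:
        return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


def evaluate(metric: MetricLike, preds: list[float], targets: list[float]) -> float:
    # A static type checker accepts MeanAbsoluteError here because its shape matches.
    return metric.compute(preds, targets)


mae = MeanAbsoluteError()
print(isinstance(mae, MetricLike))  # True (runtime check only verifies members exist)
print(evaluate(mae, [1.0, 2.0], [1.5, 3.0]))  # 0.75
```

Note that the runtime isinstance check only confirms that the members are present. Signature agreement is enforced by a static type checker, not at runtime.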

Direction-aware metrics: Every metric declares whether higher or lower is better via MetricDirection. All analysis functions use this direction to determine comparison semantics, so a drop in accuracy and a rise in latency are both treated as regressions. This eliminates a common source of regression-detection bugs.
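A sketch of how a direction flag changes the meaning of "worse" in a regression check. The Direction enum, its members, the is_regression function, and the 5% relative tolerance are hypothetical illustrations, not Calibrax's MetricDirection API.

```python
from enum import Enum


class Direction(Enum):
    """Hypothetical higher/lower-is-better flag (stand-in for MetricDirection)."""
    HIGHER_IS_BETTER = "higher"
    LOWER_IS_BETTER = "lower"


def is_regression(
    baseline: float, current: float, direction: Direction, tolerance: float = 0.05
) -> bool:
    """Return True if `current` is worse than `baseline` by more than
    `tolerance` (relative). What counts as "worse" depends on `direction`."""
    if baseline == 0:
        raise ValueError("relative tolerance is undefined for a zero baseline")
    relative_change = (current - baseline) / abs(baseline)
    if direction is Direction.HIGHER_IS_BETTER:
        return relative_change < -tolerance  # a drop is bad
    return relative_change > tolerance  # a rise is bad


# Latency rose 10%: a regression, because lower latency is better.
print(is_regression(100.0, 110.0, Direction.LOWER_IS_BETTER))  # True
# Accuracy rose 10%: an improvement, because higher accuracy is better.
print(is_regression(0.80, 0.88, Direction.HIGHER_IS_BETTER))  # False
```

The same numeric change is classified in opposite ways depending on the declared direction. Keeping this decision in one place avoids the bugs that arise when each caller hard-codes its own comparison.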

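JAX-native: Statistical computations use JAX where possible. Because JAX dispatches device work asynchronously, profiling tools accept sync_fn callbacks that wait for GPU computation to finish, so timings include the device work and not just the dispatch. The NNXBenchmarkAdapter inherits from Flax's nnx.Module, which keeps it compatible with JIT, vmap, and grad.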
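A timer that stops as soon as a JAX call returns may measure only the dispatch. The sketch below shows the general shape of a timer with a synchronization callback. The function name time_call, its parameters, and the convention that sync_fn receives the call's output are assumptions, not Calibrax's API. The demo uses plain Python so it runs without a GPU.

```python
import time
from typing import Any, Callable, Optional


def time_call(
    fn: Callable[[], Any],
    sync_fn: Optional[Callable[[Any], Any]] = None,
    warmup: int = 1,
    repeats: int = 5,
) -> list[float]:
    """Hypothetical timer: run `fn` `repeats` times and return per-call
    wall-clock durations in seconds. If given, `sync_fn(output)` is called
    before the clock stops so asynchronously dispatched work is counted."""
    for _ in range(warmup):  # warm-up runs absorb one-time costs such as JIT compilation
        out = fn()
        if sync_fn is not None:
            sync_fn(out)
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        out = fn()
        if sync_fn is not None:
            sync_fn(out)  # wait until the result is actually ready
        durations.append(time.perf_counter() - start)
    return durations


# Pure-Python demo: record each output passed to the sync callback.
synced = []
durations = time_call(lambda: sum(range(10_000)), sync_fn=synced.append, repeats=3)
print(len(durations), len(synced))  # 3 4  (3 timed runs + 1 warm-up)
```

With JAX installed, jax.block_until_ready is a natural choice of sync_fn. For example, time_call(lambda: f(x), sync_fn=jax.block_until_ready), where f and x are a hypothetical jitted function and its input.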

Zero lock-in: Clean abstractions and protocols allow domain-specific extensions without importing Calibrax internals.

Module Map

```mermaid
flowchart TD
    Core[core] --> Profiling[profiling]
    Core --> Statistics[statistics]
    Core --> Analysis[analysis]
    Core --> Validation[validation]
    Core --> Storage[storage]
    Core --> Metrics[metrics]
    Analysis --> CI[ci]
    Storage --> CI
    Storage --> Exporters[exporters]
    Core --> Monitoring[monitoring]
    Profiling --> Monitoring
    Storage --> CLI[cli]
    CI --> CLI

    style Core fill:#e3f2fd
    style Profiling fill:#fff3e0
    style Statistics fill:#fff3e0
    style Analysis fill:#fff3e0
    style Validation fill:#fff3e0
    style Storage fill:#fff3e0
    style Metrics fill:#fff3e0
    style Monitoring fill:#fff3e0
    style Exporters fill:#c8e6c9
    style CI fill:#c8e6c9
    style CLI fill:#c8e6c9
```

Modules

| Module | Purpose |
|---|---|
| calibrax.core | Data models, protocols, adapters, result container, registry |
| calibrax.profiling | Timing, resource monitoring, GPU memory, energy, FLOPS |
| calibrax.statistics | Bootstrap CI, hypothesis tests, effect sizes, outlier detection |
| calibrax.analysis | Regression detection, comparison, ranking, scaling, Pareto |
| calibrax.validation | Convergence, accuracy assessment, validation framework |
| calibrax.monitoring | Alert management, production monitoring |
| calibrax.storage | JSON-per-run store, baseline repository |
| calibrax.exporters | W&B, MLflow, publication-ready LaTeX/HTML/CSV/matplotlib output |
| calibrax.metrics | 4-tier metric system: 111 registered Tier 0 metrics across 17 domains, plus Tier 1-3 APIs and losses |
| calibrax.ci | CI regression gate, git bisect automation |
| calibrax.cli | Command-line interface |
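The statistics module lists bootstrap confidence intervals. As a rough illustration of that technique only, a percentile bootstrap CI for the mean of repeated timing samples can be sketched in plain Python. This is not Calibrax's implementation, which per the design principles may use JAX. The function name bootstrap_ci and its defaults are hypothetical, and the sample values are made up for the example.

```python
import random
import statistics


def bootstrap_ci(
    samples: list[float],
    confidence: float = 0.95,
    n_resamples: int = 2000,
    seed: int = 0,
) -> tuple[float, float]:
    """Hypothetical helper: percentile bootstrap CI for the mean of `samples`."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    rng = random.Random(seed)  # seeded so the interval is reproducible
    n = len(samples)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(samples, k=n)) for _ in range(n_resamples)
    )
    alpha = 1.0 - confidence
    # Take the alpha/2 and 1 - alpha/2 empirical quantiles of the resampled means.
    lo_idx = round((alpha / 2) * (n_resamples - 1))
    hi_idx = round((1.0 - alpha / 2) * (n_resamples - 1))
    return means[lo_idx], means[hi_idx]


step_times_ms = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3]  # illustrative values
low, high = bootstrap_ci(step_times_ms)
print(f"95% CI for mean step time: [{low:.2f}, {high:.2f}] ms")
```

The percentile method makes no normality assumption about the timing distribution, which is one reason bootstrap intervals are common for benchmark samples.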

Reading Paths

Benchmarking a JAX model for the first time: Core Concepts → Profiling → Storage → Regressions

Setting up CI regression checks: Storage → Regressions → CI Integration

Comparing framework configurations: Profiling → Comparison → Exporters

Publishing benchmark results: Statistics → Comparison → Exporters