Peer Comparison
Calibrax overlaps with metric libraries and benchmark tools, but its center of gravity is different: JAX-native scientific ML benchmarking, profiling, statistical analysis, regression detection, and metric evaluation in one package.
Summary
| Tool | Primary focus | Where Calibrax differs |
|---|---|---|
| TorchMetrics | PyTorch metric implementations with functional and module APIs, broad domains, wrappers, plotting, and distributed support | Calibrax targets JAX first, adds benchmark storage, profiling, statistical comparison, CI regression gates, and geometry-heavy metric metadata |
| jax_metrics | JAX metric and loss abstractions with pytree state, distributed-friendly accumulation, and numerical-equivalence discipline | Calibrax has a broader benchmarking system, 111 registered Tier 0 metrics, registry metadata, exporters, storage, and regression analysis |
| ASV | Benchmarking Python packages over time with runtime, memory, custom values, and static web output | Calibrax stores JSON-per-run benchmark results inside the project workflow and adds JAX-specific profiling, statistical tests, and CI gates |
| CodSpeed | Hosted and CI-oriented performance testing with PR checks, profiling, and benchmark reports | Calibrax includes a focused CodSpeed workflow for PR benchmark checks while keeping local storage and analysis in Calibrax |
TorchMetrics
Use TorchMetrics when the project is PyTorch-based and needs its mature metric catalog, Lightning integration, module-state API, wrappers, and plotting.
Use Calibrax when the code is JAX-based, when metric metadata needs to be queried by domain, mathematical properties, or invariance, or when metric values need to live alongside profiling and benchmark regression records.
TorchMetrics offers mature plotting and distributed wrappers that go beyond Calibrax's current stateful metric surface. Calibrax, for its part, covers CRPS for JAX ensemble forecasts and exposes VMAF through an optional FFmpeg/libvmaf boundary rather than as a core dependency.
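For readers unfamiliar with CRPS on ensemble forecasts, the sketch below shows the standard empirical estimator, `E|X - y| - 0.5 * E|X - X'|`, in plain Python. It is an illustration only: the function name `ensemble_crps` is hypothetical and is not Calibrax's API, and a JAX implementation would vectorize the same arithmetic over arrays.

```python
from itertools import product


def ensemble_crps(members, observation):
    """Empirical CRPS of an ensemble forecast against one observation.

    Hypothetical helper for illustration; not part of Calibrax.
    Implements E|X - y| - 0.5 * E|X - X'|, where X and X' are drawn
    independently from the ensemble. Lower values are better.
    """
    n = len(members)
    if n == 0:
        raise ValueError("ensemble must contain at least one member")
    # Accuracy term: mean absolute error between members and the observation.
    accuracy = sum(abs(x - observation) for x in members) / n
    # Spread term: mean absolute difference over all ordered member pairs.
    spread = sum(abs(a - b) for a, b in product(members, repeat=2)) / (n * n)
    return accuracy - 0.5 * spread
```

For a single-member ensemble the spread term vanishes and the score reduces to the absolute error; for example, `ensemble_crps([0.0, 2.0], 1.0)` evaluates to `0.5`.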
jax_metrics
jax_metrics is closest to Calibrax in framework choice. It provides Keras-like metric and loss abstractions, pytree-friendly state, and distributed accumulation, and it notes that metrics are usually checked against Keras or TorchMetrics references.
Calibrax should keep that numerical-equivalence bar for standard metrics, but its current scope is larger: metric registry discovery, geometry and graph metrics, benchmarking, profiling, storage, exporters, and CI regression gates.
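A numerical-equivalence check of this kind can be sketched as follows. Everything here is an assumption for illustration: the function names are hypothetical, the tolerances are arbitrary, and the "reference" is a second pure-Python implementation standing in for a Keras or TorchMetrics call.

```python
import math


def mean_absolute_error(predictions, targets):
    """Candidate metric under test (hypothetical example)."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


def reference_mean_absolute_error(predictions, targets):
    """Stand-in reference; in practice this would call the reference library."""
    return math.fsum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


def assert_numerically_equivalent(candidate, reference, cases,
                                  rel_tol=1e-6, abs_tol=1e-9):
    """Raise AssertionError if candidate and reference disagree on any case.

    Returns the number of cases checked.
    """
    for predictions, targets in cases:
        got = candidate(predictions, targets)
        want = reference(predictions, targets)
        if not math.isclose(got, want, rel_tol=rel_tol, abs_tol=abs_tol):
            raise AssertionError(
                f"mismatch: candidate={got!r}, reference={want!r}"
            )
    return len(cases)
```

The tolerance pair matters: a relative tolerance alone fails near zero, so an absolute floor is included as well.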
ASV
ASV is strongest for long-running benchmark history across commits and environments. It can publish an interactive static site and is widely used by scientific Python projects.
Calibrax is lighter-weight for local scientific ML workflows that need JAX timing, hardware metadata, statistical comparison, baseline storage, and regression checks without adopting a separate ASV project layout.
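The general shape of a JSON-per-run baseline with a regression check can be sketched in standard-library Python. This is not Calibrax's API: the function names, the JSON layout, and the 10% slowdown threshold are all assumptions, and the median comparison is a deliberately simple stand-in for a proper statistical test.

```python
import json
import statistics
import time
from pathlib import Path


def time_callable(fn, repeats=5):
    """Return wall-clock samples (seconds) for repeated calls to fn.

    For JAX workloads, fn would need to call .block_until_ready() on its
    outputs, because JAX dispatches work asynchronously.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def save_run(directory, run_id, samples):
    """Write one run as its own JSON file (hypothetical layout)."""
    path = Path(directory) / f"{run_id}.json"
    path.write_text(json.dumps({"run_id": run_id, "samples_s": samples}))
    return path


def load_run(path):
    """Read a run previously written by save_run."""
    return json.loads(Path(path).read_text())


def is_regression(baseline_samples, candidate_samples, max_slowdown=1.10):
    """Flag a regression when the candidate median exceeds the allowed slowdown."""
    baseline = statistics.median(baseline_samples)
    candidate = statistics.median(candidate_samples)
    return candidate > baseline * max_slowdown
```

A CI gate built on this would load the stored baseline, time the current code, and fail the job when `is_regression` returns `True`.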
CodSpeed
CodSpeed is strongest when a hosted PR workflow should catch performance regressions before merge. Its docs cover CI setup, benchmark checks, profiling, and repository integration.
Calibrax includes a focused CodSpeed workflow for the `tests/performance/` benchmark suite. Treat CodSpeed as an external reporting layer for PR checks; Calibrax storage, statistical analysis, and regression gates remain the local source of benchmark truth.