Adding a Metric
Metric changes should keep the registry, tests, and documentation aligned. The current registry contains 111 Tier 0 pure-function metrics across 17 domains; Tier 1-3 APIs and losses use separate patterns.
Choose the Tier
| Tier | Use when | Typical location |
|---|---|---|
| Tier 0 pure function | The metric is a stateless JAX function | src/calibrax/metrics/functional/ |
| Tier 1 frozen backbone | The metric accumulates features from fixed pretrained weights | src/calibrax/metrics/stateful/ or plugins/ |
| Tier 2 learned metric | The metric has trainable parameters | src/calibrax/metrics/stateful/ or plugins/ |
| Tier 3 metric learning | The function is a differentiable training loss | src/calibrax/metrics/learning/ |
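
To make the Tier 0 row concrete, the sketch below shows what a stateless JAX metric might look like. The name `mean_absolute_error` and its signature are illustrative assumptions, not an existing calibrax function. The point is only that the result depends solely on the arguments, so the function can be passed to `jax.jit` without any state handling.

```python
import jax
import jax.numpy as jnp


def mean_absolute_error(predictions, targets):
    """Hypothetical Tier 0 metric: no parameters, no accumulated state."""
    predictions = jnp.asarray(predictions)
    targets = jnp.asarray(targets)
    # The output is a function of the inputs only, which is what makes
    # the metric a pure function.
    return jnp.mean(jnp.abs(predictions - targets))


# Pure functions compose directly with JAX transformations.
jitted_mae = jax.jit(mean_absolute_error)

value = jitted_mae(jnp.array([1.0, 2.0, 4.0]), jnp.array([1.0, 3.0, 2.0]))
```

A metric that needs pretrained weights, trainable parameters, or a training objective would not fit this shape and likely belongs in one of the other tiers.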
Implementation Steps
- Add the implementation in the existing domain module, or create a focused module if the domain is new.
- Use JAX arrays and `jax.numpy` for numeric operations.
- Register Tier 0 functions with `MetricEntry` metadata in `src/calibrax/metrics/_builtin_registrations.py` or the local registration helper already used by the domain.
- Set the domain, direction, signature, required extra, and mathematical properties deliberately.
- Export the function from the relevant package `__init__.py` if neighboring metrics are exported there.
- Add unit tests for shape handling, edge cases, JIT compatibility when expected, and numerical behavior.
- Add numerical-equivalence tests against scikit-learn, SciPy, TorchMetrics, or a paper reference when a reliable implementation exists.
- Update user-guide and API docs when the metric changes public behavior.
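
The unit-test step above can be sketched as a few pytest-style functions. This is a minimal illustration under stated assumptions: `mean_absolute_error` is a hypothetical stand-in for the metric being added, and the test names are invented for this example. Registration is deliberately left out because the `MetricEntry` fields are defined by the repository and are not reproduced here.

```python
import jax
import jax.numpy as jnp
import numpy as np


def mean_absolute_error(predictions, targets):
    """Hypothetical metric under test; stands in for a real calibrax function."""
    return jnp.mean(jnp.abs(jnp.asarray(predictions) - jnp.asarray(targets)))


def test_returns_scalar_for_batched_input():
    # Shape handling: a 2-D batch should still reduce to a scalar.
    value = mean_absolute_error(jnp.ones((4, 3)), jnp.zeros((4, 3)))
    assert value.shape == ()


def test_identical_inputs_give_zero():
    # Edge case: perfect predictions yield exactly zero error.
    x = jnp.array([0.5, -1.0, 2.0])
    assert float(mean_absolute_error(x, x)) == 0.0


def test_jit_matches_eager():
    # JIT compatibility: the compiled result should agree with eager execution.
    predictions = jnp.array([1.0, 2.0, 4.0])
    targets = jnp.array([1.0, 3.0, 2.0])
    eager = mean_absolute_error(predictions, targets)
    jitted = jax.jit(mean_absolute_error)(predictions, targets)
    np.testing.assert_allclose(float(jitted), float(eager), atol=1e-6)
```

Keeping one behavior per test function makes a failure point at the step that regressed, whether shape handling, an edge case, or JIT tracing.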
Numerical Equivalence
Prefer reference-backed tests for standard metrics. The baseline pattern is:
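```python
import numpy as np

ABS_TOL = 1e-6  # absolute tolerance for reference comparisons


def assert_close(actual, expected):
    # Cast to float so JAX scalars and NumPy scalars compare uniformly.
    np.testing.assert_allclose(float(actual), float(expected), atol=ABS_TOL)
```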
Use deterministic arrays and document any semantic differences from the reference library, such as label averaging, zero-division handling, smoothing, or log base.
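
As one illustration of documenting a semantic difference, the sketch below handles a log-base mismatch. Everything named here is an assumption for the example: `entropy_nats` is a hypothetical metric that returns nats, and `reference_entropy_bits` is a plain NumPy stand-in for a reference library that reports bits. In a real test the stand-in would be replaced by the actual reference call.

```python
import jax.numpy as jnp
import numpy as np

ABS_TOL = 1e-6


def entropy_nats(probs):
    """Hypothetical metric: Shannon entropy using the natural logarithm."""
    probs = jnp.asarray(probs)
    # Treat 0 * log(0) as 0 so zero-probability bins do not produce NaN.
    safe = jnp.where(probs > 0, probs, 1.0)
    return -jnp.sum(probs * jnp.log(safe))


def reference_entropy_bits(probs):
    """Stand-in reference that reports entropy in bits (log base 2)."""
    probs = np.asarray(probs, dtype=np.float64)
    nonzero = probs[probs > 0]
    return -np.sum(nonzero * np.log2(nonzero))


def test_entropy_matches_reference_after_unit_conversion():
    # Deterministic input that includes a zero-probability bin.
    probs = [0.5, 0.25, 0.25, 0.0]
    # Semantic difference: the metric returns nats, the reference returns
    # bits, so convert before comparing.
    actual_bits = float(entropy_nats(probs)) / np.log(2.0)
    np.testing.assert_allclose(
        actual_bits, reference_entropy_bits(probs), atol=ABS_TOL
    )
```

Stating the conversion in a comment next to the assertion keeps the difference visible to whoever later changes the metric or upgrades the reference library.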
Required Checks
```bash
source activate.sh
uv run pytest tests/metrics/ -v
uv run pytest
uv run pre-commit run --all-files
uv run mkdocs build --strict --clean
```
If the targeted test run uses repository-wide coverage options, add `--no-cov` for quick local iteration and rely on the full `uv run pytest` command for the coverage gate.