calibrax.validation

Validation tools for verifying benchmark correctness: convergence analysis (rate estimation and tolerance checking), accuracy assessment against targets, and structured validation reporting.

Framework

calibrax.validation.framework

Generic report for benchmark validation results.

ValidationReport(*, name, reference, accuracy_metrics, convergence_metrics=dict(), violations=(), passed=True, notes='') dataclass

Report of validation results against reference methods.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| name | str | Benchmark or experiment name. |
| reference | str | Name of reference method or dataset. |
| accuracy_metrics | dict[str, float] | Metric name to achieved value. |
| convergence_metrics | dict[str, float] | Convergence metric name to rate. |
| violations | tuple[str, ...] | Tuple of violation descriptions (empty if none). |
| passed | bool | Whether validation passed overall. |
| notes | str | Free-form notes or warnings. |

to_dict()

Serialize to a JSON-compatible dictionary.

from_dict(data) classmethod

Deserialize from a dictionary.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | dict[str, Any] | Dictionary with validation report fields. | required |

Returns:

| Type | Description |
| --- | --- |
| ValidationReport | Reconstructed ValidationReport instance. |

Convergence

calibrax.validation.convergence

Generic convergence analysis for benchmark validation.

Provides convergence rate computation and tolerance achievement tracking using pure Python math (no numpy/jax dependency).

ConvergenceResult(*, rates, achieved, iterations=dict(), optimal_tolerance=None) dataclass

Analysis of convergence behavior.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| rates | dict[str, float] | Metric name to convergence rate (log-reduction per step). |
| achieved | dict[str, bool] | Composite key (metric_tolerance) to whether convergence was achieved. |
| iterations | dict[str, int] | Composite key (metric_tolerance) to iteration count. |
| optimal_tolerance | float \| None | Best tolerance that was still achieved, or None if no tolerance was achieved. |

to_dict()

Serialize to a JSON-compatible dictionary.

from_dict(data) classmethod

Deserialize from a dictionary.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | dict[str, Any] | Dictionary with convergence result fields. | required |

Returns:

| Type | Description |
| --- | --- |
| ConvergenceResult | Reconstructed ConvergenceResult instance. |

check_convergence(metric_series, tolerances)

Check convergence across metrics at given tolerances.

For each metric, computes the average log-reduction rate per step and checks whether the final value meets each tolerance.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| metric_series | Mapping[str, Sequence[float]] | {metric_name: [values_at_increasing_resolution]}. Values should decrease toward zero for convergent metrics. | required |
| tolerances | Sequence[float] | Tolerance thresholds to check against. | required |

Returns:

| Type | Description |
| --- | --- |
| ConvergenceResult | ConvergenceResult with rates, achievement flags, and iteration counts. |
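To make the "average log-reduction rate per step" concrete, here is a minimal pure-Python sketch of one way such a check could work. It is not the calibrax implementation: the rate formula `ln(first / last) / steps`, the composite-key format `f"{metric}_{tol}"`, the meaning of the iteration count (first index at which the value meets the tolerance), and the choice of the smallest tolerance met by every metric as `optimal_tolerance` are all assumptions made for illustration.

```python
import math
from collections.abc import Mapping, Sequence
from typing import Any


def average_log_rate(values: Sequence[float]) -> float:
    """Average log-reduction per step: ln(first / last) / number_of_steps."""
    steps = len(values) - 1
    if steps < 1 or values[0] <= 0.0 or values[-1] <= 0.0:
        return 0.0  # assumption: undefined rates are reported as 0.0
    return math.log(values[0] / values[-1]) / steps


def check_convergence_sketch(
    metric_series: Mapping[str, Sequence[float]],
    tolerances: Sequence[float],
) -> dict[str, Any]:
    rates: dict[str, float] = {}
    achieved: dict[str, bool] = {}
    iterations: dict[str, int] = {}
    for metric, values in metric_series.items():
        rates[metric] = average_log_rate(values)
        for tol in tolerances:
            key = f"{metric}_{tol}"  # assumed composite-key format
            # Documented behavior: the final value must meet the tolerance.
            achieved[key] = len(values) > 0 and values[-1] <= tol
            # Assumption: iteration count is the first index meeting the tolerance.
            first_hit = next((i for i, v in enumerate(values) if v <= tol), None)
            if achieved[key] and first_hit is not None:
                iterations[key] = first_hit
    # Assumption: "best" tolerance is the smallest one every metric achieved.
    met = [
        tol
        for tol in tolerances
        if metric_series and all(achieved[f"{m}_{tol}"] for m in metric_series)
    ]
    return {
        "rates": rates,
        "achieved": achieved,
        "iterations": iterations,
        "optimal_tolerance": min(met) if met else None,
    }


# An error that shrinks tenfold per refinement has a rate of about ln(10).
result = check_convergence_sketch(
    {"l2_error": [1.0, 0.1, 0.01, 0.001]},
    [0.01, 0.0001],
)
```

In this example the final value 0.001 meets the 0.01 tolerance but not 0.0001, so only the first composite key is marked as achieved.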

Accuracy

calibrax.validation.accuracy

Generic accuracy assessment for benchmark validation.

Compares an achieved value against a target, computing pass/fail and margin.

AccuracyResult(*, target, achieved, metric_type, units, passed, margin) dataclass

Assessment of accuracy against a target.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| target | float | Target threshold that the achieved value must not exceed. |
| achieved | float | Achieved (measured) value. |
| metric_type | str | Type of accuracy (e.g. "accuracy", "mse"). |
| units | str | Units of measurement (e.g. "relative", "eV"). |
| passed | bool | Whether achieved meets the target (achieved <= target). |
| margin | float | Difference between target and achieved (positive = headroom). |

to_dict()

Serialize to a JSON-compatible dictionary.

from_dict(data) classmethod

Deserialize from a dictionary.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | dict[str, Any] | Dictionary with accuracy result fields. | required |

Returns:

| Type | Description |
| --- | --- |
| AccuracyResult | Reconstructed AccuracyResult instance. |

check_accuracy(achieved, target, *, metric_type='accuracy', units='relative')

Check whether an achieved value meets a target.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| achieved | float | The measured value. | required |
| target | float | The target threshold (achieved must be <= target to pass). | required |
| metric_type | str | Label for the type of accuracy check. | 'accuracy' |
| units | str | Units of measurement. | 'relative' |

Returns:

| Type | Description |
| --- | --- |
| AccuracyResult | AccuracyResult with pass/fail and margin. |