calibrax.validation
Validation tools for verifying benchmark correctness: convergence analysis (rate estimation and tolerance checking), accuracy assessment against targets, and structured validation reporting.
Framework
calibrax.validation.framework
Generic validation report for benchmark validation results.
ValidationReport(*, name, reference, accuracy_metrics, convergence_metrics=dict(), violations=(), passed=True, notes='')
dataclass
Report of validation results against reference methods.
Attributes:
| Name | Type | Description |
|---|---|---|
| `name` | `str` | Benchmark or experiment name. |
| `reference` | `str` | Name of reference method or dataset. |
| `accuracy_metrics` | `dict[str, float]` | Metric name to achieved value. |
| `convergence_metrics` | `dict[str, float]` | Convergence metric name to rate. |
| `violations` | `tuple[str, ...]` | Tuple of violation descriptions (empty if none). |
| `passed` | `bool` | Whether validation passed overall. |
| `notes` | `str` | Free-form notes or warnings. |

to_dict()
Serialize to a JSON-compatible dictionary.
from_dict(data)
classmethod
Deserialize from a dictionary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `dict[str, Any]` | Dictionary with validation report fields. | required |

Returns:
| Type | Description |
|---|---|
| `ValidationReport` | Reconstructed ValidationReport instance. |

Convergence
calibrax.validation.convergence
Generic convergence analysis for benchmark validation.
Provides convergence rate computation and tolerance achievement tracking using pure Python math (no numpy/jax dependency).
ConvergenceResult(*, rates, achieved, iterations=dict(), optimal_tolerance=None)
dataclass
Analysis of convergence behavior.
Attributes:
| Name | Type | Description |
|---|---|---|
| `rates` | `dict[str, float]` | Metric name to convergence rate (log-reduction per step). |
| `achieved` | `dict[str, bool]` | Composite key (metric_tolerance) to whether convergence was achieved. |
| `iterations` | `dict[str, int]` | Composite key (metric_tolerance) to iteration count. |
| `optimal_tolerance` | `float \| None` | Best tolerance that was still achieved, or None. |

to_dict()
Serialize to a JSON-compatible dictionary.
from_dict(data)
classmethod
Deserialize from a dictionary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `dict[str, Any]` | Dictionary with convergence result fields. | required |

Returns:
| Type | Description |
|---|---|
| `ConvergenceResult` | Reconstructed ConvergenceResult instance. |

check_convergence(metric_series, tolerances)
Check convergence across metrics at given tolerances.
For each metric, computes the average log-reduction rate per step and checks whether the final value meets each tolerance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `metric_series` | `Mapping[str, Sequence[float]]` | `{metric_name: [values_at_increasing_resolution]}`. Values should decrease toward zero for convergent metrics. | required |
| `tolerances` | `Sequence[float]` | Tolerance thresholds to check against. | required |

Returns:
| Type | Description |
|---|---|
| `ConvergenceResult` | ConvergenceResult with rates, achievement flags, and iteration counts. |

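
To make "average log-reduction rate per step" concrete, here is a minimal pure-Python sketch of one plausible reading, not the library's code. It assumes natural logarithms, assumes the rate is the mean of the per-step log differences (which telescopes to a first-versus-last comparison), and assumes the composite key is the metric name and tolerance joined by an underscore. `average_log_reduction` and `check_convergence_sketch` are hypothetical names, and the iteration counts and `optimal_tolerance` are left out because their exact definitions are not given above.

```python
import math
from collections.abc import Mapping, Sequence


def average_log_reduction(values: Sequence[float]) -> float:
    """Mean drop in ln(value) per step; positive when the values shrink."""
    if len(values) < 2:
        return 0.0  # Assumption: a single point carries no rate information.
    if values[0] <= 0 or values[-1] <= 0:
        raise ValueError("log-reduction needs positive first and last values")
    # The per-step log differences telescope to (ln first - ln last) / steps.
    return (math.log(values[0]) - math.log(values[-1])) / (len(values) - 1)


def check_convergence_sketch(
    metric_series: Mapping[str, Sequence[float]],
    tolerances: Sequence[float],
) -> tuple[dict[str, float], dict[str, bool]]:
    """Return (rates, achieved) in the spirit of the documented result."""
    rates: dict[str, float] = {}
    achieved: dict[str, bool] = {}
    for metric, values in metric_series.items():
        if not values:
            raise ValueError(f"metric {metric!r} has no values")
        rates[metric] = average_log_reduction(values)
        for tol in tolerances:
            # Assumption: the composite key joins metric and tolerance with "_".
            achieved[f"{metric}_{tol}"] = values[-1] <= tol
    return rates, achieved


# Illustrative series: the error shrinks by a factor of 10 per refinement.
rates, achieved = check_convergence_sketch(
    {"l2_error": [1.0, 0.1, 0.01]},
    tolerances=[1e-1, 1e-3],
)
```

In this example the rate is ln 10 (about 2.30) per step, and the final value 0.01 meets the 0.1 tolerance but not the 0.001 tolerance.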
Accuracy
calibrax.validation.accuracy
Generic accuracy assessment for benchmark validation.
Compares an achieved value against a target, computing pass/fail and margin.
AccuracyResult(*, target, achieved, metric_type, units, passed, margin)
dataclass
Assessment of accuracy against a target.
Attributes:
| Name | Type | Description |
|---|---|---|
| `target` | `float` | Target accuracy threshold. |
| `achieved` | `float` | Achieved accuracy value. |
| `metric_type` | `str` | Type of accuracy (e.g. "accuracy", "mse"). |
| `units` | `str` | Units of measurement (e.g. "relative", "eV"). |
| `passed` | `bool` | Whether achieved meets the target (achieved <= target). |
| `margin` | `float` | Difference between target and achieved (positive = headroom). |

to_dict()
Serialize to a JSON-compatible dictionary.
from_dict(data)
classmethod
Deserialize from a dictionary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `dict[str, Any]` | Dictionary with accuracy result fields. | required |

Returns:
| Type | Description |
|---|---|
| `AccuracyResult` | Reconstructed AccuracyResult instance. |

check_accuracy(achieved, target, *, metric_type='accuracy', units='relative')
Check whether an achieved value meets a target.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `achieved` | `float` | The measured value. | required |
| `target` | `float` | The target threshold (achieved must be <= target to pass). | required |
| `metric_type` | `str` | Label for the type of accuracy check. | `'accuracy'` |
| `units` | `str` | Units of measurement. | `'relative'` |

Returns:
| Type | Description |
|---|---|
| `AccuracyResult` | AccuracyResult with pass/fail and margin. |
