calibrax.statistics¤

Statistical analysis tools for benchmark measurements. Provides descriptive statistics with bootstrap confidence intervals, MAD-based outlier detection, significance tests (Welch's t, Mann-Whitney U, Wilcoxon), and Cohen's d effect size.

Analyzer¤

`calibrax.statistics.analyzer` ¤

Statistical analysis for benchmark measurements.

Provides summary statistics with bootstrap confidence intervals, outlier detection via modified Z-scores, and stability assessment.

`StatisticalResult(*, mean, median, std, min, max, cv, ci_lower, ci_upper, n, is_stable)` `dataclass` ¤

Summary statistics with confidence intervals.

Attributes:

Name	Type	Description
`mean`	`float`	Arithmetic mean.
`median`	`float`	Median value.
`std`	`float`	Sample standard deviation (ddof=1).
`min`	`float`	Minimum value.
`max`	`float`	Maximum value.
`cv`	`float`	Coefficient of variation (std / mean).
`ci_lower`	`float`	95% bootstrap CI lower bound.
`ci_upper`	`float`	95% bootstrap CI upper bound.
`n`	`int`	Number of samples.
`is_stable`	`bool`	True when CV < STABILITY_CV_THRESHOLD.

`to_dict()` ¤

Serialize to a JSON-compatible dictionary.

`from_dict(data)` `classmethod` ¤

Deserialize from a dictionary.

Parameters:

Name	Type	Description	Default
`data`	`dict[str, Any]`	Dictionary with statistical result fields.	required

Returns:

Type	Description
`StatisticalResult`	Reconstructed StatisticalResult instance.

`StatisticalAnalyzer(bootstrap_resamples=1000, seed=42)` ¤

Statistical analysis for benchmark measurements.

Provides summary statistics with bootstrap confidence intervals, modified Z-score outlier detection, and stability assessment.

Parameters:

Name	Type	Description	Default
`bootstrap_resamples`	`int`	Number of bootstrap resamples for CI computation.	`1000`
`seed`	`int`	Random seed for reproducible bootstrap sampling.	`42`


`summarize(samples)` ¤

Compute summary statistics with bootstrap CI.

Parameters:

Name	Type	Description	Default
`samples`	`Sequence[float]`	Sequence of measurement values (at least 1).	required

Returns:

Type	Description
`StatisticalResult`	StatisticalResult with all computed statistics.

`bootstrap_ci(samples, confidence=0.95)` ¤

Percentile bootstrap confidence interval.

Parameters:

Name	Type	Description	Default
`samples`	`Sequence[float]`	Sequence of measurement values.	required
`confidence`	`float`	Confidence level (default 0.95 for 95% CI).	`0.95`

Returns:

Type	Description
`tuple[float, float]`	Tuple of (lower_bound, upper_bound).
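A percentile bootstrap can be sketched as follows: resample with replacement, compute the statistic on each resample, and read the interval off the sorted results. This is a minimal sketch, not the library's implementation. It assumes the statistic is the mean and uses a simple nearest-rank percentile, and `percentile_bootstrap_ci` is a hypothetical name.

```python
import random
import statistics
from collections.abc import Sequence


def percentile_bootstrap_ci(
    samples: Sequence[float],
    confidence: float = 0.95,
    resamples: int = 1000,
    seed: int = 42,
) -> tuple[float, float]:
    """Percentile bootstrap CI for the mean (assumed statistic)."""
    rng = random.Random(seed)  # seeded generator makes the interval reproducible
    n = len(samples)
    # Mean of each resample drawn with replacement, sorted ascending.
    means = sorted(
        statistics.fmean(rng.choices(samples, k=n)) for _ in range(resamples)
    )
    alpha = 1.0 - confidence
    # Nearest-rank positions of the alpha/2 and 1 - alpha/2 percentiles.
    lower_idx = int((alpha / 2) * (resamples - 1))
    upper_idx = int(round((1 - alpha / 2) * (resamples - 1)))
    return means[lower_idx], means[upper_idx]


data = [10.0, 10.2, 9.8, 10.1, 9.9, 10.3, 9.7, 10.0]
lower, upper = percentile_bootstrap_ci(data)
```

Because the generator is seeded, repeated calls with the same inputs return the same interval, which is the reason for exposing a `seed` parameter.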

`detect_outliers(samples, threshold=OUTLIER_Z_THRESHOLD)` ¤

Modified Z-score outlier detection.

Uses the median absolute deviation (MAD) instead of the standard deviation, so the outliers being detected do not inflate the scale estimate.

Parameters:

Name	Type	Description	Default
`samples`	`Sequence[float]`	Sequence of values to check.	required
`threshold`	`float`	Modified Z-score threshold (default 3.5).	`OUTLIER_Z_THRESHOLD`

Returns:

Type	Description
`list[int]`	List of indices where outliers are detected.
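The modified Z-score is commonly defined as 0.6745 * (x - median) / MAD, with 3.5 as the conventional cutoff. The sketch below follows that definition. `modified_z_outliers` is a hypothetical name, and returning no outliers when the MAD is zero is an assumption of this sketch rather than documented behaviour.

```python
import statistics
from collections.abc import Sequence


def modified_z_outliers(samples: Sequence[float], threshold: float = 3.5) -> list[int]:
    """Indices whose modified Z-score exceeds the threshold in absolute value."""
    median = statistics.median(samples)
    # Median absolute deviation from the median.
    mad = statistics.median([abs(x - median) for x in samples])
    if mad == 0:
        # At least half the values equal the median; scores are undefined.
        return []
    return [
        i
        for i, x in enumerate(samples)
        if abs(0.6745 * (x - median) / mad) > threshold
    ]


timings = [10.0, 10.1, 9.9, 10.2, 9.8, 15.0]
flagged = modified_z_outliers(timings)  # only the 15.0 reading (index 5) is flagged
```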

Significance Testing¤

Optional Dependency

Significance tests require scipy. Install it with the `stats` extra: `uv pip install "calibrax[stats]"`

`calibrax.statistics.significance` ¤

Statistical significance tests for benchmark comparisons.

Provides Welch's t-test, Mann-Whitney U, paired Wilcoxon signed-rank test (with pure-Python sign test fallback), and Cohen's d effect size.

`welch_t_test(a, b)` ¤

Welch's t-test for unequal variances.

Requires scipy. Raises `ImportError` with a clear message if scipy is not installed.

Parameters:

Name	Type	Description	Default
`a`	`Sequence[float]`	First sample measurements.	required
`b`	`Sequence[float]`	Second sample measurements.	required

Returns:

Type	Description
`tuple[float, float]`	Tuple of (t_statistic, p_value).

Raises:

Type	Description
`ImportError`	If scipy is not installed.
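To show what the statistic measures, the sketch below computes Welch's t and the Welch-Satterthwaite degrees of freedom with the standard library. It stops short of the p-value, which needs the t-distribution CDF that scipy supplies. `welch_t_statistic` is a hypothetical helper, not part of calibrax.

```python
import math
import statistics
from collections.abc import Sequence


def welch_t_statistic(a: Sequence[float], b: Sequence[float]) -> tuple[float, float]:
    """Return (t, degrees_of_freedom) without assuming equal variances."""
    # Squared standard error of each sample mean: s^2 / n.
    se2_a = statistics.variance(a) / len(a)
    se2_b = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a**2 / (len(a) - 1) + se2_b**2 / (len(b) - 1)
    )
    return t, df


t_stat, dof = welch_t_statistic([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
```

The sign of `t` follows the argument order: a negative value means the first sample has the smaller mean.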

`mann_whitney_u(a, b)` ¤

Mann-Whitney U test for non-parametric distribution comparison.

Requires scipy. Raises `ImportError` with a clear message if scipy is not installed.

Parameters:

Name	Type	Description	Default
`a`	`Sequence[float]`	First sample measurements.	required
`b`	`Sequence[float]`	Second sample measurements.	required

Returns:

Type	Description
`tuple[float, float]`	Tuple of (u_statistic, p_value).

Raises:

Type	Description
`ImportError`	If scipy is not installed.
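The U statistic itself can be illustrated by direct pair counting: each pair where the value from `a` is larger scores 1, and each tie scores 0.5. This quadratic-time sketch is for intuition only. `mann_whitney_u_statistic` is a hypothetical name, and the p-value computation is left to scipy.

```python
from collections.abc import Sequence


def mann_whitney_u_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """U for sample a: count of (x, y) pairs with x > y, ties counted as 0.5."""
    u_a = 0.0
    for x in a:
        for y in b:
            if x > y:
                u_a += 1.0
            elif x == y:
                u_a += 0.5
    return u_a


first = [1.0, 2.0, 3.0]
second = [2.0, 4.0, 6.0]
u_first = mann_whitney_u_statistic(first, second)
u_second = mann_whitney_u_statistic(second, first)
# The two statistics always sum to len(first) * len(second).
```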

`paired_significance_test(a, b, *, alpha=0.05)` ¤

Wilcoxon signed-rank test for paired samples.

Tests whether two related samples have the same distribution. Uses `scipy.stats.wilcoxon` when available and falls back to a pure-Python sign test approximation for small samples.

Parameters:

Name	Type	Description	Default
`a`	`list[float]`	First sample (e.g., baseline measurements).	required
`b`	`list[float]`	Second sample (e.g., current measurements). Must be same length as a.	required
`alpha`	`float`	Significance threshold (default 0.05).	`0.05`

Returns:

Type	Description
`SignificanceResult`	SignificanceResult with p_value, statistic, effect_size (Cohen's d), significant flag, and method name.

Raises:

Type	Description
`ValueError`	If samples are empty or have different lengths.
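One plausible form of a pure-Python sign test is an exact two-sided binomial test on the signs of the paired differences, sketched below. The function name `sign_test_p_value`, the tie handling (zero differences are dropped), and the exact-binomial formulation are all assumptions. The library's fallback may differ in detail.

```python
import math
from collections.abc import Sequence


def sign_test_p_value(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sided exact sign test p-value for paired samples."""
    if not a or len(a) != len(b):
        raise ValueError("samples must be non-empty and of equal length")
    # Zero differences carry no sign information and are dropped.
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        return 1.0  # no evidence of a difference
    positives = sum(d > 0 for d in diffs)
    k = min(positives, n - positives)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


baseline = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
current = [9.0, 10.5, 11.0, 12.5, 13.0, 14.5, 15.0, 16.5]
p_value = sign_test_p_value(baseline, current)
significant = p_value < 0.05  # alpha=0.05, matching the documented default
```

With all eight differences positive, the p-value is 2 / 2**8, so the difference is flagged as significant at the default `alpha`.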

`effect_size(a, b)` ¤

Cohen's d effect size for two independent samples.

Parameters:

Name	Type	Description	Default
`a`	`Sequence[float]`	First sample.	required
`b`	`Sequence[float]`	Second sample.	required

Returns:

Type	Description
`float`	Absolute Cohen's d value. Returns 0.0 if pooled std is zero.
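Cohen's d divides the difference in means by a pooled standard deviation. The sketch below uses the common degrees-of-freedom-weighted pooling and mirrors the documented conventions (absolute value, 0.0 when the pooled std is zero). The pooling formula and the name `cohens_d` are assumptions, since the library's exact formula is not shown here.

```python
import math
import statistics
from collections.abc import Sequence


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Absolute Cohen's d using a pooled sample standard deviation."""
    n_a, n_b = len(a), len(b)
    # Pool the sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)
    ) / (n_a + n_b - 2)
    pooled_std = math.sqrt(pooled_var)
    if pooled_std == 0:
        return 0.0  # both samples are constant; d is undefined
    return abs(statistics.fmean(a) - statistics.fmean(b)) / pooled_std


d = cohens_d([1.0, 2.0, 3.0, 4.0, 5.0], [3.0, 4.0, 5.0, 6.0, 7.0])
```

By the usual rule of thumb, values near 0.2, 0.5, and 0.8 are read as small, medium, and large effects.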
