# Exporting Results

Calibrax can export benchmark results to Weights & Biases (W&B) and MLflow, and can render them as publication-ready plots and tables. All exporters implement the `Exporter` abstract base class (ABC); to add another backend, subclass it.
## Exporter Interface

The base class lives in `calibrax.exporters.base`. Every exporter implements its two methods:

```python
from abc import ABC

from calibrax.core.models import Run


class Exporter(ABC):
    def export_run(self, run: Run) -> str:
        """Export a single run. Returns a URL or identifier."""
        ...

    def export_analysis(self, run: Run, baseline: Run | None = None) -> None:
        """Export an analysis comparing `run` against an optional baseline."""
        ...
```
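To see the two-method contract in isolation, the dependency-free sketch below uses local stand-ins: `Run` and `Exporter` here are not the calibrax classes, and `InMemoryExporter` and `HalfExporter` are hypothetical. The stand-in declares both methods with `@abstractmethod`; that is an assumption about how you might enforce the contract, since this page only states that all exporters implement both methods. Under that assumption, a subclass that implements only one of them fails at instantiation.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


# Hypothetical stand-in for calibrax.core.models.Run, reduced to one field.
@dataclass(frozen=True)
class Run:
    id: str


# Local stand-in mirroring the two-method interface, with both methods abstract.
class Exporter(ABC):
    @abstractmethod
    def export_run(self, run: Run) -> str: ...

    @abstractmethod
    def export_analysis(self, run: Run, baseline: Run | None = None) -> None: ...


class InMemoryExporter(Exporter):
    """Test double that records exports in lists instead of sending them."""

    def __init__(self) -> None:
        self.runs: list[str] = []
        self.analyses: list[tuple[str, str | None]] = []

    def export_run(self, run: Run) -> str:
        self.runs.append(run.id)
        return f"memory://{run.id}"  # identifier for the "exported" run

    def export_analysis(self, run: Run, baseline: Run | None = None) -> None:
        self.analyses.append((run.id, baseline.id if baseline else None))


class HalfExporter(Exporter):
    # Implements only one abstract method, so it cannot be instantiated.
    def export_run(self, run: Run) -> str:
        return run.id
```

A recording double like `InMemoryExporter` is handy in unit tests for code that drives an exporter, because it needs no network access or credentials.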
## Weights & Biases

**Import path.** `WandBExporter` is not re-exported from `calibrax.exporters`, so importing that package does not load `wandb`. Import it directly with `from calibrax.exporters.wandb import WandBExporter`.

**Optional dependency.** Requires `wandb`: `uv pip install "calibrax[wandb]"`
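Because the W&B and MLflow exporters sit behind optional extras, a script that should also run without them can probe for the dependency before importing. A minimal sketch using the standard library's `importlib.util.find_spec`, which locates a top-level module without importing it; `has_module` and `require` are hypothetical helpers, not calibrax functions.

```python
import importlib.util


def has_module(name: str) -> bool:
    """True if top-level module `name` is importable, without importing it."""
    return importlib.util.find_spec(name) is not None


def require(name: str, extra: str) -> None:
    """Raise ImportError with an install hint if an optional module is missing."""
    if not has_module(name):
        raise ImportError(
            f'{name} is not installed; run: uv pip install "calibrax[{extra}]"'
        )
```

For example, call `require("wandb", "wandb")` at the top of a nightly export script, or wrap `from calibrax.exporters.wandb import WandBExporter` in `if has_module("wandb"):` to skip W&B export when the extra is absent.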
### Basic Usage

```python
from calibrax.core.models import Metric, MetricDef, MetricDirection, Point, Run
from calibrax.exporters.wandb import WandBExporter

# Sample run to export
run = Run(
    points=(Point(name="forward_pass", scenario="training",
                  tags={"framework": "flax"},
                  metrics={"throughput": Metric(value=1200.0),
                           "latency": Metric(value=0.8)}),),
    metric_defs={
        "throughput": MetricDef(name="throughput", unit="samples/sec",
                                direction=MetricDirection.HIGHER),
        "latency": MetricDef(name="latency", unit="ms",
                             direction=MetricDirection.LOWER),
    },
)

# Baseline run with the same point and metric definitions
baseline_run = Run(
    points=(Point(name="forward_pass", scenario="training",
                  tags={"framework": "flax"},
                  metrics={"throughput": Metric(value=1100.0),
                           "latency": Metric(value=0.9)}),),
    metric_defs=run.metric_defs,
)

exporter = WandBExporter(
    project="my-benchmarks",
    entity="my-team",         # optional
    tags=["nightly", "gpu"],  # optional
)

# Stop early if W&B credentials are missing
if not exporter.check_auth():
    raise SystemExit("Run 'wandb login' first")

# Export a run; returns a URL to view it
url = exporter.export_run(run)
print(f"View at: {url}")

# Export analysis with baseline comparison
exporter.export_analysis(run, baseline=baseline_run)
```
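`MetricDirection` decides how a baseline comparison reads: higher throughput is an improvement, higher latency is a regression. The standalone sketch below applies that interpretation to the two runs above. `Direction`, `relative_change`, and `is_improvement` are hypothetical names, and this is not necessarily how `export_analysis` computes or thresholds regressions.

```python
from enum import Enum


class Direction(Enum):
    # Hypothetical stand-in for calibrax's MetricDirection.
    HIGHER = "higher"  # larger values are better (e.g. throughput)
    LOWER = "lower"    # smaller values are better (e.g. latency)


def relative_change(current: float, baseline: float) -> float:
    """Signed fractional change of current relative to baseline."""
    return (current - baseline) / baseline


def is_improvement(current: float, baseline: float, direction: Direction) -> bool:
    change = relative_change(current, baseline)
    return change > 0 if direction is Direction.HIGHER else change < 0


# Values from `run` (current) and `baseline_run` in the example above.
metrics = {
    "throughput": (1200.0, 1100.0, Direction.HIGHER),
    "latency": (0.8, 0.9, Direction.LOWER),
}
for name, (current, baseline, direction) in metrics.items():
    change = relative_change(current, baseline)
    verdict = "improved" if is_improvement(current, baseline, direction) else "not improved"
    print(f"{name}: {change:+.1%} ({verdict})")
# throughput: +9.1% (improved)
# latency: -11.1% (improved)
```

Both metrics improved even though their changes have opposite signs, which is why the direction has to be part of the metric definition.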
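### Exporting Trends

Trends are read from a `Store`. This example reuses `run` and `exporter` from above:

```python
from pathlib import Path

from calibrax.storage.store import Store

# Persist the run so it becomes part of the stored history
store = Store(Path("temp/doc-examples/exporters-store"))
store.save(run)

exporter.export_trends(
    store=store,
    metric="throughput",
    point_name="forward_pass",
    tags={"framework": "flax"},
    n_runs=50,  # number of stored runs to include
)
```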
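As a mental model for the `point_name`, `tags`, and `n_runs` arguments, the sketch below keeps points whose name matches and whose tags include every requested key/value pair, orders them by run, and keeps the most recent `n_runs` values. This is an assumption: calibrax's `Store` may select and order runs differently, and `StoredPoint` and `trend` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StoredPoint:
    # Hypothetical, simplified record of one point in one stored run.
    run_index: int  # position of the run in the history (0 = oldest)
    name: str
    tags: dict[str, str] = field(default_factory=dict)
    metrics: dict[str, float] = field(default_factory=dict)


def trend(points: list[StoredPoint], metric: str, point_name: str,
          tags: dict[str, str], n_runs: int) -> list[float]:
    """Values of `metric` for matching points, oldest first, last n_runs only."""
    matches = [
        (p.run_index, p.metrics[metric])
        for p in points
        if p.name == point_name
        and all(p.tags.get(k) == v for k, v in tags.items())
        and metric in p.metrics
    ]
    matches.sort()  # oldest run first
    return [value for _, value in matches[-n_runs:]]


# The baseline (1100.0) and current (1200.0) throughput values from above.
history = [
    StoredPoint(0, "forward_pass", {"framework": "flax"}, {"throughput": 1100.0}),
    StoredPoint(1, "forward_pass", {"framework": "flax"}, {"throughput": 1200.0}),
]
print(trend(history, "throughput", "forward_pass", {"framework": "flax"}, n_runs=50))
# [1100.0, 1200.0]
```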
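### Logging Custom Artifacts

```python
import matplotlib.pyplot as plt

# Log matplotlib figures, keyed by name
fig, ax = plt.subplots()
ax.loglog([100, 500, 1000, 5000], [0.01, 0.05, 0.10, 0.52], marker="o")
ax.set_xlabel("size")
ax.set_ylabel("latency")
exporter.log_figures({"scaling": fig})

# Log HTML artifacts, keyed by name
html_string = "<h1>Nightly benchmark report</h1>"
exporter.log_html_artifacts({"report": html_string})

# Log custom tables as (headers, rows) pairs
exporter.log_extra_tables({
    "results": (
        ["Framework", "Throughput", "Latency"],        # headers
        [["flax", 1200, 0.8], ["pytorch", 950, 1.2]],  # rows
    ),
})
```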
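`log_extra_tables` takes each table as a `(headers, rows)` pair, and a row with the wrong number of cells is easy to introduce by hand. A small pre-flight check can catch that before anything is uploaded. `check_table` is a hypothetical helper, not part of calibrax, and whether the exporter validates shapes itself is not documented here.

```python
def check_table(headers: list[str], rows: list[list[object]]) -> None:
    """Raise ValueError if any row's length differs from the header count."""
    for i, row in enumerate(rows):
        if len(row) != len(headers):
            raise ValueError(
                f"row {i} has {len(row)} cells, expected {len(headers)}"
            )


tables = {
    "results": (
        ["Framework", "Throughput", "Latency"],
        [["flax", 1200, 0.8], ["pytorch", 950, 1.2]],
    ),
}
for name, (headers, rows) in tables.items():
    check_table(headers, rows)  # passes: every row has three cells
```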
## MLflow

**Import path.** Like `WandBExporter`, `MLflowExporter` is not re-exported from `calibrax.exporters`, so importing that package does not load `mlflow`. Import it directly with `from calibrax.exporters.mlflow import MLflowExporter`.

**Optional dependency.** Requires `mlflow`: `uv pip install "calibrax[mlflow]"`
### Basic Usage

```python
from calibrax.core.models import Metric, MetricDef, MetricDirection, Point, Run
from calibrax.exporters.mlflow import MLflowExporter

metric_defs = {
    "throughput": MetricDef(name="throughput", unit="samples/sec",
                            direction=MetricDirection.HIGHER),
    "latency": MetricDef(name="latency", unit="ms",
                         direction=MetricDirection.LOWER),
}

# Current run and a baseline with the same point and metrics
run = Run(
    points=(Point(name="forward_pass", scenario="training",
                  tags={"framework": "flax"},
                  metrics={"throughput": Metric(value=1200.0),
                           "latency": Metric(value=0.8)}),),
    metric_defs=metric_defs,
)
baseline_run = Run(
    points=(Point(name="forward_pass", scenario="training",
                  tags={"framework": "flax"},
                  metrics={"throughput": Metric(value=1100.0),
                           "latency": Metric(value=0.9)}),),
    metric_defs=metric_defs,
)

exporter = MLflowExporter(
    experiment_name="my-benchmarks",
    tracking_uri="http://localhost:5000",  # optional
)

# Export a run; returns the MLflow run ID
run_id = exporter.export_run(run)

# Export analysis with baseline comparison (logs regressions as metrics)
exporter.export_analysis(run, baseline=baseline_run)
```
Each benchmark point's metrics are logged as MLflow metrics, environment metadata is logged as MLflow parameters, and regression analysis produces a JSON artifact.
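MLflow metrics are flat key/value pairs, so nested benchmark points have to be mapped to key names. The key format calibrax uses is not documented on this page; the sketch below assumes a hypothetical `<point>/<scenario>/<metric>` scheme and replaces any character outside the set MLflow accepts in names (letters, digits, `_`, `-`, `.`, space, `/`). `metric_key`, `flatten`, and the dict-shaped points are stand-ins, not calibrax APIs.

```python
import re

# Anything outside letters, digits, underscore, period, space, slash, dash.
_INVALID = re.compile(r"[^A-Za-z0-9_. /-]")


def metric_key(point_name: str, scenario: str, metric: str) -> str:
    """Hypothetical naming scheme: '<point>/<scenario>/<metric>', sanitized."""
    return _INVALID.sub("_", f"{point_name}/{scenario}/{metric}")


def flatten(points: list[dict]) -> dict[str, float]:
    """Map each point's metrics to flat, MLflow-safe keys."""
    flat: dict[str, float] = {}
    for p in points:
        for metric, value in p["metrics"].items():
            flat[metric_key(p["name"], p["scenario"], metric)] = value
    return flat


points = [
    {"name": "forward_pass", "scenario": "training",
     "metrics": {"throughput": 1200.0, "latency": 0.8}},
]
print(flatten(points))
# {'forward_pass/training/throughput': 1200.0, 'forward_pass/training/latency': 0.8}
```

Under this scheme, two points that differ only in their tags would map to the same key, so a real mapping would need to fold tags into the key or into separate MLflow runs.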
## Publication Generator

**Optional dependency.** Plot generation requires matplotlib: `uv pip install "calibrax[publication]"`. Table generation (LaTeX, HTML, CSV) works without matplotlib.

`PublicationGenerator` creates plots and tables suitable for papers and reports:

```python
from pathlib import Path

from calibrax.exporters.publication import PublicationGenerator

# Generated files are written under output_dir
pub = PublicationGenerator(output_dir=Path("temp/doc-examples/figures"))
```
### Comparison Plots

Bar charts comparing metrics across configurations:

```python
path = pub.generate_comparison_plot(
    run,
    metrics=["throughput", "latency"],
    output_format="pdf",  # "png", "pdf", or "svg"
)
# path may be None, so check it before use
if path is not None:
    print(f"Plot saved to {path}")
```
### Scaling Plots

Log-log plots with fitted scaling laws:

```python
# Latency at four input sizes
path = pub.generate_scaling_plot(
    sizes=[100, 500, 1000, 5000],
    values=[0.01, 0.05, 0.10, 0.52],
    metric_name="latency",
    output_format="pdf",
)
```
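A scaling law here means a power law, `value = a * size**b`, which is a straight line on log-log axes. One standard way to fit it is ordinary least squares on `log(size)` versus `log(value)`; the sketch below does that with the standard library on the same data. Whether `generate_scaling_plot` uses exactly this fit is an assumption, and `fit_power_law` is a hypothetical helper.

```python
import math


def fit_power_law(sizes: list[float], values: list[float]) -> tuple[float, float]:
    """Least-squares fit of values = a * sizes**b on log-log axes; returns (a, b)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in log space = scaling exponent
    a = math.exp(mean_y - b * mean_x)  # intercept mapped back to linear scale
    return a, b


a, b = fit_power_law([100, 500, 1000, 5000], [0.01, 0.05, 0.10, 0.52])
print(f"latency approx {a:.2e} * size^{b:.2f}")
```

For this data the exponent `b` comes out close to 1, meaning latency grows roughly linearly with size; an exponent near 2 would indicate quadratic scaling.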
### Convergence Plots

Time-series plots showing how a metric trends across stored runs. This example reuses `store` from the trends example above:

```python
trend = store.extract_trend("throughput", "forward_pass", {"framework": "flax"})
path = pub.generate_convergence_plot(trend, output_format="png")
```
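Run-to-run noise can make a raw trend hard to read. If you post-process values yourself before plotting, a trailing moving average is a simple smoother. `moving_average` is a hypothetical helper that works on a plain list of floats; the shape of the object returned by `store.extract_trend` is not documented on this page, so extracting the values from it is left to you.

```python
def moving_average(values: list[float], window: int) -> list[float]:
    """Trailing mean over up to `window` most recent values at each step."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out: list[float] = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]  # drop the value leaving the window
        out.append(total / min(i + 1, window))
    return out


print(moving_average([1100.0, 1200.0], window=2))
# [1100.0, 1150.0]
```

The running sum keeps each step O(1), and the first `window - 1` points average over however many values exist so far rather than being dropped.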
### Tables

Generate LaTeX, HTML, or CSV tables:

```python
# LaTeX table
path = pub.generate_table(run, output_format="latex")
print(f"LaTeX table at {path}")  # e.g. temp/doc-examples/figures/table.tex

# HTML table
path = pub.generate_table(run, output_format="html")

# CSV table restricted to two metrics, grouped by the "framework" tag
path = pub.generate_table(
    run,
    metrics=["throughput", "latency"],
    output_format="csv",
    group_by_tag="framework",
)
```
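The exact CSV layout `generate_table` writes is not shown here. As a sketch of what grouping by a tag can mean, the standard-library version below writes one row per point with the tag value in the first column, and sorts rows so each group is contiguous. `grouped_csv` and the dict-shaped points are hypothetical stand-ins for calibrax's `Point`; the values are the ones from the custom-table example above.

```python
import csv
import io


def grouped_csv(points: list[dict], metrics: list[str], tag: str) -> str:
    """Write one row per point, with rows for the same tag value kept together."""
    # sorted() is stable, so points keep their original order within a group
    ordered = sorted(points, key=lambda p: p["tags"].get(tag, ""))
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([tag, "point", *metrics])
    for p in ordered:
        writer.writerow([p["tags"].get(tag, ""), p["name"],
                         *(p["metrics"].get(m, "") for m in metrics)])
    return buf.getvalue()


points = [
    {"name": "forward_pass", "tags": {"framework": "pytorch"},
     "metrics": {"throughput": 950.0, "latency": 1.2}},
    {"name": "forward_pass", "tags": {"framework": "flax"},
     "metrics": {"throughput": 1200.0, "latency": 0.8}},
]
print(grouped_csv(points, ["throughput", "latency"], "framework"))
```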
## Writing Custom Exporters

Subclass `Exporter` to integrate with other backends:

```python
from calibrax.core.models import Run
from calibrax.exporters.base import Exporter


class SlackExporter(Exporter):
    def __init__(self, webhook_url: str) -> None:
        self._webhook_url = webhook_url

    def export_run(self, run: Run) -> str:
        # Post a one-line summary to Slack
        message = f"Benchmark {run.id}: {len(run.points)} points"
        # ... send to webhook ...
        return self._webhook_url

    def export_analysis(self, run: Run, baseline: Run | None = None) -> None:
        # Post regression summary
        ...
```
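The "send to webhook" step in `SlackExporter` is left open. One dependency-free way to fill it, assuming a Slack incoming webhook that accepts a JSON body with a `text` field, is `urllib.request`. `build_slack_request` and `send` are hypothetical helpers; keeping request construction separate from sending lets you test the message format without network access.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, run_id: str,
                        n_points: int) -> urllib.request.Request:
    """Build (but do not send) a JSON POST with a plain-text summary."""
    payload = {"text": f"Benchmark {run_id}: {n_points} points"}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Inside `SlackExporter.export_run`, the send step could then be `send(build_slack_request(self._webhook_url, run.id, len(run.points)))`.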
## Next Steps

- **CI Integration**: automate exports as part of CI pipelines.
- **Storage & Baselines**: manage the data that feeds into exports.