calibrax.monitoring

Runtime monitoring and alerting. AlertManager handles alert collection and dispatch, AdvancedMonitor adds threshold-based background monitoring, and ProductionMonitor extends it with pipeline execution tracking and health reports.

Monitor

`calibrax.monitoring.monitor`

Alert management and background metric monitoring.

Provides threshold-based alerting with configurable handlers and background monitoring of system resources via a daemon thread.

`AlertSeverity`

Bases: StrEnum

Severity levels for monitoring alerts.

`Alert(*, message, severity, metric_name, metric_value, threshold, timestamp=time.time(), metadata=dict())` `dataclass`

A single monitoring alert triggered by a threshold violation.

Attributes:

Name	Type	Description
`message`	`str`	Human-readable description of the alert.
`severity`	`AlertSeverity`	Alert severity level.
`metric_name`	`str`	Name of the metric that triggered the alert.
`metric_value`	`float`	Observed value that triggered the alert.
`threshold`	`float`	Threshold that was exceeded.
`timestamp`	`float`	Time at which the alert was triggered, in seconds since the Unix epoch (as returned by `time.time()`).
`metadata`	`dict[str, Any]`	Additional context about the alert.

`to_dict()`

Serialize to a JSON-compatible dictionary.

`AlertManager(max_alerts=1000)`

Thread-safe alert storage with callback handlers.

Parameters:

Name	Type	Description	Default
`max_alerts`	`int`	Maximum number of alerts to retain (oldest dropped first).	`1000`

Initialize the alert manager.
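The `max_alerts` retention policy (oldest dropped first) maps naturally onto a bounded `collections.deque`. The sketch below assumes that data structure; `BoundedAlertStore` and its methods are hypothetical names, not part of calibrax, and plain strings stand in for `Alert` objects.

```python
import threading
from collections import deque


class BoundedAlertStore:
    """Hypothetical sketch of max_alerts retention; not calibrax's implementation."""

    def __init__(self, max_alerts: int = 1000) -> None:
        # A deque with maxlen discards items from the left (oldest) once it is full.
        self._alerts: deque = deque(maxlen=max_alerts)
        # The lock makes add/snapshot safe to call from several threads.
        self._lock = threading.Lock()

    def add(self, alert) -> None:
        with self._lock:
            self._alerts.append(alert)

    def snapshot(self) -> list:
        with self._lock:
            return list(self._alerts)


store = BoundedAlertStore(max_alerts=3)
for i in range(5):
    store.add(f"alert-{i}")
print(store.snapshot())  # ['alert-2', 'alert-3', 'alert-4']
```

When a `deque` with `maxlen` is full, each append evicts one item from the opposite end in constant time, so enforcing the cap adds no per-alert scanning.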

`add_alert_handler(handler)`

Register a callback invoked on each new alert.

Parameters:

Name	Type	Description	Default
`handler`	`Callable[[Alert], None]`	Callable that receives an Alert instance.	required

`trigger_alert(message, severity, metric_name, metric_value, threshold, metadata=None)`

Create and store an alert, notifying all registered handlers.

Parameters:

Name	Type	Description	Default
`message`	`str`	Human-readable alert description.	required
`severity`	`AlertSeverity`	Severity level.	required
`metric_name`	`str`	Metric that triggered the alert.	required
`metric_value`	`float`	Observed metric value.	required
`threshold`	`float`	Threshold that was exceeded.	required
`metadata`	`dict[str, Any] | None`	Optional additional context.	`None`
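The store-then-notify flow of `add_alert_handler()` and `trigger_alert()` can be sketched as below. `MiniAlert` and `MiniAlertManager` are hypothetical stand-ins that use a plain-string severity; whether calibrax copies the handler list before dispatch, or catches exceptions raised by handlers, is not stated in this reference.

```python
import threading
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class MiniAlert:
    message: str
    severity: str
    metric_name: str
    metric_value: float
    threshold: float
    metadata: dict[str, Any] = field(default_factory=dict)


class MiniAlertManager:
    """Hypothetical sketch of handler dispatch; not calibrax's implementation."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._alerts: list[MiniAlert] = []
        self._handlers: list[Callable[[MiniAlert], None]] = []

    def add_alert_handler(self, handler: Callable[[MiniAlert], None]) -> None:
        with self._lock:
            self._handlers.append(handler)

    def trigger_alert(self, message, severity, metric_name, metric_value,
                      threshold, metadata=None) -> None:
        alert = MiniAlert(message, severity, metric_name, metric_value,
                          threshold, dict(metadata or {}))
        with self._lock:
            self._alerts.append(alert)
            # Copy the handler list so handlers run without holding the lock.
            handlers = list(self._handlers)
        for handler in handlers:
            handler(alert)


received: list[MiniAlert] = []
manager = MiniAlertManager()
manager.add_alert_handler(received.append)
manager.trigger_alert("Memory above 80%", "warning", "memory_percent", 85.0, 80.0)
print(received[0].metric_name)  # memory_percent
```

Invoking handlers after releasing the lock means a handler that calls back into the manager (for example, to read recent alerts) cannot deadlock on the non-reentrant `threading.Lock`.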

`get_recent_alerts(count=10)`

Return the most recent alerts.

Parameters:

Name	Type	Description	Default
`count`	`int`	Maximum number of alerts to return.	`10`

Returns:

Type	Description
`list[Alert]`	List of recent alerts, newest first.
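One way to produce the newest-first ordering, assuming alerts are appended to a `deque` in arrival order, is to iterate in reverse and stop after `count` items. `recent_newest_first` is a hypothetical helper, not a calibrax function.

```python
from collections import deque
from itertools import islice


def recent_newest_first(alerts: deque, count: int = 10) -> list:
    # Alerts are appended on the right, so iterating in reverse yields newest first.
    # islice stops after `count` items; max() guards against a negative count.
    # A thread-safe version would hold the store's lock while iterating.
    return list(islice(reversed(alerts), max(count, 0)))


history = deque(["a1", "a2", "a3", "a4"])
print(recent_newest_first(history, count=2))  # ['a4', 'a3']
```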

`get_alerts_by_severity(severity)`

Return all alerts matching the given severity.

Parameters:

Name	Type	Description	Default
`severity`	`AlertSeverity`	Severity level to filter by.	required

Returns:

Type	Description
`list[Alert]`	List of matching alerts.
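A sketch of severity filtering as a linear scan over stored alerts. `Item`, `Severity`, and `alerts_by_severity` are hypothetical, and preserving insertion order is an assumption, since the reference does not state the ordering of the returned list.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(str, Enum):
    # Hypothetical members for illustration only.
    WARNING = "warning"
    CRITICAL = "critical"


@dataclass
class Item:
    metric_name: str
    severity: Severity


def alerts_by_severity(alerts: list[Item], severity: Severity) -> list[Item]:
    # Keeps storage (insertion) order of the matching alerts.
    return [a for a in alerts if a.severity == severity]


stored = [
    Item("cpu_percent", Severity.WARNING),
    Item("memory_percent", Severity.CRITICAL),
    Item("gpu_memory", Severity.WARNING),
]
print([a.metric_name for a in alerts_by_severity(stored, Severity.WARNING)])
# ['cpu_percent', 'gpu_memory']
```

Because a `str`-based enum member (including a `StrEnum` member) compares equal to its string value, this comparison also matches a plain `"critical"` string; whether calibrax accepts raw strings is not documented.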

`clear_alerts()`

Remove all stored alerts.

`AdvancedMonitor(alert_manager=None, gpu_profiler=None, resource_monitor=None)`

Background resource monitor with threshold-based alerting.

Collects CPU, memory, and optional GPU metrics on a daemon thread. Triggers alerts when thresholds are exceeded.

Parameters:

Name	Type	Description	Default
`alert_manager`	`AlertManager | None`	Alert manager for dispatching alerts. Created if not provided.	`None`
`gpu_profiler`	`GPUProfilerProtocol | None`	Optional GPU profiler for GPU metrics.	`None`
`resource_monitor`	`ResourceMonitor | None`	Optional ResourceMonitor for background sampling.	`None`

Initialize the monitor.

`alert_manager` `property`

Access the underlying alert manager.

`set_threshold(metric_name, threshold)`

Set an alerting threshold for a metric.

Parameters:

Name	Type	Description	Default
`metric_name`	`str`	Name of the metric to monitor.	required
`threshold`	`float`	Value above which an alert is triggered.	required
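The threshold semantics ("value above which an alert is triggered") can be expressed as a strict greater-than check per configured metric. The metric names `cpu_percent` and `memory_percent` and the helper `check_thresholds` are hypothetical; which metric names `AdvancedMonitor` actually collects is not listed here.

```python
def check_thresholds(
    metrics: dict[str, float], thresholds: dict[str, float]
) -> list[tuple[str, float, float]]:
    """Return (metric_name, value, threshold) for every violated threshold."""
    violations = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        # Metrics with no collected value are skipped; "above" means strictly greater.
        if value is not None and value > limit:
            violations.append((name, value, limit))
    return violations


thresholds = {"cpu_percent": 90.0, "memory_percent": 80.0}
sample = {"cpu_percent": 93.5, "memory_percent": 80.0}
print(check_thresholds(sample, thresholds))  # [('cpu_percent', 93.5, 90.0)]
```

Under this reading, a value exactly equal to its threshold does not alert.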

`start_monitoring(interval=5.0)`

Start background monitoring on a daemon thread.

Parameters:

Name	Type	Description	Default
`interval`	`float`	Seconds between metric collection cycles.	`5.0`

`stop_monitoring()`

Stop background monitoring and wait for the thread to finish.
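A common pattern for a start/stop pair like `start_monitoring()` and `stop_monitoring()` is a daemon thread whose loop sleeps on a `threading.Event`, so a stop request interrupts the wait instead of waiting out the full interval. The sketch below assumes that pattern; `BackgroundSampler` and its `collect` callback are hypothetical, not calibrax's implementation.

```python
import threading
import time
from typing import Callable


class BackgroundSampler:
    """Hypothetical sketch of a daemon monitoring loop; not calibrax's code."""

    def __init__(self, collect: Callable[[], None]) -> None:
        self._collect = collect
        self._stop = threading.Event()
        self._thread: threading.Thread | None = None

    def start(self, interval: float = 5.0) -> None:
        if self._thread is not None and self._thread.is_alive():
            return  # already running
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, args=(interval,), daemon=True)
        self._thread.start()

    def _run(self, interval: float) -> None:
        while True:
            self._collect()
            # Event.wait acts as an interruptible sleep: it returns True
            # as soon as stop() sets the event.
            if self._stop.wait(interval):
                break

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()  # wait for the loop to exit
            self._thread = None


samples: list[float] = []
sampler = BackgroundSampler(lambda: samples.append(time.monotonic()))
sampler.start(interval=0.01)
time.sleep(0.05)
sampler.stop()
print(f"collected {len(samples)} samples")
```

Marking the thread as a daemon keeps it from blocking interpreter exit if `stop()` is never called, while the `join()` call provides the "wait for the thread to finish" behavior described above.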

`get_monitoring_summary()`

Return a summary of current monitoring state.

Returns:

Type	Description
`dict[str, Any]`	Dictionary with thresholds, alert counts, and metric history summaries.
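The summary's three documented parts (thresholds, alert counts, and metric history summaries) could be assembled as below. Every key name and the choice of per-metric statistics (`count`, `latest`, `mean`, `max`) are assumptions for illustration; the real method's layout may differ.

```python
from collections import Counter
from statistics import mean


def summarize(
    thresholds: dict[str, float],
    alert_severities: list[str],
    history: dict[str, list[float]],
) -> dict:
    # Hypothetical key names; only the three kinds of content are documented.
    return {
        "thresholds": dict(thresholds),
        "alert_counts": dict(Counter(alert_severities)),
        "metric_history": {
            name: {"count": len(vals), "latest": vals[-1], "mean": mean(vals), "max": max(vals)}
            for name, vals in history.items()
            if vals  # skip metrics with no samples yet
        },
    }


summary = summarize(
    {"cpu_percent": 90.0},
    ["warning", "warning", "critical"],
    {"cpu_percent": [40.0, 95.0, 60.0]},
)
print(summary["alert_counts"])  # {'warning': 2, 'critical': 1}
```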

Production Monitor