calibrax.profiling.gpu¤

GPU memory profiling and hardware-adaptive operations. Includes GPUMemoryProfiler for memory snapshots, MemoryOptimizer for pipeline memory analysis, and AdaptiveOperation for hardware-aware shape optimization.

Provides hardware detection, shape optimization, GPU memory profiling (satisfying GPUProfilerProtocol), and memory usage analysis. NVML-based GPU clock and power monitoring is included when pynvml is available.

`HardwareConfig(*, platform, precision, tile_size, critical_batch_size, memory_layout, use_vmem_optimization)` `dataclass` ¤

Hardware-specific optimization configuration.

Attributes:

Name	Type	Description
`platform`	`str`	Detected platform ("cpu", "tpu", "gpu_modern", "gpu_legacy").
`precision`	`str`	Recommended floating-point precision string.
`tile_size`	`int`	Tile size for matrix operation alignment.
`critical_batch_size`	`int`	Optimal batch size for the platform.
`memory_layout`	`str`	Memory layout preference.
`use_vmem_optimization`	`bool`	Whether VMEM optimization is available.

`MemoryAnalysis(*, baseline_memory_mb, peak_memory_mb, peak_usage_mb, retained_memory_mb, memory_efficiency, suggestions=())` `dataclass` ¤

Result of pipeline memory analysis.

Attributes:

Name	Type	Description
`baseline_memory_mb`	`float`	Memory usage before pipeline execution.
`peak_memory_mb`	`float`	Memory usage at peak during execution.
`peak_usage_mb`	`float`	Peak usage above baseline.
`retained_memory_mb`	`float`	Memory retained after GC.
`memory_efficiency`	`float`	Ratio of freed memory to peak usage.
`suggestions`	`tuple[str, ...]`	Optimization suggestions.
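
The relationship between these fields can be illustrated with a small sketch. It is not calibrax's implementation: `MemoryAnalysisSketch` and `summarize` are hypothetical names, and the formulas assume that retained memory is measured relative to the baseline and that "freed memory" means peak usage minus retained memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryAnalysisSketch:
    """Hypothetical stand-in mirroring the documented MemoryAnalysis fields."""

    baseline_memory_mb: float
    peak_memory_mb: float
    peak_usage_mb: float
    retained_memory_mb: float
    memory_efficiency: float
    suggestions: tuple[str, ...] = ()


def summarize(baseline_mb: float, peak_mb: float, after_gc_mb: float) -> MemoryAnalysisSketch:
    """Derive the documented fields from three raw readings (assumed formulas)."""
    # Peak usage above baseline, clamped so noisy readings never go negative.
    peak_usage = max(peak_mb - baseline_mb, 0.0)
    # Assumption: retained memory is what remains above baseline after GC.
    retained = max(after_gc_mb - baseline_mb, 0.0)
    freed = max(peak_usage - retained, 0.0)
    # Assumption: efficiency is freed / peak usage, and 1.0 when nothing was allocated.
    efficiency = freed / peak_usage if peak_usage > 0 else 1.0
    return MemoryAnalysisSketch(
        baseline_memory_mb=baseline_mb,
        peak_memory_mb=peak_mb,
        peak_usage_mb=peak_usage,
        retained_memory_mb=retained,
        memory_efficiency=efficiency,
    )
```

Under these assumptions, readings of 100 MB before, 300 MB at peak, and 150 MB after GC give a peak usage of 200 MB, 50 MB retained, and an efficiency of 0.75.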

`AdaptiveOperation()` ¤

Hardware-adaptive operations with auto-detection.

Detects the current JAX backend (CPU/GPU/TPU) and provides optimized configuration and shape padding.

Initialize with auto-detected hardware configuration.

`optimize_shapes(*shapes)` ¤

Pad tensor shapes to align with hardware tile size.

Parameters:

Name	Type	Description	Default
`*shapes`	`tuple[int, ...]`	Variable number of tensor shapes to optimize.	`()`

Returns:

Type	Description
`list[tuple[int, ...]]`	List of optimized shapes padded to tile_size multiples.
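
As a rough illustration of tile alignment, the sketch below rounds each dimension up to the next multiple of a tile size. The function name `pad_shapes_to_tile` and the explicit `tile_size` argument are hypothetical (the documented method takes the tile size from the detected hardware configuration), and padding every dimension is an assumption; the library may pad only some axes.

```python
def pad_shapes_to_tile(tile_size: int, *shapes: tuple[int, ...]) -> list[tuple[int, ...]]:
    """Round every dimension of every shape up to a multiple of tile_size."""
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")

    def round_up(dim: int) -> int:
        # Ceiling division followed by scaling back up to a tile multiple.
        return -(-dim // tile_size) * tile_size

    return [tuple(round_up(d) for d in shape) for shape in shapes]
```

With a tile size of 128, `pad_shapes_to_tile(128, (100, 200), (128, 1))` yields `[(128, 256), (128, 128)]`: dimensions that are already aligned are left unchanged.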

`GPUMemoryProfiler()` ¤

GPU memory profiling satisfying GPUProfilerProtocol.

Uses a multi-level fallback strategy, trying each source in order: memory_stats, then xla_bridge, then zeros.

Initialize GPU memory profiler with GPU detection.

`get_memory_usage()` ¤

Get current GPU memory usage statistics.

Returns:

Type	Description
`dict[str, float]`	Dictionary with gpu_memory_used_mb, gpu_memory_total_mb, and optionally gpu_memory_utilization.
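
The first and last steps of the fallback chain can be sketched as a pure function that converts a `memory_stats()`-style mapping into the documented keys. This is an illustration only: `memory_usage_from_stats` is a hypothetical helper, the input keys `bytes_in_use` and `bytes_limit` are the names JAX devices commonly report and are treated here as an assumption, and reporting `gpu_memory_utilization` as a percentage is likewise assumed.

```python
from typing import Mapping, Optional

BYTES_PER_MB = 1024 * 1024


def memory_usage_from_stats(stats: Optional[Mapping[str, int]]) -> dict[str, float]:
    """Convert a memory_stats()-style mapping into the documented result keys."""
    if not stats:
        # Final fallback: no statistics available, so report zeros.
        return {"gpu_memory_used_mb": 0.0, "gpu_memory_total_mb": 0.0}

    used_mb = stats.get("bytes_in_use", 0) / BYTES_PER_MB
    total_mb = stats.get("bytes_limit", 0) / BYTES_PER_MB
    usage = {"gpu_memory_used_mb": used_mb, "gpu_memory_total_mb": total_mb}
    if total_mb > 0:
        # Utilization is only included when the total is known (assumed percentage).
        usage["gpu_memory_utilization"] = 100.0 * used_mb / total_mb
    return usage
```

With JAX installed, the input could come from `jax.devices()[0].memory_stats()`, which returns `None` on backends that do not expose statistics; that case falls through to the zeros branch.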

`get_utilization()` ¤

Get GPU utilization percentage for ResourceMonitor.

Returns:

Type	Description
`float`	GPU memory utilization as percentage (0-100), or 0.0.

`get_clock_info()` ¤

Get current GPU clock frequencies via NVML.

Returns:

Type	Description
`dict[str, float]`	Dictionary with 'gpu_clock_mhz' and 'mem_clock_mhz' keys. Returns zeros if NVML is unavailable or the query fails.

`get_power_info()` ¤

Get current GPU power draw and limit via NVML.

Returns:

Type	Description
`dict[str, float]`	Dictionary with 'power_draw_w' and 'power_limit_w' keys. Returns zeros if NVML is unavailable or the query fails.
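
The NVML queries behind `get_clock_info()` and `get_power_info()` might look roughly like the sketch below. It is not the library's code: `query_nvml` and its `device_index` parameter are hypothetical, and merging both queries into one function is a simplification. The pynvml calls themselves are standard; NVML reports power in milliwatts, so the values are divided by 1000.

```python
def query_nvml(device_index: int = 0) -> dict[str, float]:
    """Read clocks (MHz) and power (W) for one GPU, or zeros on any failure."""
    zeros = {
        "gpu_clock_mhz": 0.0,
        "mem_clock_mhz": 0.0,
        "power_draw_w": 0.0,
        "power_limit_w": 0.0,
    }
    try:
        import pynvml  # optional dependency; absent on many machines

        pynvml.nvmlInit()
    except Exception:
        # pynvml missing, or no NVIDIA driver available.
        return zeros

    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        return {
            "gpu_clock_mhz": float(
                pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_GRAPHICS)
            ),
            "mem_clock_mhz": float(
                pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_MEM)
            ),
            # NVML returns milliwatts; convert to watts.
            "power_draw_w": pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0,
            "power_limit_w": pynvml.nvmlDeviceGetEnforcedPowerLimit(handle) / 1000.0,
        }
    except Exception:
        # Any failed query falls back to zeros, as the documentation describes.
        return zeros
    finally:
        try:
            pynvml.nvmlShutdown()
        except Exception:
            pass
```

Importing pynvml inside the function keeps the dependency optional, so the same code path works on machines without an NVIDIA GPU.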

`analyze_memory_pattern(measurements)` ¤

Analyze memory usage patterns and suggest optimizations.

Parameters:

Name	Type	Description	Default
`measurements`	`list[dict[str, float]]`	List of memory usage dictionaries.	required

Returns:

Type	Description
`list[str]`	List of optimization suggestion strings.
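
The documentation does not state which patterns are detected. Purely as an illustration of the input and output shapes, the sketch below applies two made-up heuristics to a series of readings; `suggest_from_measurements`, the 90% threshold, the growth check, and the suggestion wording are all hypothetical.

```python
def suggest_from_measurements(measurements: list[dict[str, float]]) -> list[str]:
    """Apply two illustrative heuristics to a series of memory readings."""
    if not measurements:
        return []

    used = [m.get("gpu_memory_used_mb", 0.0) for m in measurements]
    total = max(m.get("gpu_memory_total_mb", 0.0) for m in measurements)
    suggestions = []

    # Hypothetical threshold: peak usage above 90% of capacity.
    if total > 0 and max(used) / total > 0.9:
        suggestions.append("Peak usage exceeds 90% of GPU memory; consider a smaller batch size.")

    # Hypothetical leak check: usage strictly increases across at least three readings.
    if len(used) >= 3 and all(b > a for a, b in zip(used, used[1:])):
        suggestions.append("Memory grows on every measurement; check for retained references.")

    return suggestions
```

Each element of `measurements` is assumed to have the same keys as the dictionary returned by `get_memory_usage()`.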

`MemoryOptimizer` ¤

Memory optimization analysis for pipeline functions.

`analyze_pipeline_memory(pipeline_fn, sample_data)` ¤

Analyze memory usage of a pipeline function.

Parameters:

Name	Type	Description	Default
`pipeline_fn`	`Callable[[Any], Any]`	Function to analyze.	required
`sample_data`	`Any`	Sample input data.	required

Returns:

Type	Description
`MemoryAnalysis | None`	MemoryAnalysis with measurements and suggestions, or None if the pipeline raises an exception.
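
The measure, run, collect, re-measure workflow implied above can be sketched with the standard library. This is a stand-in rather than the library's method: `measure_pipeline` is a hypothetical function, it tracks host memory through `tracemalloc` instead of GPU memory, and it returns a plain dictionary rather than a `MemoryAnalysis`. It requires Python 3.9 or later for `tracemalloc.reset_peak()`.

```python
import gc
import tracemalloc
from typing import Any, Callable, Optional

BYTES_PER_MB = 1024 * 1024


def measure_pipeline(
    pipeline_fn: Callable[[Any], Any], sample_data: Any
) -> Optional[dict[str, float]]:
    """Return baseline, peak, and retained host memory in MB, or None on failure."""
    already_tracing = tracemalloc.is_tracing()
    if not already_tracing:
        tracemalloc.start()
    try:
        gc.collect()
        tracemalloc.reset_peak()  # peak now starts from the current allocation level
        baseline, _ = tracemalloc.get_traced_memory()
        try:
            result = pipeline_fn(sample_data)
        except Exception:
            # Mirror the documented behavior: a failing pipeline yields None.
            return None
        _, peak = tracemalloc.get_traced_memory()
        del result
        gc.collect()
        after_gc, _ = tracemalloc.get_traced_memory()
    finally:
        if not already_tracing:
            tracemalloc.stop()

    return {
        "baseline_memory_mb": baseline / BYTES_PER_MB,
        "peak_memory_mb": peak / BYTES_PER_MB,
        # Assumption: retained memory is what remains above baseline after GC.
        "retained_memory_mb": max(after_gc - baseline, 0) / BYTES_PER_MB,
    }
```

For example, `measure_pipeline(lambda n: [0] * n, 100_000)` returns a dictionary of readings, while a pipeline that raises returns `None`.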
