model_manager
Model management with dynamic loading, memory budget validation, and LRU eviction.
This module provides a ModelManager class that handles loading and unloading of AI models based on available GPU memory. Models are loaded on demand and automatically evicted when memory pressure occurs.
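A minimal end-to-end sketch based on the documented API below; the import path and the task name are assumptions about the surrounding project:

```python
import asyncio

from model_manager import ModelManager  # import path is an assumption


async def main() -> None:
    manager = ModelManager(config_path="models.yaml")

    # Verify up front that all selected models fit in the memory budget.
    print(manager.validate_memory_budget())

    # Load on demand; the manager evicts LRU models under memory pressure.
    model = await manager.get_model("video_summarization")
    _ = model  # run inference here

    await manager.shutdown()


asyncio.run(main())
```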
ModelConfig Objects
class ModelConfig()
Configuration for a single model variant.
Attributes
model_id : str
    Hugging Face model identifier or external API model name.
framework : str
    Inference framework (sglang, vllm, pytorch, external_api).
vram_gb : float
    VRAM requirement in GB (0 for external APIs).
quantization : str | None
    Quantization method (4bit, 8bit, awq, etc.).
speed : str
    Speed category (fast, medium, slow).
description : str
    Human-readable description.
fps : int | None
    Processing speed in frames per second (for vision models).
provider : str | None
    External API provider (anthropic, openai, google).
api_endpoint : str | None
    API endpoint URL for external APIs.
requires_api_key : bool
    Whether the model requires API key authentication.
__init__
def __init__(config_dict: dict[str, Any]) -> None
Initialize model configuration from dictionary.
Parameters
config_dict : dict[str, Any]
    Dictionary containing model configuration parameters.
vram_bytes
@property
def vram_bytes() -> int
Convert VRAM requirement from GB to bytes.
Returns
int
    VRAM requirement in bytes.
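For illustration, a hypothetical entry constructed directly from a dict; all field values here are invented, and whether vram_bytes uses decimal GB or binary GiB is an assumption of this sketch:

```python
cfg = ModelConfig({
    "model_id": "Qwen/Qwen2-VL-7B-Instruct",  # hypothetical choice of model
    "framework": "vllm",
    "vram_gb": 16.0,
    "quantization": "awq",
    "speed": "medium",
    "description": "Vision-language model for frame captioning",
    "fps": 2,
})

print(cfg.vram_bytes)  # 16 GB in bytes, e.g. 16 * 1024**3 if GiB semantics
```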
TaskConfig Objects
class TaskConfig()
Configuration for a task type with multiple model options.
Attributes
task_name : str
    Name of the task.
selected : str
    Currently selected model name.
options : dict[str, ModelConfig]
    Available model options for this task.
__init__
def __init__(task_name: str, config_dict: dict[str, Any]) -> None
Initialize task configuration from dictionary.
Parameters
task_name : str
    Name of the task (e.g., "video_summarization").
config_dict : dict[str, Any]
    Dictionary containing task configuration.
get_selected_config
def get_selected_config() -> ModelConfig
Get the currently selected model configuration.
Returns
ModelConfig
    Configuration for the selected model.
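Putting the two classes together, a hypothetical task entry; the exact dict schema beyond the documented selected / options attributes is an assumption:

```python
task = TaskConfig(
    "video_summarization",
    {
        "selected": "local_7b",
        "options": {
            "local_7b": {
                "model_id": "Qwen/Qwen2-7B-Instruct",  # hypothetical local option
                "framework": "vllm",
                "vram_gb": 16.0,
                "speed": "medium",
                "description": "Local 7B option",
            },
            "claude": {
                "model_id": "claude-3-5-sonnet",  # hypothetical external option
                "framework": "external_api",
                "vram_gb": 0,
                "provider": "anthropic",
                "requires_api_key": True,
                "speed": "fast",
                "description": "External API option",
            },
        },
    },
)

print(task.get_selected_config().model_id)  # -> "Qwen/Qwen2-7B-Instruct"
```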
InferenceConfig Objects
class InferenceConfig()
Global inference configuration settings.
Attributes
max_memory_per_model : str
    Maximum memory per model ('auto' or a specific value).
offload_threshold : float
    Memory usage threshold for offloading (0.0 to 1.0).
warmup_on_startup : bool
    Whether to load all models on startup.
default_batch_size : int
    Default batch size for inference.
max_batch_size : int
    Maximum batch size for inference.
__init__
def __init__(config_dict: dict[str, Any]) -> None
Initialize inference configuration from dictionary.
Parameters
config_dict : dict[str, Any]
    Dictionary containing inference configuration.
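A plausible inference block mirroring the documented attribute names; the values here are illustrative defaults, not the project's:

```python
inference = InferenceConfig({
    "max_memory_per_model": "auto",
    "offload_threshold": 0.9,     # begin evicting above 90% VRAM usage
    "warmup_on_startup": False,
    "default_batch_size": 8,
    "max_batch_size": 32,
})
```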
ModelManager Objects
class ModelManager()
Manages loading, unloading, and memory management of AI models.
This class handles dynamic model loading based on memory availability, implements LRU eviction when memory pressure occurs, and provides utilities for VRAM monitoring.
Attributes
config_path : Path
    Path to the models.yaml configuration file.
config : dict[str, Any]
    Parsed configuration dictionary.
loaded_models : OrderedDict[str, Any]
    Currently loaded models (LRU ordered).
model_load_times : dict[str, float]
    Timestamp when each model was loaded.
model_memory_usage : dict[str, int]
    Actual memory usage per model in bytes.
tasks : dict[str, TaskConfig]
    Task configurations.
inference_config : InferenceConfig
    Global inference settings.
__init__
def __init__(config_path: str) -> None
Initialize ModelManager with configuration file.
Parameters
config_path : str
    Path to the models.yaml configuration file.
get_available_vram
def get_available_vram() -> int
Get available GPU memory in bytes.
Returns:
Available VRAM in bytes
get_total_vram
def get_total_vram() -> int
Get total GPU memory in bytes.
Returns:
Total VRAM in bytes
get_memory_usage_percentage
def get_memory_usage_percentage() -> float
Get current GPU memory usage as percentage.
Returns:
Memory usage percentage (0.0 to 1.0)
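All three queries can be built on torch.cuda.mem_get_info, which returns free and total device memory in bytes; a plausible implementation sketch, not necessarily what the module does internally:

```python
import torch


def vram_stats(device: int = 0) -> tuple[int, int]:
    """Return (free, total) VRAM in bytes, or (0, 0) when no GPU is present."""
    if not torch.cuda.is_available():
        return 0, 0
    return torch.cuda.mem_get_info(device)


def memory_usage_percentage(device: int = 0) -> float:
    free, total = vram_stats(device)
    return 0.0 if total == 0 else (total - free) / total
```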
check_memory_available
def check_memory_available(required_bytes: int) -> bool
Check if sufficient memory is available for model loading.
Arguments:
required_bytes - Required memory in bytes
Returns:
True if sufficient memory is available
get_lru_model
def get_lru_model() -> str | None
Get least recently used model identifier.
Returns:
Task name of LRU model, or None if no models loaded
evict_lru_model
@tracer.start_as_current_span("evict_lru_model")
async def evict_lru_model() -> str | None
Evict the least recently used model from memory.
Returns:
Task name of evicted model, or None if no models to evict
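Because loaded_models is an OrderedDict, LRU selection and eviction reduce to a small pattern; a sketch under the assumption that eviction simply drops the reference and clears the CUDA cache (the module's internals may differ):

```python
import gc
from collections import OrderedDict
from typing import Any

import torch

loaded_models: OrderedDict[str, Any] = OrderedDict()


def touch(task_type: str) -> None:
    # Move an accessed model to the most-recently-used end.
    loaded_models.move_to_end(task_type)


def evict_lru() -> str | None:
    if not loaded_models:
        return None
    # The head of the OrderedDict is the least recently used entry.
    victim, model = loaded_models.popitem(last=False)
    del model
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return cached blocks to the driver
    return victim
```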
unload_model
@tracer.start_as_current_span("unload_model")
async def unload_model(task_type: str) -> None
Unload a model from memory.
Arguments:
task_type - Task type of model to unload
load_model
@tracer.start_as_current_span("load_model")
async def load_model(task_type: str) -> Any
Load a model for the specified task type.
This method loads the selected model for the task, handling memory management and eviction if necessary.
Arguments:
task_type - Task type to load model for
Returns:
Loaded model object
Raises:
ValueError - If task type is invalid or model cannot be loaded
RuntimeError - If insufficient memory after eviction attempts
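The documented behavior (evict LRU models until the requirement fits, then raise RuntimeError) implies a loop along these lines; a hedged sketch using only methods documented on this page:

```python
async def ensure_memory(manager: "ModelManager", required_bytes: int) -> None:
    # Evict least recently used models until the request fits.
    while not manager.check_memory_available(required_bytes):
        if await manager.evict_lru_model() is None:
            raise RuntimeError(
                f"insufficient memory: {required_bytes} bytes required "
                "and no models remain to evict"
            )
```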
get_model
async def get_model(task_type: str) -> Any
Get model for task type, loading if necessary.
Arguments:
task_type - Task type to get model for
Returns:
Loaded model object
get_loaded_models
def get_loaded_models() -> dict[str, dict[str, Any]]
Get information about currently loaded models.
Returns:
Dictionary mapping task types to model information
get_model_config
def get_model_config(task_type: str) -> TaskConfig | None
Get configuration for a task type.
Arguments:
task_type - Task type to get configuration for
Returns:
Task configuration, or None if task type is invalid
set_selected_model
async def set_selected_model(task_type: str, model_name: str) -> None
Change the selected model for a task type.
If the task's current model is loaded, it is unloaded and the newly selected model is loaded in its place.
Arguments:
task_type - Task type to update
model_name - Name of model option to select
Raises:
ValueError- If task type or model name is invalid
validate_memory_budget
def validate_memory_budget() -> dict[str, Any]
Validate that all selected models can fit in available memory.
Returns:
Dictionary with validation results
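The return shape is not documented here; a sketch of the check itself, with illustrative keys, built only from attributes and methods documented above:

```python
from typing import Any


def validate_budget(manager: "ModelManager") -> dict[str, Any]:
    # Sum each task's selected-model requirement; external API models
    # contribute nothing locally (vram_gb == 0).
    required = sum(
        task.get_selected_config().vram_bytes
        for task in manager.tasks.values()
    )
    total = manager.get_total_vram()
    return {
        "required_bytes": required,
        "total_bytes": total,
        "fits": required <= total,
    }
```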
warmup_models
async def warmup_models() -> None
Load all selected models if warmup_on_startup is enabled.
is_external_api
def is_external_api(task_type: str) -> bool
Check if a task uses an external API model.
Parameters
task_type : str
    Task type to check.
Returns
bool
    True if task uses external API, False otherwise.
Raises
ValueError
    If task type is invalid.
get_external_api_config
def get_external_api_config(task_type: str) -> Any
Get external API configuration for a task.
Parameters
task_type : str
    Task type to get configuration for.
Returns
ExternalAPIConfig
    Configuration object for external API client.
Raises
ValueError
    If task type is invalid or doesn't use external API.
shutdown
async def shutdown() -> None
Unload all models and clean up resources.