Fovea Model Service API Reference

This is the auto-generated API documentation for the Fovea Model Service, which provides AI-powered video analysis capabilities including summarization, object detection, and tracking.

Core Modules

Main Application

API Routes

Video Summarization

Video Understanding Models

Vision Language Model loader with support for multiple VLM architectures.

This module provides a unified interface for loading and running inference with various Vision Language Models including Llama 4 Maverick, Gemma 3, InternVL3, Pixtral Large, and Qwen2.5-VL. Models can be loaded with different quantization strategies and inference frameworks (SGLang or vLLM).

class vlm_loader.QuantizationType(*values)[source]

Bases: str, Enum

Supported quantization types for model compression.

NONE = 'none'
FOUR_BIT = '4bit'
EIGHT_BIT = '8bit'
AWQ = 'awq'
class vlm_loader.InferenceFramework(*values)[source]

Bases: str, Enum

Supported inference frameworks for model execution.

SGLANG = 'sglang'
VLLM = 'vllm'
TRANSFORMERS = 'transformers'
class vlm_loader.VLMConfig(model_id, quantization=QuantizationType.FOUR_BIT, framework=InferenceFramework.SGLANG, max_memory_gb=None, device='cuda', trust_remote_code=True)[source]

Bases: object

Configuration for Vision Language Model loading and inference.

Parameters:
  • model_id (str) – HuggingFace model identifier or local path.

  • quantization (QuantizationType) – Quantization strategy to apply.

  • framework (InferenceFramework) – Inference framework to use for model execution.

  • max_memory_gb (int | None, default=None) – Maximum GPU memory to allocate in GB. If None, uses all available.

  • device (str, default="cuda") – Device to load the model on.

  • trust_remote_code (bool, default=True) – Whether to trust remote code from HuggingFace.

model_id: str
quantization: QuantizationType = '4bit'
framework: InferenceFramework = 'sglang'
max_memory_gb: int | None = None
device: str = 'cuda'
trust_remote_code: bool = True
__init__(model_id, quantization=QuantizationType.FOUR_BIT, framework=InferenceFramework.SGLANG, max_memory_gb=None, device='cuda', trust_remote_code=True)
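
Example (a minimal usage sketch; the model identifier is illustrative):

    from vlm_loader import VLMConfig, QuantizationType, InferenceFramework

    # Illustrative HuggingFace identifier; substitute the checkpoint you actually use.
    config = VLMConfig(
        model_id="Qwen/Qwen2.5-VL-72B-Instruct",
        quantization=QuantizationType.FOUR_BIT,
        framework=InferenceFramework.SGLANG,
        max_memory_gb=80,
        device="cuda",
    )
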
class vlm_loader.VLMLoader(config)[source]

Bases: ABC

Abstract base class for Vision Language Model loaders.

All VLM loaders must implement the load and generate methods.

__init__(config)[source]

Initialize the VLM loader with configuration.

Parameters:

config (VLMConfig) – Configuration for model loading and inference.

abstractmethod load()[source]

Load the model into memory with configured settings.

Raises:

RuntimeError – If model loading fails.

Return type:

None

abstractmethod generate(images, prompt, max_new_tokens=512, temperature=0.7)[source]

Generate text response from images and prompt.

Parameters:
  • images (list[Image.Image]) – List of PIL images to process.

  • prompt (str) – Text prompt for the model.

  • max_new_tokens (int, default=512) – Maximum number of tokens to generate.

  • temperature (float, default=0.7) – Sampling temperature for generation.

Returns:

Generated text response.

Return type:

str

Raises:

RuntimeError – If generation fails or model is not loaded.

unload()[source]

Unload the model from memory to free GPU resources.

Return type:

None
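
Example: the minimal shape of a concrete loader subclass (a sketch only; the placeholder model handle is illustrative, and it assumes the base __init__ stores the configuration on self.config):

    from PIL import Image
    from vlm_loader import VLMLoader

    class MyVLMLoader(VLMLoader):
        """Skeleton showing the interface a concrete loader must provide."""

        def load(self) -> None:
            # A real implementation loads weights according to self.config
            # (quantization, framework, device) and keeps a model handle.
            self._model = object()  # placeholder handle

        def generate(self, images: list[Image.Image], prompt: str,
                     max_new_tokens: int = 512, temperature: float = 0.7) -> str:
            if getattr(self, "_model", None) is None:
                raise RuntimeError("Model not loaded; call load() first.")
            # A real implementation runs framework-specific inference here.
            return "<generated text>"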

class vlm_loader.Llama4MaverickLoader(config)[source]

Bases: VLMLoader

Loader for Llama 4 Maverick Vision Language Model.

Llama 4 Maverick is a 400B parameter MoE model with 17B active parameters, supporting multimodal input with a 1M-token context length.

load()[source]

Load Llama 4 Maverick model with configured settings.

Return type:

None

generate(images, prompt, max_new_tokens=512, temperature=0.7)[source]

Generate text response from images and prompt using Llama 4 Maverick.

Return type:

str

class vlm_loader.Gemma3Loader(config)[source]

Bases: VLMLoader

Loader for Gemma 3 27B Vision Language Model.

Gemma 3 27B excels at document analysis, OCR, and multilingual tasks with fast inference speed.

load()[source]

Load Gemma 3 model with configured settings.

Return type:

None

generate(images, prompt, max_new_tokens=512, temperature=0.7)[source]

Generate text response from images and prompt using Gemma 3.

Return type:

str

class vlm_loader.InternVL3Loader(config)[source]

Bases: VLMLoader

Loader for InternVL3-78B Vision Language Model.

InternVL3-78B achieves state-of-the-art results on vision benchmarks with strong scientific reasoning capabilities.

load()[source]

Load InternVL3 model with configured settings.

Return type:

None

generate(images, prompt, max_new_tokens=512, temperature=0.7)[source]

Generate text response from images and prompt using InternVL3.

Return type:

str

class vlm_loader.PixtralLargeLoader(config)[source]

Bases: VLMLoader

Loader for Pixtral Large Vision Language Model.

Pixtral Large is a 123B parameter model with 128k context length, optimized for batch processing of long documents.

load()[source]

Load Pixtral Large model with configured settings.

Return type:

None

generate(images, prompt, max_new_tokens=512, temperature=0.7)[source]

Generate text response from images and prompt using Pixtral Large.

Return type:

str

class vlm_loader.Qwen25VLLoader(config)[source]

Bases: VLMLoader

Loader for Qwen2.5-VL 72B Vision Language Model.

Qwen2.5-VL 72B is a proven stable model with strong performance across vision-language tasks.

load()[source]

Load Qwen2.5-VL model with configured settings.

Return type:

None

generate(images, prompt, max_new_tokens=512, temperature=0.7)[source]

Generate text response from images and prompt using Qwen2.5-VL.

Return type:

str

vlm_loader.create_vlm_loader(model_name, config)[source]

Factory function to create appropriate VLM loader based on model name.

Parameters:
  • model_name (str) – Name of the model to load. Supported values:
      - “llama-4-maverick” or “llama4-maverick”
      - “gemma-3-27b” or “gemma3”
      - “internvl3-78b” or “internvl3”
      - “pixtral-large” or “pixtral”
      - “qwen2.5-vl-72b” or “qwen25vl”

  • config (VLMConfig) – Configuration for model loading and inference.

Returns:

Appropriate loader instance for the specified model.

Return type:

VLMLoader

Raises:

ValueError – If model_name is not recognized.
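
Example: creating and using a loader via the factory (a usage sketch; the frame files, prompt, and model identifier are illustrative):

    from PIL import Image
    from vlm_loader import (
        VLMConfig,
        QuantizationType,
        InferenceFramework,
        create_vlm_loader,
    )

    config = VLMConfig(
        model_id="google/gemma-3-27b-it",  # illustrative identifier
        quantization=QuantizationType.FOUR_BIT,
        framework=InferenceFramework.SGLANG,
    )
    loader = create_vlm_loader("gemma-3-27b", config)
    loader.load()
    try:
        frames = [Image.open("frame_000.jpg"), Image.open("frame_001.jpg")]
        summary = loader.generate(
            frames,
            "Describe what happens across these frames.",
            max_new_tokens=256,
            temperature=0.3,
        )
        print(summary)
    finally:
        loader.unload()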

Language Models

Configurable LLM loader with multi-model support and quantization.

This module provides a loader for text-only language models with support for multiple model options (Llama 4 Scout, Llama 3.3 70B, DeepSeek V3, Gemma 3), 4-bit quantization with bitsandbytes, SGLang inference framework, and automatic fallback handling.

class llm_loader.LLMFramework(*values)[source]

Bases: str, Enum

Inference framework options for LLM models.

SGLANG = 'sglang'
TRANSFORMERS = 'transformers'
class llm_loader.LLMConfig(model_id, quantization, framework, max_tokens=4096, temperature=0.7, top_p=0.9, context_length=131072)[source]

Bases: object

Configuration for a language model.

Parameters:
  • model_id (str) – HuggingFace model identifier (e.g., “meta-llama/Llama-4-Scout”).

  • quantization (str) – Quantization mode (e.g., “4bit”, “8bit”, “none”).

  • framework (LLMFramework) – Inference framework to use (sglang or transformers).

  • max_tokens (int, default=4096) – Maximum number of tokens to generate.

  • temperature (float, default=0.7) – Sampling temperature for generation.

  • top_p (float, default=0.9) – Nucleus sampling parameter.

  • context_length (int, default=131072) – Maximum context length in tokens.

model_id: str
quantization: str
framework: LLMFramework
max_tokens: int = 4096
temperature: float = 0.7
top_p: float = 0.9
context_length: int = 131072
__init__(model_id, quantization, framework, max_tokens=4096, temperature=0.7, top_p=0.9, context_length=131072)
class llm_loader.GenerationConfig(max_tokens=4096, temperature=0.7, top_p=0.9, stop_sequences=None)[source]

Bases: object

Configuration for text generation.

Parameters:
  • max_tokens (int, default=4096) – Maximum number of tokens to generate.

  • temperature (float, default=0.7) – Sampling temperature (0.0 for greedy, higher for more randomness).

  • top_p (float, default=0.9) – Nucleus sampling parameter.

  • stop_sequences (list[str] | None, default=None) – List of sequences that stop generation when encountered.

max_tokens: int = 4096
temperature: float = 0.7
top_p: float = 0.9
stop_sequences: list[str] | None = None
__init__(max_tokens=4096, temperature=0.7, top_p=0.9, stop_sequences=None)
class llm_loader.GenerationResult(text, tokens_used, finish_reason)[source]

Bases: object

Result from text generation.

Parameters:
  • text (str) – Generated text.

  • tokens_used (int) – Number of tokens used in generation.

  • finish_reason (str) – Reason generation stopped (e.g., “length”, “stop_sequence”, “eos”).

text: str
tokens_used: int
finish_reason: str
__init__(text, tokens_used, finish_reason)
class llm_loader.LLMLoader(config, cache_dir=None)[source]

Bases: object

Loader for text-only language models with quantization support.

This class handles loading language models with configurable quantization, supports multiple model options, and provides text generation utilities with error handling and fallback logic.

__init__(config, cache_dir=None)[source]

Initialize the LLM loader.

Parameters:
  • config (LLMConfig) – Model configuration specifying model ID, quantization, framework.

  • cache_dir (Path | None, default=None) – Directory for caching model weights. If None, uses default HF cache.

async load()[source]

Load the language model and tokenizer.

This method loads the model with the specified quantization settings and prepares it for inference. Loading is protected by a lock to prevent concurrent loading attempts.

Raises:

RuntimeError – If model loading fails due to memory, invalid model ID, or other issues.

Return type:

None

async generate(prompt, generation_config=None)[source]

Generate text from a prompt using the loaded model.

Parameters:
  • prompt (str) – Input text prompt for generation.

  • generation_config (GenerationConfig | None, default=None) – Generation parameters. If None, uses default configuration.

Returns:

Generated text with metadata (tokens used, finish reason).

Return type:

GenerationResult

Raises:

RuntimeError – If model is not loaded or generation fails.

async unload()[source]

Unload the model from memory.

This method releases the model and tokenizer, freeing GPU/CPU memory.

Return type:

None

is_loaded()[source]

Check if the model is currently loaded.

Returns:

True if model and tokenizer are loaded, False otherwise.

Return type:

bool

get_memory_usage()[source]

Get current GPU memory usage for the model.

Returns:

Dictionary with “allocated” and “reserved” memory in bytes. Returns zeros if CUDA is not available.

Return type:

dict[str, int]
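
Example: loading a model and generating text (an async usage sketch; the model identifier and prompt are illustrative):

    import asyncio

    from llm_loader import GenerationConfig, LLMConfig, LLMFramework, LLMLoader

    async def main() -> None:
        config = LLMConfig(
            model_id="meta-llama/Llama-3.3-70B-Instruct",  # illustrative identifier
            quantization="4bit",
            framework=LLMFramework.SGLANG,
        )
        loader = LLMLoader(config)
        await loader.load()
        try:
            result = await loader.generate(
                "Summarize the key events in this transcript: ...",
                GenerationConfig(max_tokens=512, temperature=0.2),
            )
            print(result.text, result.tokens_used, result.finish_reason)
        finally:
            await loader.unload()

    asyncio.run(main())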

llm_loader.create_llm_config_from_dict(model_dict)[source]

Create an LLMConfig from a dictionary (e.g., from YAML).

Parameters:

model_dict (dict[str, Any]) – Dictionary containing model configuration keys.

Returns:

Configured LLMConfig instance.

Return type:

LLMConfig

Raises:

ValueError – If required keys are missing or framework is invalid.
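
Example: building a configuration from a YAML-style dictionary (a sketch; the exact keys accepted depend on the models.yaml schema and are assumed here to mirror the LLMConfig fields):

    from llm_loader import create_llm_config_from_dict

    model_dict = {
        "model_id": "google/gemma-3-27b-it",  # illustrative identifier
        "quantization": "4bit",
        "framework": "sglang",
        "max_tokens": 4096,
        "context_length": 131072,
    }
    config = create_llm_config_from_dict(model_dict)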

async llm_loader.create_llm_loader_with_fallback(primary_config, fallback_configs, cache_dir=None)[source]

Create an LLM loader with automatic fallback to alternative models.

Parameters:
  • primary_config (LLMConfig) – Primary model configuration to try first.

  • fallback_configs (list[LLMConfig]) – List of fallback model configurations to try if primary fails.

  • cache_dir (Path | None, default=None) – Directory for caching model weights.

Returns:

Successfully loaded LLM loader.

Return type:

LLMLoader

Raises:

RuntimeError – If all model loading attempts fail.
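
Example: trying a primary model with fallbacks (an async sketch; the model identifiers are illustrative):

    import asyncio

    from llm_loader import LLMConfig, LLMFramework, create_llm_loader_with_fallback

    async def main() -> None:
        primary = LLMConfig(
            "meta-llama/Llama-4-Scout-17B-16E-Instruct", "4bit", LLMFramework.SGLANG
        )
        fallbacks = [
            LLMConfig("meta-llama/Llama-3.3-70B-Instruct", "4bit", LLMFramework.TRANSFORMERS)
        ]
        loader = await create_llm_loader_with_fallback(primary, fallbacks)
        result = await loader.generate("Hello")
        print(result.text)
        await loader.unload()

    asyncio.run(main())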

Object Detection

Open-vocabulary object detection with multiple model architectures.

This module provides a unified interface for loading and running inference with various open-vocabulary object detection models including YOLO-World v2.1, Grounding DINO 1.5, OWLv2, and Florence-2. Models support text-based prompts for detecting objects without pre-defined class vocabularies.

class detection_loader.DetectionFramework(*values)[source]

Bases: str, Enum

Supported detection frameworks for model execution.

PYTORCH = 'pytorch'
ULTRALYTICS = 'ultralytics'
TRANSFORMERS = 'transformers'
class detection_loader.DetectionConfig(model_id, framework=DetectionFramework.PYTORCH, confidence_threshold=0.25, device='cuda', cache_dir=None)[source]

Bases: object

Configuration for object detection model loading and inference.

Parameters:
  • model_id (str) – HuggingFace model identifier or Ultralytics model name.

  • framework (DetectionFramework) – Framework to use for model execution.

  • confidence_threshold (float, default=0.25) – Minimum confidence score for detections (0.0 to 1.0).

  • device (str, default="cuda") – Device to load the model on.

  • cache_dir (Path | None, default=None) – Directory for caching model weights.

model_id: str
framework: DetectionFramework = 'pytorch'
confidence_threshold: float = 0.25
device: str = 'cuda'
cache_dir: Path | None = None
__init__(model_id, framework=DetectionFramework.PYTORCH, confidence_threshold=0.25, device='cuda', cache_dir=None)
class detection_loader.BoundingBox(x1, y1, x2, y2)[source]

Bases: object

Bounding box in normalized coordinates.

Parameters:
  • x1 (float) – Left coordinate (0.0 to 1.0, normalized by image width).

  • y1 (float) – Top coordinate (0.0 to 1.0, normalized by image height).

  • x2 (float) – Right coordinate (0.0 to 1.0, normalized by image width).

  • y2 (float) – Bottom coordinate (0.0 to 1.0, normalized by image height).

x1: float
y1: float
x2: float
y2: float
to_absolute(width, height)[source]

Convert normalized coordinates to absolute pixel coordinates.

Parameters:
  • width (int) – Image width in pixels.

  • height (int) – Image height in pixels.

Returns:

Bounding box in absolute coordinates (x1, y1, x2, y2).

Return type:

tuple[int, int, int, int]

__init__(x1, y1, x2, y2)
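
Example: converting a normalized box to pixel coordinates for a 1920×1080 frame:

    from detection_loader import BoundingBox

    bbox = BoundingBox(x1=0.25, y1=0.10, x2=0.75, y2=0.90)
    x1, y1, x2, y2 = bbox.to_absolute(width=1920, height=1080)
    # (480, 108, 1440, 972)
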
class detection_loader.Detection(bbox, confidence, label)[source]

Bases: object

Single object detection result.

Parameters:
  • bbox (BoundingBox) – Bounding box in normalized coordinates.

  • confidence (float) – Detection confidence score (0.0 to 1.0).

  • label (str) – Detected object class or description.

bbox: BoundingBox
confidence: float
label: str
__init__(bbox, confidence, label)
class detection_loader.DetectionResult(detections, image_width, image_height, processing_time)[source]

Bases: object

Detection results for a single image.

Parameters:
  • detections (list[Detection]) – List of detected objects with bounding boxes and scores.

  • image_width (int) – Original image width in pixels.

  • image_height (int) – Original image height in pixels.

  • processing_time (float) – Processing time in seconds.

detections: list[Detection]
image_width: int
image_height: int
processing_time: float
__init__(detections, image_width, image_height, processing_time)
class detection_loader.DetectionModelLoader(config)[source]

Bases: ABC

Abstract base class for object detection model loaders.

All detection loaders must implement the load and detect methods.

__init__(config)[source]

Initialize the detection model loader with configuration.

Parameters:

config (DetectionConfig) – Configuration for model loading and inference.

abstractmethod load()[source]

Load the detection model into memory with configured settings.

Raises:

RuntimeError – If model loading fails.

Return type:

None

abstractmethod detect(image, text_prompt)[source]

Detect objects in an image based on text prompt.

Parameters:
  • image (Image.Image) – PIL Image to process.

  • text_prompt (str) – Text description of objects to detect (e.g., “person. car. dog.”).

Returns:

Detection results with bounding boxes in normalized coordinates.

Return type:

DetectionResult

Raises:

RuntimeError – If detection fails or model is not loaded.

unload()[source]

Unload the model from memory to free GPU resources.

Return type:

None

class detection_loader.YOLOWorldLoader(config)[source]

Bases: DetectionModelLoader

Loader for YOLO-World v2.1 open-vocabulary detection model.

YOLO-World v2.1 achieves real-time performance (52 FPS) with strong accuracy on open-vocabulary object detection tasks.

load()[source]

Load YOLO-World v2.1 model with configured settings.

Return type:

None

detect(image, text_prompt)[source]

Detect objects using YOLO-World v2.1 with text prompts.

Return type:

DetectionResult

class detection_loader.GroundingDINOLoader(config)[source]

Bases: DetectionModelLoader

Loader for Grounding DINO 1.5 open-vocabulary detection model.

Grounding DINO 1.5 achieves 52.5 AP on COCO with zero-shot open-world object detection capabilities.

load()[source]

Load Grounding DINO 1.5 model with configured settings.

Return type:

None

detect(image, text_prompt)[source]

Detect objects using Grounding DINO 1.5 with text prompts.

Return type:

DetectionResult

class detection_loader.OWLv2Loader(config)[source]

Bases: DetectionModelLoader

Loader for OWLv2 open-vocabulary detection model.

OWLv2 uses scaled training data and achieves strong performance on rare and novel object classes.

load()[source]

Load OWLv2 model with configured settings.

Return type:

None

detect(image, text_prompt)[source]

Detect objects using OWLv2 with text prompts.

Return type:

DetectionResult

class detection_loader.Florence2Loader(config)[source]

Bases: DetectionModelLoader

Loader for Florence-2 unified vision model.

Florence-2 is a 230M parameter model that supports multiple vision tasks including object detection, captioning, and grounding.

load()[source]

Load Florence-2 model with configured settings.

Return type:

None

detect(image, text_prompt)[source]

Detect objects using Florence-2 with text prompts.

Return type:

DetectionResult

detection_loader.create_detection_loader(model_name, config)[source]

Factory function to create appropriate detection loader based on model name.

Parameters:
  • model_name (str) – Name of the model to load. Supported values:
      - “yolo-world-v2” or “yoloworld”
      - “grounding-dino-1-5” or “groundingdino”
      - “owlv2” or “owl-v2”
      - “florence-2” or “florence2”

  • config (DetectionConfig) – Configuration for model loading and inference.

Returns:

Appropriate loader instance for the specified model.

Return type:

DetectionModelLoader

Raises:

ValueError – If model_name is not recognized.
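
Example: open-vocabulary detection on a single frame (a usage sketch; the weight name, image path, and prompt are illustrative):

    from PIL import Image

    from detection_loader import DetectionConfig, DetectionFramework, create_detection_loader

    config = DetectionConfig(
        model_id="yolov8s-worldv2.pt",  # illustrative Ultralytics weight name
        framework=DetectionFramework.ULTRALYTICS,
        confidence_threshold=0.3,
    )
    loader = create_detection_loader("yolo-world-v2", config)
    loader.load()
    try:
        image = Image.open("frame_000.jpg")
        result = loader.detect(image, "person. car. dog.")
        for det in result.detections:
            print(
                det.label,
                det.confidence,
                det.bbox.to_absolute(result.image_width, result.image_height),
            )
    finally:
        loader.unload()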

Object Tracking

Video segmentation and tracking with multiple model architectures.

This module provides a unified interface for loading and running inference with various video segmentation and tracking models including SAMURAI, SAM2Long, SAM2.1, and YOLO11n-seg. Models support temporal consistency across frames, occlusion handling, and mask-based segmentation output.

class tracking_loader.TrackingFramework(*values)[source]

Bases: str, Enum

Supported tracking frameworks for model execution.

PYTORCH = 'pytorch'
ULTRALYTICS = 'ultralytics'
SAM2 = 'sam2'
class tracking_loader.TrackingConfig(model_id, framework=TrackingFramework.PYTORCH, device='cuda', cache_dir=None, checkpoint_path=None)[source]

Bases: object

Configuration for video tracking model loading and inference.

Parameters:
  • model_id (str) – HuggingFace model identifier or model name.

  • framework (TrackingFramework) – Framework to use for model execution.

  • device (str, default="cuda") – Device to load the model on.

  • cache_dir (Path | None, default=None) – Directory for caching model weights.

  • checkpoint_path (Path | None, default=None) – Path to model checkpoint file if using local weights.

model_id: str
framework: TrackingFramework = 'pytorch'
device: str = 'cuda'
cache_dir: Path | None = None
checkpoint_path: Path | None = None
__init__(model_id, framework=TrackingFramework.PYTORCH, device='cuda', cache_dir=None, checkpoint_path=None)
class tracking_loader.TrackingMask(mask, confidence, object_id)[source]

Bases: object

Segmentation mask for a tracked object.

Parameters:
  • mask (np.ndarray) – Binary segmentation mask with shape (H, W) where values are 0 or 1.

  • confidence (float) – Mask prediction confidence score (0.0 to 1.0).

  • object_id (int) – Unique identifier for the tracked object across frames.

mask: ndarray[Any, dtype[uint8]]
confidence: float
object_id: int
to_rle()[source]

Convert mask to Run-Length Encoding format.

Returns:

RLE-encoded mask with ‘size’ and ‘counts’ keys.

Return type:

dict[str, Any]

__init__(mask, confidence, object_id)
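
Example: building a mask and serializing it to RLE (a sketch; the mask contents are synthetic):

    import numpy as np

    from tracking_loader import TrackingMask

    mask = np.zeros((480, 640), dtype=np.uint8)
    mask[100:200, 150:300] = 1  # synthetic foreground region
    tracked = TrackingMask(mask=mask, confidence=0.92, object_id=1)
    rle = tracked.to_rle()  # dict with "size" and "counts" keys
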
class tracking_loader.TrackingFrame(frame_idx, masks, occlusions, processing_time)[source]

Bases: object

Tracking results for a single video frame.

Parameters:
  • frame_idx (int) – Zero-indexed frame number in the video sequence.

  • masks (list[TrackingMask]) – List of segmentation masks for tracked objects in this frame.

  • occlusions (dict[int, bool]) – Mapping of object_id to occlusion status (True if occluded).

  • processing_time (float) – Processing time for this frame in seconds.

frame_idx: int
masks: list[TrackingMask]
occlusions: dict[int, bool]
processing_time: float
__init__(frame_idx, masks, occlusions, processing_time)
class tracking_loader.TrackingResult(frames, video_width, video_height, total_processing_time, fps)[source]

Bases: object

Tracking results for a video sequence.

Parameters:
  • frames (list[TrackingFrame]) – Tracking results for each frame in the sequence.

  • video_width (int) – Video frame width in pixels.

  • video_height (int) – Video frame height in pixels.

  • total_processing_time (float) – Total processing time for all frames in seconds.

  • fps (float) – Processing speed in frames per second.

frames: list[TrackingFrame]
video_width: int
video_height: int
total_processing_time: float
fps: float
__init__(frames, video_width, video_height, total_processing_time, fps)
class tracking_loader.TrackingModelLoader(config)[source]

Bases: ABC

Abstract base class for video tracking model loaders.

All tracking loaders must implement the load and track methods.

__init__(config)[source]

Initialize the tracking model loader with configuration.

Parameters:

config (TrackingConfig) – Configuration for model loading and inference.

abstractmethod load()[source]

Load the tracking model into memory with configured settings.

Raises:

RuntimeError – If model loading fails.

Return type:

None

abstractmethod track(frames, initial_masks, object_ids)[source]

Track objects across video frames with mask-based segmentation.

Parameters:
  • frames (list[Image.Image]) – List of PIL Images representing consecutive video frames.

  • initial_masks (list[np.ndarray]) – Initial segmentation masks for objects in the first frame. Each mask is a binary numpy array with shape (H, W).

  • object_ids (list[int]) – Unique identifiers for each object to track.

Returns:

Tracking results with segmentation masks for each frame.

Return type:

TrackingResult

Raises:
  • RuntimeError – If tracking fails or model is not loaded.

  • ValueError – If number of initial_masks does not match object_ids length.

unload()[source]

Unload the model from memory to free GPU resources.

Return type:

None

class tracking_loader.SAMURAILoader(config)[source]

Bases: TrackingModelLoader

Loader for SAMURAI motion-aware tracking model.

SAMURAI achieves 7.1% better performance than SAM2 baseline with motion-aware tracking and occlusion handling capabilities.

load()[source]

Load SAMURAI model with configured settings.

Return type:

None

track(frames, initial_masks, object_ids)[source]

Track objects using SAMURAI with motion-aware tracking.

Return type:

TrackingResult

class tracking_loader.SAM2LongLoader(config)[source]

Bases: TrackingModelLoader

Loader for SAM2Long long video tracking model.

SAM2Long achieves 5.3% better performance than SAM2 baseline with error accumulation fixes for long video sequences.

load()[source]

Load SAM2Long model with configured settings.

Return type:

None

track(frames, initial_masks, object_ids)[source]

Track objects using SAM2Long with error accumulation fixes.

Return type:

TrackingResult

class tracking_loader.SAM2Loader(config)[source]

Bases: TrackingModelLoader

Loader for SAM2.1 baseline video segmentation model.

SAM2.1 provides baseline performance with proven stability for general video segmentation and tracking tasks.

load()[source]

Load SAM2.1 model with configured settings.

Return type:

None

track(frames, initial_masks, object_ids)[source]

Track objects using SAM2.1 baseline implementation.

Return type:

TrackingResult

class tracking_loader.YOLO11SegLoader(config)[source]

Bases: TrackingModelLoader

Loader for YOLO11n-seg lightweight segmentation model.

YOLO11n-seg is a 2.7M parameter model optimized for real-time segmentation in speed-critical applications.

load()[source]

Load YOLO11n-seg model with configured settings.

Return type:

None

track(frames, initial_masks, object_ids)[source]

Track objects using YOLO11n-seg with per-frame segmentation.

Note: YOLO11n-seg performs independent segmentation per frame without temporal consistency. Object re-identification is based on spatial overlap.

Return type:

TrackingResult

tracking_loader.create_tracking_loader(model_name, config)[source]

Factory function to create appropriate tracking loader based on model name.

Parameters:
  • model_name (str) – Name of the model to load. Supported values:
      - “samurai” (default)
      - “sam2long” or “sam2-long”
      - “sam2” or “sam2.1”
      - “yolo11n-seg” or “yolo11seg”

  • config (TrackingConfig) – Configuration for model loading and inference.

Returns:

Appropriate loader instance for the specified model.

Return type:

TrackingModelLoader

Raises:

ValueError – If model_name is not recognized.
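
Example: tracking a single object across extracted frames (a usage sketch; the frame files and initial mask are illustrative, and the mask would normally come from a detection or segmentation step):

    import numpy as np
    from PIL import Image

    from tracking_loader import TrackingConfig, create_tracking_loader

    config = TrackingConfig(model_id="samurai")  # illustrative model identifier
    loader = create_tracking_loader("samurai", config)
    loader.load()
    try:
        frames = [Image.open(f"frame_{i:03d}.jpg") for i in range(30)]
        width, height = frames[0].size
        init_mask = np.zeros((height, width), dtype=np.uint8)
        init_mask[100:300, 200:400] = 1  # placeholder mask for object 1
        result = loader.track(frames, initial_masks=[init_mask], object_ids=[1])
        print(f"{len(result.frames)} frames tracked at {result.fps:.1f} FPS")
    finally:
        loader.unload()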

Video Utilities

Video processing utilities for frame extraction and audio processing.

This module provides functions for extracting frames from videos using OpenCV and extracting audio using FFmpeg. It supports various sampling strategies and output formats.

exception video_utils.VideoProcessingError[source]

Bases: Exception

Raised when video processing operations fail.

class video_utils.VideoInfo(path, frame_count, fps, duration, width, height)[source]

Bases: object

Container for video metadata.

path

Path to the video file.

Type:

str

frame_count

Total number of frames in the video.

Type:

int

fps

Frames per second.

Type:

float

duration

Duration in seconds.

Type:

float

width

Frame width in pixels.

Type:

int

height

Frame height in pixels.

Type:

int

__init__(path, frame_count, fps, duration, width, height)[source]
video_utils.get_video_info(video_path)[source]

Extract metadata from a video file.

Parameters:

video_path (str) – Path to the video file.

Returns:

Video metadata object.

Return type:

VideoInfo

Raises:

VideoProcessingError – If the video cannot be opened or read.

video_utils.extract_frame(video_path, frame_number)[source]

Extract a single frame from a video.

Parameters:
  • video_path (str) – Path to the video file.

  • frame_number (int) – Frame index to extract (zero-indexed).

Returns:

Frame as numpy array in RGB format.

Return type:

np.ndarray

Raises:

VideoProcessingError – If frame extraction fails.

video_utils.extract_frames_uniform(video_path, num_frames=10, max_dimension=None)[source]

Extract frames uniformly sampled from a video.

Parameters:
  • video_path (str) – Path to the video file.

  • num_frames (int, default=10) – Number of frames to extract.

  • max_dimension (int | None, default=None) – Maximum width or height for resizing (maintains aspect ratio). If None, frames are not resized.

Returns:

List of tuples containing (frame_number, frame_array).

Return type:

list[tuple[int, np.ndarray]]

Raises:

VideoProcessingError – If frame extraction fails.

video_utils.extract_frames_by_rate(video_path, sample_rate=30, max_dimension=None)[source]

Extract frames at a specified sampling rate.

Parameters:
  • video_path (str) – Path to the video file.

  • sample_rate (int, default=30) – Extract one frame every N frames.

  • max_dimension (int | None, default=None) – Maximum width or height for resizing (maintains aspect ratio). If None, frames are not resized.

Returns:

List of tuples containing (frame_number, frame_array).

Return type:

list[tuple[int, np.ndarray]]

Raises:

VideoProcessingError – If frame extraction fails.

video_utils.resize_frame(frame, max_dimension)[source]

Resize a frame maintaining aspect ratio.

Parameters:
  • frame (np.ndarray) – Input frame as numpy array.

  • max_dimension (int) – Maximum width or height in pixels.

Returns:

Resized frame.

Return type:

np.ndarray

async video_utils.extract_audio(video_path, output_path=None, sample_rate=16000, channels=1)[source]

Extract audio from a video file using FFmpeg.

Parameters:
  • video_path (str) – Path to the video file.

  • output_path (str | None, default=None) – Path for output audio file. If None, creates temp file.

  • sample_rate (int, default=16000) – Audio sample rate in Hz.

  • channels (int, default=1) – Number of audio channels (1=mono, 2=stereo).

Returns:

Path to the extracted audio file.

Return type:

str

Raises:

VideoProcessingError – If audio extraction fails.

video_utils.check_ffmpeg_available()[source]

Check if FFmpeg is available in the system PATH.

Returns:

True if FFmpeg is available, False otherwise.

Return type:

bool
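
Example: combining the utilities to sample frames and extract audio (a sketch; the video path is illustrative):

    import asyncio

    from video_utils import (
        check_ffmpeg_available,
        extract_audio,
        extract_frames_uniform,
        get_video_info,
    )

    async def main() -> None:
        info = get_video_info("input.mp4")
        print(info.frame_count, info.fps, info.duration)

        # Sample 8 frames, downscaled so the longer side is at most 1024 px.
        frames = extract_frames_uniform("input.mp4", num_frames=8, max_dimension=1024)
        for frame_number, frame_array in frames:
            print(frame_number, frame_array.shape)

        if check_ffmpeg_available():
            audio_path = await extract_audio("input.mp4", sample_rate=16000, channels=1)
            print("audio written to", audio_path)

    asyncio.run(main())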

Model Management

Model management with dynamic loading, memory budget validation, and LRU eviction.

This module provides a ModelManager class that handles loading and unloading of AI models based on available GPU memory. Models are loaded on demand and automatically evicted when memory pressure occurs.

class model_manager.ModelConfig(config_dict)[source]

Bases: object

Configuration for a single model variant.

model_id

Hugging Face model identifier.

Type:

str

framework

Inference framework (sglang, vllm, pytorch).

Type:

str

vram_gb

VRAM requirement in GB.

Type:

float

quantization

Quantization method (4bit, 8bit, awq, etc.).

Type:

str | None

speed

Speed category (fast, medium, slow).

Type:

str

description

Human-readable description.

Type:

str

fps

Processing speed in frames per second (for vision models).

Type:

int | None

__init__(config_dict)[source]

Initialize model configuration from dictionary.

Parameters:

config_dict (dict[str, Any]) – Dictionary containing model configuration parameters.

property vram_bytes: int

Convert VRAM requirement from GB to bytes.

Returns:

VRAM requirement in bytes.

Return type:

int

class model_manager.TaskConfig(task_name, config_dict)[source]

Bases: object

Configuration for a task type with multiple model options.

task_name

Name of the task.

Type:

str

selected

Currently selected model name.

Type:

str

options

Available model options for this task.

Type:

dict[str, ModelConfig]

__init__(task_name, config_dict)[source]

Initialize task configuration from dictionary.

Parameters:
  • task_name (str) – Name of the task (e.g., “video_summarization”).

  • config_dict (dict[str, Any]) – Dictionary containing task configuration.

get_selected_config()[source]

Get the currently selected model configuration.

Returns:

Configuration for the selected model.

Return type:

ModelConfig

class model_manager.InferenceConfig(config_dict)[source]

Bases: object

Global inference configuration settings.

max_memory_per_model

Maximum memory per model (‘auto’ or specific value).

Type:

str

offload_threshold

Memory usage threshold for offloading (0.0 to 1.0).

Type:

float

warmup_on_startup

Whether to load all models on startup.

Type:

bool

default_batch_size

Default batch size for inference.

Type:

int

max_batch_size

Maximum batch size for inference.

Type:

int

__init__(config_dict)[source]

Initialize inference configuration from dictionary.

Parameters:

config_dict (dict[str, Any]) – Dictionary containing inference configuration.

class model_manager.ModelManager(config_path)[source]

Bases: object

Manages loading, unloading, and memory management of AI models.

This class handles dynamic model loading based on memory availability, implements LRU eviction when memory pressure occurs, and provides utilities for VRAM monitoring.

config_path

Path to models.yaml configuration file.

Type:

Path

config

Parsed configuration dictionary.

Type:

dict[str, Any]

loaded_models

Currently loaded models (LRU ordered).

Type:

OrderedDict[str, Any]

model_load_times

Timestamp when each model was loaded.

Type:

dict[str, float]

model_memory_usage

Actual memory usage per model in bytes.

Type:

dict[str, int]

tasks

Task configurations.

Type:

dict[str, TaskConfig]

inference_config

Global inference settings.

Type:

InferenceConfig

__init__(config_path)[source]

Initialize ModelManager with configuration file.

Parameters:

config_path (str) – Path to models.yaml configuration file.

get_available_vram()[source]

Get available GPU memory in bytes.

Return type:

int

Returns:

Available VRAM in bytes

get_total_vram()[source]

Get total GPU memory in bytes.

Return type:

int

Returns:

Total VRAM in bytes

get_memory_usage_percentage()[source]

Get current GPU memory usage as percentage.

Return type:

float

Returns:

Memory usage percentage (0.0 to 1.0)

check_memory_available(required_bytes)[source]

Check if sufficient memory is available for model loading.

Parameters:

required_bytes (int) – Required memory in bytes

Return type:

bool

Returns:

True if sufficient memory is available

get_lru_model()[source]

Get least recently used model identifier.

Return type:

str | None

Returns:

Task name of LRU model, or None if no models loaded

async evict_lru_model()[source]

Evict the least recently used model from memory.

Return type:

str | None

Returns:

Task name of evicted model, or None if no models to evict

async unload_model(task_type)[source]

Unload a model from memory.

Parameters:

task_type (str) – Task type of model to unload

Return type:

None

async load_model(task_type)[source]

Load a model for the specified task type.

This method loads the selected model for the task, handling memory management and eviction if necessary.

Parameters:

task_type (str) – Task type to load model for

Return type:

Any

Returns:

Loaded model object

Raises:
  • ValueError – If task type is invalid or model cannot be loaded

  • RuntimeError – If insufficient memory after eviction attempts

async get_model(task_type)[source]

Get model for task type, loading if necessary.

Parameters:

task_type (str) – Task type to get model for

Return type:

Any

Returns:

Loaded model object

get_loaded_models()[source]

Get information about currently loaded models.

Return type:

dict[str, dict[str, Any]]

Returns:

Dictionary mapping task types to model information

get_model_config(task_type)[source]

Get configuration for a task type.

Parameters:

task_type (str) – Task type to get configuration for

Return type:

TaskConfig | None

Returns:

Task configuration, or None if task type is invalid

async set_selected_model(task_type, model_name)[source]

Change the selected model for a task type.

If the task’s model is currently loaded, it will be unloaded and the new model will be loaded.

Parameters:
  • task_type (str) – Task type to update

  • model_name (str) – Name of model option to select

Raises:

ValueError – If task type or model name is invalid

Return type:

None

validate_memory_budget()[source]

Validate that all selected models can fit in available memory.

Return type:

dict[str, Any]

Returns:

Dictionary with validation results

async warmup_models()[source]

Load all selected models if warmup_on_startup is enabled.

Return type:

None

async shutdown()[source]

Unload all models and clean up resources.

Return type:

None
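
Example: loading a model through the manager and inspecting memory state (an async sketch; the configuration path and task name are illustrative):

    import asyncio

    from model_manager import ModelManager

    async def main() -> None:
        manager = ModelManager("config/models.yaml")  # illustrative path

        # Check that the selected models fit the memory budget before loading.
        print(manager.validate_memory_budget())

        model = await manager.get_model("video_summarization")
        print(manager.get_loaded_models())
        print(f"GPU memory in use: {manager.get_memory_usage_percentage():.1%}")

        await manager.shutdown()

    asyncio.run(main())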
