Fovea Model Service API Reference

This is the auto-generated API documentation for the Fovea Model Service, which provides AI-powered video analysis capabilities including summarization, object detection, and tracking.

Core Modules

Main Application

API Routes

Video Summarization

Video Understanding Models

Vision Language Model loader with support for multiple VLM architectures.

This module provides a unified interface for loading and running inference with various Vision Language Models including Llama 4 Maverick, Gemma 3, InternVL3, Pixtral Large, and Qwen2.5-VL. Models can be loaded with different quantization strategies and inference frameworks (SGLang or vLLM).

class vlm_loader.QuantizationType(*values)[source]

Bases: str, Enum

Supported quantization types for model compression.

NONE = 'none'
FOUR_BIT = '4bit'
EIGHT_BIT = '8bit'
AWQ = 'awq'
class vlm_loader.InferenceFramework(*values)[source]

Bases: str, Enum

Supported inference frameworks for model execution.

SGLANG = 'sglang'
VLLM = 'vllm'
TRANSFORMERS = 'transformers'
class vlm_loader.VLMConfig(model_id, quantization=QuantizationType.FOUR_BIT, framework=InferenceFramework.SGLANG, max_memory_gb=None, device='cuda', trust_remote_code=True)[source]

Bases: object

Configuration for Vision Language Model loading and inference.

Parameters:
  • model_id (str) – HuggingFace model identifier or local path.

  • quantization (QuantizationType) – Quantization strategy to apply.

  • framework (InferenceFramework) – Inference framework to use for model execution.

  • max_memory_gb (int | None, default=None) – Maximum GPU memory to allocate in GB. If None, uses all available.

  • device (str, default="cuda") – Device to load the model on.

  • trust_remote_code (bool, default=True) – Whether to trust remote code from HuggingFace.

model_id: str
quantization: QuantizationType = '4bit'
framework: InferenceFramework = 'sglang'
max_memory_gb: int | None = None
device: str = 'cuda'
trust_remote_code: bool = True
__init__(model_id, quantization=QuantizationType.FOUR_BIT, framework=InferenceFramework.SGLANG, max_memory_gb=None, device='cuda', trust_remote_code=True)
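
Example (a minimal usage sketch; the model identifier is illustrative):

    from vlm_loader import VLMConfig, QuantizationType, InferenceFramework

    # Illustrative HuggingFace identifier; substitute the checkpoint you actually use.
    config = VLMConfig(
        model_id="Qwen/Qwen2.5-VL-72B-Instruct",
        quantization=QuantizationType.FOUR_BIT,
        framework=InferenceFramework.SGLANG,
        max_memory_gb=80,
        device="cuda",
    )
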
class vlm_loader.VLMLoader(config)[source]

Bases: ABC

Abstract base class for Vision Language Model loaders.

All VLM loaders must implement the load and generate methods.

__init__(config)[source]

Initialize the VLM loader with configuration.

Parameters:

config (VLMConfig) – Configuration for model loading and inference.

abstractmethod load()[source]

Load the model into memory with configured settings.

Raises:

RuntimeError – If model loading fails.

Return type:

None

abstractmethod generate(images, prompt, max_new_tokens=512, temperature=0.7)[source]

Generate text response from images and prompt.

Parameters:
  • images (list[Image.Image]) – List of PIL images to process.

  • prompt (str) – Text prompt for the model.

  • max_new_tokens (int, default=512) – Maximum number of tokens to generate.

  • temperature (float, default=0.7) – Sampling temperature for generation.

Returns:

Generated text response.

Return type:

str

Raises:

RuntimeError – If generation fails or model is not loaded.

unload()[source]

Unload the model from memory to free GPU resources.

Return type:

None
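
Example: the minimal shape of a concrete loader subclass (a sketch only; the placeholder model handle is illustrative, and it assumes the base __init__ stores the configuration on self.config):

    from PIL import Image
    from vlm_loader import VLMLoader

    class MyVLMLoader(VLMLoader):
        """Skeleton showing the interface a concrete loader must provide."""

        def load(self) -> None:
            # A real implementation loads weights according to self.config
            # (quantization, framework, device) and keeps a model handle.
            self._model = object()  # placeholder handle

        def generate(self, images: list[Image.Image], prompt: str,
                     max_new_tokens: int = 512, temperature: float = 0.7) -> str:
            if getattr(self, "_model", None) is None:
                raise RuntimeError("Model not loaded; call load() first.")
            # A real implementation runs framework-specific inference here.
            return "<generated text>"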

class vlm_loader.Llama4MaverickLoader(config)[source]

Bases: VLMLoader

Loader for Llama 4 Maverick Vision Language Model.

Llama 4 Maverick is a 400B parameter MoE model with 17B active parameters, supporting multimodal input with a 1M-token context length.

load()[source]

Load Llama 4 Maverick model with configured settings.

Return type:

None

generate(images, prompt, max_new_tokens=512, temperature=0.7)[source]

Generate text response from images and prompt using Llama 4 Maverick.

Return type:

str

class vlm_loader.Gemma3Loader(config)[source]

Bases: VLMLoader

Loader for Gemma 3 27B Vision Language Model.

Gemma 3 27B excels at document analysis, OCR, and multilingual tasks with fast inference speed.

load()[source]

Load Gemma 3 model with configured settings.

Return type:

None

generate(images, prompt, max_new_tokens=512, temperature=0.7)[source]

Generate text response from images and prompt using Gemma 3.

Return type:

str

class vlm_loader.InternVL3Loader(config)[source]

Bases: VLMLoader

Loader for InternVL3-78B Vision Language Model.

InternVL3-78B achieves state-of-the-art results on vision benchmarks with strong scientific reasoning capabilities.

load()[source]

Load InternVL3 model with configured settings.

Return type:

None

generate(images, prompt, max_new_tokens=512, temperature=0.7)[source]

Generate text response from images and prompt using InternVL3.

Return type:

str

class vlm_loader.PixtralLargeLoader(config)[source]

Bases: VLMLoader

Loader for Pixtral Large Vision Language Model.

Pixtral Large is a 123B parameter model with 128k context length, optimized for batch processing of long documents.

load()[source]

Load Pixtral Large model with configured settings.

Return type:

None

generate(images, prompt, max_new_tokens=512, temperature=0.7)[source]

Generate text response from images and prompt using Pixtral Large.

Return type:

str

class vlm_loader.Qwen25VLLoader(config)[source]

Bases: VLMLoader

Loader for Qwen2.5-VL 72B Vision Language Model.

Qwen2.5-VL 72B is a proven stable model with strong performance across vision-language tasks.

load()[source]

Load Qwen2.5-VL model with configured settings.

Return type:

None

generate(images, prompt, max_new_tokens=512, temperature=0.7)[source]

Generate text response from images and prompt using Qwen2.5-VL.

Return type:

str

vlm_loader.create_vlm_loader(model_name, config)[source]

Factory function to create appropriate VLM loader based on model name.

Parameters:
  • model_name (str) – Name of the model to load. Supported values:
      - “llama-4-maverick” or “llama4-maverick”
      - “gemma-3-27b” or “gemma3”
      - “internvl3-78b” or “internvl3”
      - “pixtral-large” or “pixtral”
      - “qwen2.5-vl-72b” or “qwen25vl”

  • config (VLMConfig) – Configuration for model loading and inference.

Returns:

Appropriate loader instance for the specified model.

Return type:

VLMLoader

Raises:

ValueError – If model_name is not recognized.
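
Example: creating and using a loader via the factory (a usage sketch; the frame files, prompt, and model identifier are illustrative):

    from PIL import Image
    from vlm_loader import (
        VLMConfig,
        QuantizationType,
        InferenceFramework,
        create_vlm_loader,
    )

    config = VLMConfig(
        model_id="google/gemma-3-27b-it",  # illustrative identifier
        quantization=QuantizationType.FOUR_BIT,
        framework=InferenceFramework.SGLANG,
    )
    loader = create_vlm_loader("gemma-3-27b", config)
    loader.load()
    try:
        frames = [Image.open("frame_000.jpg"), Image.open("frame_001.jpg")]
        summary = loader.generate(
            frames,
            "Describe what happens across these frames.",
            max_new_tokens=256,
            temperature=0.3,
        )
        print(summary)
    finally:
        loader.unload()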

Language Models

Configurable LLM loader with multi-model support and quantization.

This module provides a loader for text-only language models with support for multiple model options (Llama 4 Scout, Llama 3.3 70B, DeepSeek V3, Gemma 3), 4-bit quantization with bitsandbytes, SGLang inference framework, and automatic fallback handling.

class llm_loader.LLMFramework(*values)[source]

Bases: str, Enum

Inference framework options for LLM models.

SGLANG = 'sglang'
TRANSFORMERS = 'transformers'
class llm_loader.LLMConfig(model_id, quantization, framework, max_tokens=4096, temperature=0.7, top_p=0.9, context_length=131072)[source]

Bases: object

Configuration for a language model.

Parameters:
  • model_id (str) – HuggingFace model identifier (e.g., “meta-llama/Llama-4-Scout”).

  • quantization (str) – Quantization mode (e.g., “4bit”, “8bit”, “none”).

  • framework (LLMFramework) – Inference framework to use (sglang or transformers).

  • max_tokens (int, default=4096) – Maximum number of tokens to generate.

  • temperature (float, default=0.7) – Sampling temperature for generation.

  • top_p (float, default=0.9) – Nucleus sampling parameter.

  • context_length (int, default=131072) – Maximum context length in tokens.

model_id: str
quantization: str
framework: LLMFramework
max_tokens: int = 4096
temperature: float = 0.7
top_p: float = 0.9
context_length: int = 131072
__init__(model_id, quantization, framework, max_tokens=4096, temperature=0.7, top_p=0.9, context_length=131072)
class llm_loader.GenerationConfig(max_tokens=4096, temperature=0.7, top_p=0.9, stop_sequences=None)[source]

Bases: object

Configuration for text generation.

Parameters:
  • max_tokens (int, default=4096) – Maximum number of tokens to generate.

  • temperature (float, default=0.7) – Sampling temperature (0.0 for greedy, higher for more randomness).

  • top_p (float, default=0.9) – Nucleus sampling parameter.

  • stop_sequences (list[str] | None, default=None) – List of sequences that stop generation when encountered.

max_tokens: int = 4096
temperature: float = 0.7
top_p: float = 0.9
stop_sequences: list[str] | None = None
__init__(max_tokens=4096, temperature=0.7, top_p=0.9, stop_sequences=None)
class llm_loader.GenerationResult(text, tokens_used, finish_reason)[source]

Bases: object

Result from text generation.

Parameters:
  • text (str) – Generated text.

  • tokens_used (int) – Number of tokens used in generation.

  • finish_reason (str) – Reason generation stopped (e.g., “length”, “stop_sequence”, “eos”).

text: str
tokens_used: int
finish_reason: str
__init__(text, tokens_used, finish_reason)
class llm_loader.LLMLoader(config, cache_dir=None)[source]

Bases: object

Loader for text-only language models with quantization support.

This class handles loading language models with configurable quantization, supports multiple model options, and provides text generation utilities with error handling and fallback logic.

__init__(config, cache_dir=None)[source]

Initialize the LLM loader.

Parameters:
  • config (LLMConfig) – Model configuration specifying model ID, quantization, framework.

  • cache_dir (Path | None, default=None) – Directory for caching model weights. If None, uses default HF cache.

async load()[source]

Load the language model and tokenizer.

This method loads the model with the specified quantization settings and prepares it for inference. Loading is protected by a lock to prevent concurrent loading attempts.

Raises:

RuntimeError – If model loading fails due to memory, invalid model ID, or other issues.

Return type:

None

async generate(prompt, generation_config=None)[source]

Generate text from a prompt using the loaded model.

Parameters:
  • prompt (str) – Input text prompt for generation.

  • generation_config (GenerationConfig | None, default=None) – Generation parameters. If None, uses default configuration.

Returns:

Generated text with metadata (tokens used, finish reason).

Return type:

GenerationResult

Raises:

RuntimeError – If model is not loaded or generation fails.

async unload()[source]

Unload the model from memory.

This method releases the model and tokenizer, freeing GPU/CPU memory.

Return type:

None

is_loaded()[source]

Check if the model is currently loaded.

Returns:

True if model and tokenizer are loaded, False otherwise.

Return type:

bool

get_memory_usage()[source]

Get current GPU memory usage for the model.

Returns:

Dictionary with “allocated” and “reserved” memory in bytes. Returns zeros if CUDA is not available.

Return type:

dict[str, int]
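
Example: loading a model and generating text (an async usage sketch; the model identifier and prompt are illustrative):

    import asyncio

    from llm_loader import GenerationConfig, LLMConfig, LLMFramework, LLMLoader

    async def main() -> None:
        config = LLMConfig(
            model_id="meta-llama/Llama-3.3-70B-Instruct",  # illustrative identifier
            quantization="4bit",
            framework=LLMFramework.SGLANG,
        )
        loader = LLMLoader(config)
        await loader.load()
        try:
            result = await loader.generate(
                "Summarize the key events in this transcript: ...",
                GenerationConfig(max_tokens=512, temperature=0.2),
            )
            print(result.text, result.tokens_used, result.finish_reason)
        finally:
            await loader.unload()

    asyncio.run(main())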

llm_loader.create_llm_config_from_dict(model_dict)[source]

Create an LLMConfig from a dictionary (e.g., from YAML).

Parameters:

model_dict (dict[str, Any]) – Dictionary containing model configuration keys.

Returns:

Configured LLMConfig instance.

Return type:

LLMConfig

Raises:

ValueError – If required keys are missing or framework is invalid.
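
Example: building a configuration from a YAML-style dictionary (a sketch; the exact keys accepted depend on the models.yaml schema and are assumed here to mirror the LLMConfig fields):

    from llm_loader import create_llm_config_from_dict

    model_dict = {
        "model_id": "google/gemma-3-27b-it",  # illustrative identifier
        "quantization": "4bit",
        "framework": "sglang",
        "max_tokens": 4096,
        "context_length": 131072,
    }
    config = create_llm_config_from_dict(model_dict)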

async llm_loader.create_llm_loader_with_fallback(primary_config, fallback_configs, cache_dir=None)[source]

Create an LLM loader with automatic fallback to alternative models.

Parameters:
  • primary_config (LLMConfig) – Primary model configuration to try first.

  • fallback_configs (list[LLMConfig]) – List of fallback model configurations to try if primary fails.

  • cache_dir (Path | None, default=None) – Directory for caching model weights.

Returns:

Successfully loaded LLM loader.

Return type:

LLMLoader

Raises:

RuntimeError – If all model loading attempts fail.
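
Example: trying a primary model with fallbacks (an async sketch; the model identifiers are illustrative):

    import asyncio

    from llm_loader import LLMConfig, LLMFramework, create_llm_loader_with_fallback

    async def main() -> None:
        primary = LLMConfig(
            "meta-llama/Llama-4-Scout-17B-16E-Instruct", "4bit", LLMFramework.SGLANG
        )
        fallbacks = [
            LLMConfig("meta-llama/Llama-3.3-70B-Instruct", "4bit", LLMFramework.TRANSFORMERS)
        ]
        loader = await create_llm_loader_with_fallback(primary, fallbacks)
        result = await loader.generate("Hello")
        print(result.text)
        await loader.unload()

    asyncio.run(main())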

Object Detection

Open-vocabulary object detection with multiple model architectures.

This module provides a unified interface for loading and running inference with various open-vocabulary object detection models including YOLO-World v2.1, Grounding DINO 1.5, OWLv2, and Florence-2. Models support text-based prompts for detecting objects without pre-defined class vocabularies.

class detection_loader.DetectionFramework(*values)[source]

Bases: str, Enum

Supported detection frameworks for model execution.

PYTORCH = 'pytorch'
ULTRALYTICS = 'ultralytics'
TRANSFORMERS = 'transformers'
class detection_loader.DetectionConfig(model_id, framework=DetectionFramework.PYTORCH, confidence_threshold=0.25, device='cuda', cache_dir=None)[source]

Bases: object

Configuration for object detection model loading and inference.

Parameters:
  • model_id (str) – HuggingFace model identifier or Ultralytics model name.

  • framework (DetectionFramework) – Framework to use for model execution.

  • confidence_threshold (float, default=0.25) – Minimum confidence score for detections (0.0 to 1.0).

  • device (str, default="cuda") – Device to load the model on.

  • cache_dir (Path | None, default=None) – Directory for caching model weights.

model_id: str
framework: DetectionFramework = 'pytorch'
confidence_threshold: float = 0.25
device: str = 'cuda'
cache_dir: Path | None = None
__init__(model_id, framework=DetectionFramework.PYTORCH, confidence_threshold=0.25, device='cuda', cache_dir=None)
class detection_loader.BoundingBox(x1, y1, x2, y2)[source]

Bases: object

Bounding box in normalized coordinates.

Parameters:
  • x1 (float) – Left coordinate (0.0 to 1.0, normalized by image width).

  • y1 (float) – Top coordinate (0.0 to 1.0, normalized by image height).

  • x2 (float) – Right coordinate (0.0 to 1.0, normalized by image width).

  • y2 (float) – Bottom coordinate (0.0 to 1.0, normalized by image height).

x1: float
y1: float
x2: float
y2: float
to_absolute(width, height)[source]

Convert normalized coordinates to absolute pixel coordinates.

Parameters:
  • width (int) – Image width in pixels.

  • height (int) – Image height in pixels.

Returns:

Bounding box in absolute coordinates (x1, y1, x2, y2).

Return type:

tuple[int, int, int, int]

__init__(x1, y1, x2, y2)
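
Example: converting a normalized box to pixel coordinates for a 1920×1080 frame:

    from detection_loader import BoundingBox

    bbox = BoundingBox(x1=0.25, y1=0.10, x2=0.75, y2=0.90)
    x1, y1, x2, y2 = bbox.to_absolute(width=1920, height=1080)
    # (480, 108, 1440, 972)
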
class detection_loader.Detection(bbox, confidence, label)[source]

Bases: object

Single object detection result.

Parameters:
  • bbox (BoundingBox) – Bounding box in normalized coordinates.

  • confidence (float) – Detection confidence score (0.0 to 1.0).

  • label (str) – Detected object class or description.

bbox: BoundingBox
confidence: float
label: str
__init__(bbox, confidence, label)
class detection_loader.DetectionResult(detections, image_width, image_height, processing_time)[source]

Bases: object

Detection results for a single image.

Parameters:
  • detections (list[Detection]) – List of detected objects with bounding boxes and scores.

  • image_width (int) – Original image width in pixels.

  • image_height (int) – Original image height in pixels.

  • processing_time (float) – Processing time in seconds.

detections: list[Detection]
image_width: int
image_height: int
processing_time: float
__init__(detections, image_width, image_height, processing_time)
class detection_loader.DetectionModelLoader(config)[source]

Bases: ABC

Abstract base class for object detection model loaders.

All detection loaders must implement the load and detect methods.

__init__(config)[source]

Initialize the detection model loader with configuration.

Parameters:

config (DetectionConfig) – Configuration for model loading and inference.

abstractmethod load()[source]

Load the detection model into memory with configured settings.

Raises:

RuntimeError – If model loading fails.

Return type:

None

abstractmethod detect(image, text_prompt)[source]

Detect objects in an image based on text prompt.

Parameters:
  • image (Image.Image) – PIL Image to process.

  • text_prompt (str) – Text description of objects to detect (e.g., “person. car. dog.”).

Returns:

Detection results with bounding boxes in normalized coordinates.

Return type:

DetectionResult

Raises:

RuntimeError – If detection fails or model is not loaded.

unload()[source]

Unload the model from memory to free GPU resources.

Return type:

None

class detection_loader.YOLOWorldLoader(config)[source]

Bases: DetectionModelLoader

Loader for YOLO-World v2.1 open-vocabulary detection model.

YOLO-World v2.1 achieves real-time performance (52 FPS) with strong accuracy on open-vocabulary object detection tasks.

load()[source]

Load YOLO-World v2.1 model with configured settings.

Return type:

None

detect(image, text_prompt)[source]

Detect objects using YOLO-World v2.1 with text prompts.

Return type:

DetectionResult

class detection_loader.GroundingDINOLoader(config)[source]

Bases: DetectionModelLoader

Loader for Grounding DINO 1.5 open-vocabulary detection model.

Grounding DINO 1.5 achieves 52.5 AP on COCO with zero-shot open-world object detection capabilities.

load()[source]

Load Grounding DINO 1.5 model with configured settings.

Return type:

None

detect(image, text_prompt)[source]

Detect objects using Grounding DINO 1.5 with text prompts.

Return type:

DetectionResult

class detection_loader.OWLv2Loader(config)[source]

Bases: DetectionModelLoader

Loader for OWLv2 open-vocabulary detection model.

OWLv2 uses scaled training data and achieves strong performance on rare and novel object classes.

load()[source]

Load OWLv2 model with configured settings.

Return type:

None

detect(image, text_prompt)[source]

Detect objects using OWLv2 with text prompts.

Return type:

DetectionResult

class detection_loader.Florence2Loader(config)[source]

Bases: DetectionModelLoader

Loader for Florence-2 unified vision model.

Florence-2 is a 230M parameter model that supports multiple vision tasks including object detection, captioning, and grounding.

load()[source]

Load Florence-2 model with configured settings.

Return type:

None

detect(image, text_prompt)[source]

Detect objects using Florence-2 with text prompts.

Return type:

DetectionResult

detection_loader.create_detection_loader(model_name, config)[source]

Factory function to create appropriate detection loader based on model name.

Parameters:
  • model_name (str) – Name of the model to load. Supported values:
      - “yolo-world-v2” or “yoloworld”
      - “grounding-dino-1-5” or “groundingdino”
      - “owlv2” or “owl-v2”
      - “florence-2” or “florence2”

  • config (DetectionConfig) – Configuration for model loading and inference.

Returns:

Appropriate loader instance for the specified model.

Return type:

DetectionModelLoader

Raises:

ValueError – If model_name is not recognized.
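
Example: open-vocabulary detection on a single frame (a usage sketch; the weight name, image path, and prompt are illustrative):

    from PIL import Image

    from detection_loader import DetectionConfig, DetectionFramework, create_detection_loader

    config = DetectionConfig(
        model_id="yolov8s-worldv2.pt",  # illustrative Ultralytics weight name
        framework=DetectionFramework.ULTRALYTICS,
        confidence_threshold=0.3,
    )
    loader = create_detection_loader("yolo-world-v2", config)
    loader.load()
    try:
        image = Image.open("frame_000.jpg")
        result = loader.detect(image, "person. car. dog.")
        for det in result.detections:
            print(
                det.label,
                det.confidence,
                det.bbox.to_absolute(result.image_width, result.image_height),
            )
    finally:
        loader.unload()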

Object Tracking

Video segmentation and tracking with multiple model architectures.

This module provides a unified interface for loading and running inference with various video segmentation and tracking models including SAMURAI, SAM2Long, SAM2.1, and YOLO11n-seg. Models support temporal consistency across frames, occlusion handling, and mask-based segmentation output.

class tracking_loader.TrackingFramework(*values)[source]

Bases: str, Enum

Supported tracking frameworks for model execution.

PYTORCH = 'pytorch'
ULTRALYTICS = 'ultralytics'
SAM2 = 'sam2'
class tracking_loader.TrackingConfig(model_id, framework=TrackingFramework.PYTORCH, device='cuda', cache_dir=None, checkpoint_path=None)[source]

Bases: object

Configuration for video tracking model loading and inference.

Parameters:
  • model_id (str) – HuggingFace model identifier or model name.

  • framework (TrackingFramework) – Framework to use for model execution.

  • device (str, default="cuda") – Device to load the model on.

  • cache_dir (Path | None, default=None) – Directory for caching model weights.

  • checkpoint_path (Path | None, default=None) – Path to model checkpoint file if using local weights.

model_id: str
framework: TrackingFramework = 'pytorch'
device: str = 'cuda'
cache_dir: Path | None = None
checkpoint_path: Path | None = None
__init__(model_id, framework=TrackingFramework.PYTORCH, device='cuda', cache_dir=None, checkpoint_path=None)
class tracking_loader.TrackingMask(mask, confidence, object_id)[source]

Bases: object

Segmentation mask for a tracked object.

Parameters:
  • mask (np.ndarray) – Binary segmentation mask with shape (H, W) where values are 0 or 1.

  • confidence (float) – Mask prediction confidence score (0.0 to 1.0).

  • object_id (int) – Unique identifier for the tracked object across frames.

mask: ndarray[Any, dtype[uint8]]
confidence: float
object_id: int
to_rle()[source]

Convert mask to Run-Length Encoding format.

Returns:

RLE-encoded mask with ‘size’ and ‘counts’ keys.

Return type:

dict[str, Any]

__init__(mask, confidence, object_id)
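
Example: building a mask and serializing it to RLE (a sketch; the mask contents are synthetic):

    import numpy as np

    from tracking_loader import TrackingMask

    mask = np.zeros((480, 640), dtype=np.uint8)
    mask[100:200, 150:300] = 1  # synthetic foreground region
    tracked = TrackingMask(mask=mask, confidence=0.92, object_id=1)
    rle = tracked.to_rle()  # dict with "size" and "counts" keys
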
class tracking_loader.TrackingFrame(frame_idx, masks, occlusions, processing_time)[source]

Bases: object

Tracking results for a single video frame.

Parameters:
  • frame_idx (int) – Zero-indexed frame number in the video sequence.

  • masks (list[TrackingMask]) – List of segmentation masks for tracked objects in this frame.

  • occlusions (dict[int, bool]) – Mapping of object_id to occlusion status (True if occluded).

  • processing_time (float) – Processing time for this frame in seconds.

frame_idx: int
masks: list[TrackingMask]
occlusions: dict[int, bool]
processing_time: float
__init__(frame_idx, masks, occlusions, processing_time)
class tracking_loader.TrackingResult(frames, video_width, video_height, total_processing_time, fps)[source]

Bases: object

Tracking results for a video sequence.

Parameters:
  • frames (list[TrackingFrame]) – Tracking results for each frame in the sequence.

  • video_width (int) – Video frame width in pixels.

  • video_height (int) – Video frame height in pixels.

  • total_processing_time (float) – Total processing time for all frames in seconds.

  • fps (float) – Processing speed in frames per second.

frames: list[TrackingFrame]
video_width: int
video_height: int
total_processing_time: float
fps: float
__init__(frames, video_width, video_height, total_processing_time, fps)
class tracking_loader.TrackingModelLoader(config)[source]

Bases: ABC

Abstract base class for video tracking model loaders.

All tracking loaders must implement the load and track methods.

__init__(config)[source]

Initialize the tracking model loader with configuration.

Parameters:

config (TrackingConfig) – Configuration for model loading and inference.

abstractmethod load()[source]

Load the tracking model into memory with configured settings.

Raises:

RuntimeError – If model loading fails.

Return type:

None

abstractmethod track(frames, initial_masks, object_ids)[source]

Track objects across video frames with mask-based segmentation.

Parameters:
  • frames (list[Image.Image]) – List of PIL Images representing consecutive video frames.

  • initial_masks (list[np.ndarray]) – Initial segmentation masks for objects in the first frame. Each mask is a binary numpy array with shape (H, W).

  • object_ids (list[int]) – Unique identifiers for each object to track.

Returns:

Tracking results with segmentation masks for each frame.

Return type:

TrackingResult

Raises:
  • RuntimeError – If tracking fails or model is not loaded.

  • ValueError – If number of initial_masks does not match object_ids length.

unload()[source]

Unload the model from memory to free GPU resources.

Return type:

None

class tracking_loader.SAMURAILoader(config)[source]

Bases: TrackingModelLoader

Loader for SAMURAI motion-aware tracking model.

SAMURAI achieves 7.1% better performance than SAM2 baseline with motion-aware tracking and occlusion handling capabilities.

load()[source]

Load SAMURAI model with configured settings.

Return type:

None

track(frames, initial_masks, object_ids)[source]

Track objects using SAMURAI with motion-aware tracking.

Return type:

TrackingResult

class tracking_loader.SAM2LongLoader(config)[source]

Bases: TrackingModelLoader

Loader for SAM2Long long video tracking model.

SAM2Long achieves 5.3% better performance than SAM2 baseline with error accumulation fixes for long video sequences.

load()[source]

Load SAM2Long model with configured settings.

Return type:

None

track(frames, initial_masks, object_ids)[source]

Track objects using SAM2Long with error accumulation fixes.

Return type:

TrackingResult

class tracking_loader.SAM2Loader(config)[source]

Bases: TrackingModelLoader

Loader for SAM2.1 baseline video segmentation model.

SAM2.1 provides baseline performance with proven stability for general video segmentation and tracking tasks.

load()[source]

Load SAM2.1 model with configured settings.

Return type:

None

track(frames, initial_masks, object_ids)[source]

Track objects using SAM2.1 baseline implementation.

Return type:

TrackingResult

class tracking_loader.YOLO11SegLoader(config)[source]

Bases: TrackingModelLoader

Loader for YOLO11n-seg lightweight segmentation model.

YOLO11n-seg is a 2.7M parameter model optimized for real-time segmentation in speed-critical applications.

load()[source]

Load YOLO11n-seg model with configured settings.

Return type:

None

track(frames, initial_masks, object_ids)[source]

Track objects using YOLO11n-seg with per-frame segmentation.

Note: YOLO11n-seg performs independent segmentation per frame without temporal consistency. Object re-identification is based on spatial overlap.

Return type:

TrackingResult

tracking_loader.create_tracking_loader(model_name, config)[source]

Factory function to create appropriate tracking loader based on model name.

Parameters:
  • model_name (str) – Name of the model to load. Supported values:
      - “samurai” (default)
      - “sam2long” or “sam2-long”
      - “sam2” or “sam2.1”
      - “yolo11n-seg” or “yolo11seg”

  • config (TrackingConfig) – Configuration for model loading and inference.

Returns:

Appropriate loader instance for the specified model.

Return type:

TrackingModelLoader

Raises:

ValueError – If model_name is not recognized.
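
Example: tracking a single object across extracted frames (a usage sketch; the frame files and initial mask are illustrative, and the mask would normally come from a detection or segmentation step):

    import numpy as np
    from PIL import Image

    from tracking_loader import TrackingConfig, create_tracking_loader

    config = TrackingConfig(model_id="samurai")  # illustrative model identifier
    loader = create_tracking_loader("samurai", config)
    loader.load()
    try:
        frames = [Image.open(f"frame_{i:03d}.jpg") for i in range(30)]
        width, height = frames[0].size
        init_mask = np.zeros((height, width), dtype=np.uint8)
        init_mask[100:300, 200:400] = 1  # placeholder mask for object 1
        result = loader.track(frames, initial_masks=[init_mask], object_ids=[1])
        print(f"{len(result.frames)} frames tracked at {result.fps:.1f} FPS")
    finally:
        loader.unload()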

Video Utilities

Video processing utilities for frame extraction and audio processing.

This module provides functions for extracting frames from videos using OpenCV and extracting audio using FFmpeg. It supports various sampling strategies and output formats.

exception video_utils.VideoProcessingError[source]

Bases: Exception

Raised when video processing operations fail.

class video_utils.VideoInfo(path, frame_count, fps, duration, width, height)[source]

Bases: object

Container for video metadata.

path

Path to the video file.

Type:

str

frame_count

Total number of frames in the video.

Type:

int

fps

Frames per second.

Type:

float

duration

Duration in seconds.

Type:

float

width

Frame width in pixels.

Type:

int

height

Frame height in pixels.

Type:

int

__init__(path, frame_count, fps, duration, width, height)[source]
video_utils.get_video_info(video_path)[source]

Extract metadata from a video file.

Parameters:

video_path (str) – Path to the video file.

Returns:

Video metadata object.

Return type:

VideoInfo

Raises:

VideoProcessingError – If the video cannot be opened or read.

video_utils.extract_frame(video_path, frame_number)[source]

Extract a single frame from a video.

Parameters:
  • video_path (str) – Path to the video file.

  • frame_number (int) – Frame index to extract (zero-indexed).

Returns:

Frame as numpy array in RGB format.

Return type:

np.ndarray

Raises:

VideoProcessingError – If frame extraction fails.

video_utils.extract_frames_uniform(video_path, num_frames=10, max_dimension=None)[source]

Extract frames uniformly sampled from a video.

Parameters:
  • video_path (str) – Path to the video file.

  • num_frames (int, default=10) – Number of frames to extract.

  • max_dimension (int | None, default=None) – Maximum width or height for resizing (maintains aspect ratio). If None, frames are not resized.

Returns:

List of tuples containing (frame_number, frame_array).

Return type:

list[tuple[int, np.ndarray]]

Raises:

VideoProcessingError – If frame extraction fails.

video_utils.extract_frames_by_rate(video_path, sample_rate=30, max_dimension=None)[source]

Extract frames at a specified sampling rate.

Parameters:
  • video_path (str) – Path to the video file.

  • sample_rate (int, default=30) – Extract one frame every N frames.

  • max_dimension (int | None, default=None) – Maximum width or height for resizing (maintains aspect ratio). If None, frames are not resized.

Returns:

List of tuples containing (frame_number, frame_array).

Return type:

list[tuple[int, np.ndarray]]

Raises:

VideoProcessingError – If frame extraction fails.

video_utils.resize_frame(frame, max_dimension)[source]

Resize a frame maintaining aspect ratio.

Parameters:
  • frame (np.ndarray) – Input frame as numpy array.

  • max_dimension (int) – Maximum width or height in pixels.

Returns:

Resized frame.

Return type:

np.ndarray

async video_utils.extract_audio(video_path, output_path=None, sample_rate=16000, channels=1)[source]

Extract audio from a video file using FFmpeg.

Parameters:
  • video_path (str) – Path to the video file.

  • output_path (str | None, default=None) – Path for output audio file. If None, creates temp file.

  • sample_rate (int, default=16000) – Audio sample rate in Hz.

  • channels (int, default=1) – Number of audio channels (1=mono, 2=stereo).

Returns:

Path to the extracted audio file.

Return type:

str

Raises:

VideoProcessingError – If audio extraction fails.

video_utils.check_ffmpeg_available()[source]

Check if FFmpeg is available in the system PATH.

Returns:

True if FFmpeg is available, False otherwise.

Return type:

bool
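
Example: combining the utilities to sample frames and extract audio (a sketch; the video path is illustrative):

    import asyncio

    from video_utils import (
        check_ffmpeg_available,
        extract_audio,
        extract_frames_uniform,
        get_video_info,
    )

    async def main() -> None:
        info = get_video_info("input.mp4")
        print(info.frame_count, info.fps, info.duration)

        # Sample 8 frames, downscaled so the longer side is at most 1024 px.
        frames = extract_frames_uniform("input.mp4", num_frames=8, max_dimension=1024)
        for frame_number, frame_array in frames:
            print(frame_number, frame_array.shape)

        if check_ffmpeg_available():
            audio_path = await extract_audio("input.mp4", sample_rate=16000, channels=1)
            print("audio written to", audio_path)

    asyncio.run(main())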

Model Management

Model management with dynamic loading, memory budget validation, and LRU eviction.

This module provides a ModelManager class that handles loading and unloading of AI models based on available GPU memory. Models are loaded on demand and automatically evicted when memory pressure occurs.

class model_manager.ModelConfig(config_dict)[source]

Bases: object

Configuration for a single model variant.

model_id

Hugging Face model identifier.

Type:

str

framework

Inference framework (sglang, vllm, pytorch).

Type:

str

vram_gb

VRAM requirement in GB.

Type:

float

quantization

Quantization method (4bit, 8bit, awq, etc.).

Type:

str | None

speed

Speed category (fast, medium, slow).

Type:

str

description

Human-readable description.

Type:

str

fps

Processing speed in frames per second (for vision models).

Type:

int | None

__init__(config_dict)[source]

Initialize model configuration from dictionary.

Parameters:

config_dict (dict[str, Any]) – Dictionary containing model configuration parameters.

property vram_bytes: int

Convert VRAM requirement from GB to bytes.

Returns:

VRAM requirement in bytes.

Return type:

int

class model_manager.TaskConfig(task_name, config_dict)[source]

Bases: object

Configuration for a task type with multiple model options.

task_name

Name of the task.

Type:

str

selected

Currently selected model name.

Type:

str

options

Available model options for this task.

Type:

dict[str, ModelConfig]

__init__(task_name, config_dict)[source]

Initialize task configuration from dictionary.

Parameters:
  • task_name (str) – Name of the task (e.g., “video_summarization”).

  • config_dict (dict[str, Any]) – Dictionary containing task configuration.

get_selected_config()[source]

Get the currently selected model configuration.

Returns:

Configuration for the selected model.

Return type:

ModelConfig

class model_manager.InferenceConfig(config_dict)[source]

Bases: object

Global inference configuration settings.

max_memory_per_model

Maximum memory per model (‘auto’ or specific value).

Type:

str

offload_threshold

Memory usage threshold for offloading (0.0 to 1.0).

Type:

float

warmup_on_startup

Whether to load all models on startup.

Type:

bool

default_batch_size

Default batch size for inference.

Type:

int

max_batch_size

Maximum batch size for inference.

Type:

int

__init__(config_dict)[source]

Initialize inference configuration from dictionary.

Parameters:

config_dict (dict[str, Any]) – Dictionary containing inference configuration.

class model_manager.ModelManager(config_path)[source]

Bases: object

Manages loading, unloading, and memory management of AI models.

This class handles dynamic model loading based on memory availability, implements LRU eviction when memory pressure occurs, and provides utilities for VRAM monitoring.

config_path

Path to models.yaml configuration file.

Type:

Path

config

Parsed configuration dictionary.

Type:

dict[str, Any]

loaded_models

Currently loaded models (LRU ordered).

Type:

OrderedDict[str, Any]

model_load_times

Timestamp when each model was loaded.

Type:

dict[str, float]

model_memory_usage

Actual memory usage per model in bytes.

Type:

dict[str, int]

tasks

Task configurations.

Type:

dict[str, TaskConfig]

inference_config

Global inference settings.

Type:

InferenceConfig

__init__(config_path)[source]

Initialize ModelManager with configuration file.

Parameters:

config_path (str) – Path to models.yaml configuration file.

get_available_vram()[source]

Get available GPU memory in bytes.

Return type:

int

Returns:

Available VRAM in bytes

get_total_vram()[source]

Get total GPU memory in bytes.

Return type:

int

Returns:

Total VRAM in bytes

get_memory_usage_percentage()[source]

Get current GPU memory usage as percentage.

Return type:

float

Returns:

Memory usage percentage (0.0 to 1.0)

check_memory_available(required_bytes)[source]

Check if sufficient memory is available for model loading.

Parameters:

required_bytes (int) – Required memory in bytes

Return type:

bool

Returns:

True if sufficient memory is available

get_lru_model()[source]

Get least recently used model identifier.

Return type:

str | None

Returns:

Task name of LRU model, or None if no models loaded

async evict_lru_model()[source]

Evict the least recently used model from memory.

Return type:

str | None

Returns:

Task name of evicted model, or None if no models to evict

async unload_model(task_type)[source]

Unload a model from memory.

Parameters:

task_type (str) – Task type of model to unload

Return type:

None

async load_model(task_type)[source]

Load a model for the specified task type.

This method loads the selected model for the task, handling memory management and eviction if necessary.

Parameters:

task_type (str) – Task type to load model for

Return type:

Any

Returns:

Loaded model object

Raises:
  • ValueError – If task type is invalid or model cannot be loaded

  • RuntimeError – If insufficient memory after eviction attempts

async get_model(task_type)[source]

Get model for task type, loading if necessary.

Parameters:

task_type (str) – Task type to get model for

Return type:

Any

Returns:

Loaded model object

get_loaded_models()[source]

Get information about currently loaded models.

Return type:

dict[str, dict[str, Any]]

Returns:

Dictionary mapping task types to model information

get_model_config(task_type)[source]

Get configuration for a task type.

Parameters:

task_type (str) – Task type to get configuration for

Return type:

TaskConfig | None

Returns:

Task configuration, or None if task type is invalid

async set_selected_model(task_type, model_name)[source]

Change the selected model for a task type.

If the task’s model is currently loaded, it will be unloaded and the new model will be loaded.

Parameters:
  • task_type (str) – Task type to update

  • model_name (str) – Name of model option to select

Raises:

ValueError – If task type or model name is invalid

Return type:

None

validate_memory_budget()[source]

Validate that all selected models can fit in available memory.

Return type:

dict[str, Any]

Returns:

Dictionary with validation results

async warmup_models()[source]

Load all selected models if warmup_on_startup is enabled.

Return type:

None

async shutdown()[source]

Unload all models and clean up resources.

Return type:

None
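
Example: loading a model through the manager and inspecting memory state (an async sketch; the configuration path and task name are illustrative):

    import asyncio

    from model_manager import ModelManager

    async def main() -> None:
        manager = ModelManager("config/models.yaml")  # illustrative path

        # Check that the selected models fit the memory budget before loading.
        print(manager.validate_memory_budget())

        model = await manager.get_model("video_summarization")
        print(manager.get_loaded_models())
        print(f"GPU memory in use: {manager.get_memory_usage_percentage():.1%}")

        await manager.shutdown()

    asyncio.run(main())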
