Fovea Model Service API Reference
This is the auto-generated API documentation for the Fovea Model Service, which provides AI-powered video analysis capabilities including summarization, object detection, and tracking.
Core Modules
Main Application
API Routes
Video Summarization
Video Understanding Models
Vision Language Model loader with support for multiple VLM architectures.
This module provides a unified interface for loading and running inference with various Vision Language Models including Llama 4 Maverick, Gemma 3, InternVL3, Pixtral Large, and Qwen2.5-VL. Models can be loaded with different quantization strategies and inference frameworks (SGLang or vLLM).
- class vlm_loader.QuantizationType(*values)[source]
Bases:
Enum
Supported quantization types for model compression.
- NONE = 'none'
- FOUR_BIT = '4bit'
- EIGHT_BIT = '8bit'
- AWQ = 'awq'
- class vlm_loader.InferenceFramework(*values)[source]
Bases:
Enum
Supported inference frameworks for model execution.
- SGLANG = 'sglang'
- VLLM = 'vllm'
- TRANSFORMERS = 'transformers'
- class vlm_loader.VLMConfig(model_id, quantization=QuantizationType.FOUR_BIT, framework=InferenceFramework.SGLANG, max_memory_gb=None, device='cuda', trust_remote_code=True)[source]
Bases:
object
Configuration for Vision Language Model loading and inference.
- Parameters:
model_id (str) – HuggingFace model identifier or local path.
quantization (QuantizationType) – Quantization strategy to apply.
framework (InferenceFramework) – Inference framework to use for model execution.
max_memory_gb (int | None, default=None) – Maximum GPU memory to allocate in GB. If None, uses all available.
device (str, default="cuda") – Device to load the model on.
trust_remote_code (bool, default=True) – Whether to trust remote code from HuggingFace.
- quantization: QuantizationType = '4bit'
- framework: InferenceFramework = 'sglang'
- __init__(model_id, quantization=QuantizationType.FOUR_BIT, framework=InferenceFramework.SGLANG, max_memory_gb=None, device='cuda', trust_remote_code=True)
- class vlm_loader.VLMLoader(config)[source]
Bases:
ABC
Abstract base class for Vision Language Model loaders.
All VLM loaders must implement the load and generate methods.
- __init__(config)[source]
Initialize the VLM loader with configuration.
- Parameters:
config (VLMConfig) – Configuration for model loading and inference.
- abstractmethod load()[source]
Load the model into memory with configured settings.
- Raises:
RuntimeError – If model loading fails.
- Return type:
None
- abstractmethod generate(images, prompt, max_new_tokens=512, temperature=0.7)[source]
Generate text response from images and prompt.
- Parameters:
images (list[Image.Image]) – Input images to condition generation on.
prompt (str) – Text prompt describing the desired output.
max_new_tokens (int, default=512) – Maximum number of tokens to generate.
temperature (float, default=0.7) – Sampling temperature for generation.
- Returns:
Generated text response.
- Return type:
str
- Raises:
RuntimeError – If generation fails or model is not loaded.
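Because VLMLoader is abstract, each supported model is integrated by subclassing it and implementing load and generate. A minimal sketch of the contract (the EchoVLMLoader class and its placeholder bodies are illustrative, not part of the service):

    from vlm_loader import VLMConfig, VLMLoader

    class EchoVLMLoader(VLMLoader):
        """Illustrative subclass that satisfies the VLMLoader contract."""

        def load(self) -> None:
            # A real loader would fetch weights, apply self.config.quantization,
            # and move the model to self.config.device.
            self._model = object()  # placeholder handle

        def generate(self, images, prompt, max_new_tokens=512, temperature=0.7) -> str:
            if getattr(self, "_model", None) is None:
                raise RuntimeError("Model is not loaded; call load() first.")
            # A real loader would run VLM inference here; we echo the inputs.
            return f"Received {len(images)} image(s) for prompt: {prompt!r}"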
- class vlm_loader.Llama4MaverickLoader(config)[source]
Bases:
VLMLoader
Loader for Llama 4 Maverick Vision Language Model.
Llama 4 Maverick is a 400B parameter MoE model with 17B active parameters, supporting multimodal input with 1M context length.
- class vlm_loader.Gemma3Loader(config)[source]
Bases:
VLMLoader
Loader for Gemma 3 27B Vision Language Model.
Gemma 3 27B excels at document analysis, OCR, and multilingual tasks with fast inference speed.
- class vlm_loader.InternVL3Loader(config)[source]
Bases:
VLMLoader
Loader for InternVL3-78B Vision Language Model.
InternVL3-78B achieves state-of-the-art results on vision benchmarks with strong scientific reasoning capabilities.
- class vlm_loader.PixtralLargeLoader(config)[source]
Bases:
VLMLoader
Loader for Pixtral Large Vision Language Model.
Pixtral Large is a 123B parameter model with 128k context length, optimized for batch processing of long documents.
- class vlm_loader.Qwen25VLLoader(config)[source]
Bases:
VLMLoader
Loader for Qwen2.5-VL 72B Vision Language Model.
Qwen2.5-VL 72B is a proven stable model with strong performance across vision-language tasks.
- vlm_loader.create_vlm_loader(model_name, config)[source]
Factory function to create appropriate VLM loader based on model name.
- Parameters:
model_name (str) – Name of the model to load.
config (VLMConfig) – Configuration for model loading and inference.
- Returns:
Appropriate loader instance for the specified model.
- Return type:
VLMLoader
- Raises:
ValueError – If model_name is not recognized.
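A typical end-to-end call through the factory (the model name string and HuggingFace identifier below are assumptions; consult create_vlm_loader for the names it actually accepts):

    from PIL import Image

    from vlm_loader import (InferenceFramework, QuantizationType, VLMConfig,
                            create_vlm_loader)

    config = VLMConfig(
        model_id="Qwen/Qwen2.5-VL-72B-Instruct",  # assumed identifier
        quantization=QuantizationType.FOUR_BIT,
        framework=InferenceFramework.SGLANG,
        max_memory_gb=48,
    )
    loader = create_vlm_loader("qwen2.5-vl", config)  # name assumed
    loader.load()
    frames = [Image.open("frame_000.jpg"), Image.open("frame_001.jpg")]
    print(loader.generate(frames, "Describe what changes between these frames.",
                          max_new_tokens=256, temperature=0.2))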
Language Models
Configurable LLM loader with multi-model support and quantization.
This module provides a loader for text-only language models with support for multiple model options (Llama 4 Scout, Llama 3.3 70B, DeepSeek V3, Gemma 3), 4-bit quantization with bitsandbytes, SGLang inference framework, and automatic fallback handling.
- class llm_loader.LLMFramework(*values)[source]
Bases:
Enum
Inference framework options for LLM models.
- SGLANG = 'sglang'
- TRANSFORMERS = 'transformers'
- class llm_loader.LLMConfig(model_id, quantization, framework, max_tokens=4096, temperature=0.7, top_p=0.9, context_length=131072)[source]
Bases:
object
Configuration for a language model.
- Parameters:
model_id (str) – HuggingFace model identifier (e.g., “meta-llama/Llama-4-Scout”).
quantization (str) – Quantization mode (e.g., “4bit”, “8bit”, “none”).
framework (LLMFramework) – Inference framework to use (sglang or transformers).
max_tokens (int, default=4096) – Maximum number of tokens to generate.
temperature (float, default=0.7) – Sampling temperature for generation.
top_p (float, default=0.9) – Nucleus sampling parameter.
context_length (int, default=131072) – Maximum context length in tokens.
- framework: LLMFramework
- __init__(model_id, quantization, framework, max_tokens=4096, temperature=0.7, top_p=0.9, context_length=131072)
- class llm_loader.GenerationConfig(max_tokens=4096, temperature=0.7, top_p=0.9, stop_sequences=None)[source]
Bases:
object
Configuration for text generation.
- Parameters:
max_tokens (int, default=4096) – Maximum number of tokens to generate.
temperature (float, default=0.7) – Sampling temperature (0.0 for greedy, higher for more randomness).
top_p (float, default=0.9) – Nucleus sampling parameter.
stop_sequences (list[str] | None, default=None) – List of sequences that stop generation when encountered.
- __init__(max_tokens=4096, temperature=0.7, top_p=0.9, stop_sequences=None)
- class llm_loader.GenerationResult(text, tokens_used, finish_reason)[source]
Bases:
object
Result from text generation.
- Parameters:
text (str) – Generated text.
tokens_used (int) – Number of tokens used during generation.
finish_reason (str) – Reason the generation finished.
- __init__(text, tokens_used, finish_reason)
- class llm_loader.LLMLoader(config, cache_dir=None)[source]
Bases:
object
Loader for text-only language models with quantization support.
This class handles loading language models with configurable quantization, supports multiple model options, and provides text generation utilities with error handling and fallback logic.
- __init__(config, cache_dir=None)[source]
Initialize the LLM loader.
- Parameters:
config (LLMConfig) – Model configuration specifying model ID, quantization, framework.
cache_dir (Path | None, default=None) – Directory for caching model weights. If None, uses default HF cache.
- async load()[source]
Load the language model and tokenizer.
This method loads the model with the specified quantization settings and prepares it for inference. Loading is protected by a lock to prevent concurrent loading attempts.
- Raises:
RuntimeError – If model loading fails due to memory, invalid model ID, or other issues.
- Return type:
None
- async generate(prompt, generation_config=None)[source]
Generate text from a prompt using the loaded model.
- Parameters:
prompt (str) – Input text prompt for generation.
generation_config (GenerationConfig | None, default=None) – Generation parameters. If None, uses default configuration.
- Returns:
Generated text with metadata (tokens used, finish reason).
- Return type:
GenerationResult
- Raises:
RuntimeError – If model is not loaded or generation fails.
- async unload()[source]
Unload the model from memory.
This method releases the model and tokenizer, freeing GPU/CPU memory.
- Return type:
None
- llm_loader.create_llm_config_from_dict(model_dict)[source]
Create an LLMConfig from a dictionary (e.g., from YAML).
- Parameters:
model_dict (dict[str, Any]) – Dictionary containing model configuration keys.
- Returns:
Configured LLMConfig instance.
- Return type:
LLMConfig
- Raises:
ValueError – If required keys are missing or framework is invalid.
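For example, given a dictionary shaped like one model entry from models.yaml (the key names mirror the LLMConfig fields; the exact YAML schema is an assumption):

    from llm_loader import create_llm_config_from_dict

    model_dict = {
        "model_id": "meta-llama/Llama-3.3-70B-Instruct",
        "quantization": "4bit",
        "framework": "sglang",
        "max_tokens": 4096,
        "context_length": 131072,
    }
    config = create_llm_config_from_dict(model_dict)  # ValueError if keys are missing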
- async llm_loader.create_llm_loader_with_fallback(primary_config, fallback_configs, cache_dir=None)[source]
Create an LLM loader with automatic fallback to alternative models.
- Parameters:
primary_config (LLMConfig) – Configuration for the preferred model.
fallback_configs (list[LLMConfig]) – Alternative configurations to try, in order, if the primary fails.
cache_dir (Path | None, default=None) – Directory for caching model weights.
- Returns:
Successfully loaded LLM loader.
- Return type:
LLMLoader
- Raises:
RuntimeError – If all model loading attempts fail.
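A sketch of the fallback flow (model identifiers are illustrative):

    import asyncio

    from llm_loader import (GenerationConfig, LLMConfig, LLMFramework,
                            create_llm_loader_with_fallback)

    primary = LLMConfig(model_id="meta-llama/Llama-4-Scout",
                        quantization="4bit", framework=LLMFramework.SGLANG)
    fallback = LLMConfig(model_id="google/gemma-3-27b-it",
                         quantization="4bit", framework=LLMFramework.TRANSFORMERS)

    async def main() -> None:
        # Tries the primary model first, then each fallback in order.
        loader = await create_llm_loader_with_fallback(primary, [fallback])
        result = await loader.generate(
            "Summarize the detected events in one paragraph.",
            GenerationConfig(max_tokens=512, temperature=0.3),
        )
        print(result.text, result.finish_reason)
        await loader.unload()

    asyncio.run(main())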
Object Detection
Open-vocabulary object detection with multiple model architectures.
This module provides a unified interface for loading and running inference with various open-vocabulary object detection models including YOLO-World v2.1, Grounding DINO 1.5, OWLv2, and Florence-2. Models support text-based prompts for detecting objects without pre-defined class vocabularies.
- class detection_loader.DetectionFramework(*values)[source]
Bases:
Enum
Supported detection frameworks for model execution.
- PYTORCH = 'pytorch'
- ULTRALYTICS = 'ultralytics'
- TRANSFORMERS = 'transformers'
- class detection_loader.DetectionConfig(model_id, framework=DetectionFramework.PYTORCH, confidence_threshold=0.25, device='cuda', cache_dir=None)[source]
Bases:
object
Configuration for object detection model loading and inference.
- Parameters:
model_id (str) – HuggingFace model identifier or Ultralytics model name.
framework (DetectionFramework) – Framework to use for model execution.
confidence_threshold (float, default=0.25) – Minimum confidence score for detections (0.0 to 1.0).
device (str, default="cuda") – Device to load the model on.
cache_dir (Path | None, default=None) – Directory for caching model weights.
- framework: DetectionFramework = 'pytorch'
- __init__(model_id, framework=DetectionFramework.PYTORCH, confidence_threshold=0.25, device='cuda', cache_dir=None)
- class detection_loader.BoundingBox(x1, y1, x2, y2)[source]
Bases:
object
Bounding box in normalized coordinates.
- Parameters:
x1 (float) – Left edge in normalized coordinates (0.0 to 1.0).
y1 (float) – Top edge in normalized coordinates (0.0 to 1.0).
x2 (float) – Right edge in normalized coordinates (0.0 to 1.0).
y2 (float) – Bottom edge in normalized coordinates (0.0 to 1.0).
- __init__(x1, y1, x2, y2)
- class detection_loader.Detection(bbox, confidence, label)[source]
Bases:
object
Single object detection result.
- Parameters:
bbox (BoundingBox) – Bounding box in normalized coordinates.
confidence (float) – Detection confidence score (0.0 to 1.0).
label (str) – Detected object class or description.
- bbox: BoundingBox
- __init__(bbox, confidence, label)
- class detection_loader.DetectionResult(detections, image_width, image_height, processing_time)[source]
Bases:
object
Detection results for a single image.
- Parameters:
detections (list[Detection]) – List of detections found in the image.
image_width (int) – Image width in pixels.
image_height (int) – Image height in pixels.
processing_time (float) – Processing time in seconds.
- __init__(detections, image_width, image_height, processing_time)
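Because boxes are stored normalized, callers usually rescale them with the dimensions carried in the result. A small hypothetical helper (not part of the module):

    from detection_loader import Detection, DetectionResult

    def to_pixels(det: Detection, result: DetectionResult) -> tuple[int, int, int, int]:
        """Convert a normalized bounding box to integer pixel coordinates."""
        b = det.bbox
        return (round(b.x1 * result.image_width), round(b.y1 * result.image_height),
                round(b.x2 * result.image_width), round(b.y2 * result.image_height))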
- class detection_loader.DetectionModelLoader(config)[source]
Bases:
ABC
Abstract base class for object detection model loaders.
All detection loaders must implement the load and detect methods.
- __init__(config)[source]
Initialize the detection model loader with configuration.
- Parameters:
config (DetectionConfig) – Configuration for model loading and inference.
- abstractmethod load()[source]
Load the detection model into memory with configured settings.
- Raises:
RuntimeError – If model loading fails.
- Return type:
None
- abstractmethod detect(image, text_prompt)[source]
Detect objects in an image based on text prompt.
- Parameters:
image (Image.Image) – PIL Image to process.
text_prompt (str) – Text description of objects to detect (e.g., “person. car. dog.”).
- Returns:
Detection results with bounding boxes in normalized coordinates.
- Return type:
DetectionResult
- Raises:
RuntimeError – If detection fails or model is not loaded.
- class detection_loader.YOLOWorldLoader(config)[source]
Bases:
DetectionModelLoader
Loader for YOLO-World v2.1 open-vocabulary detection model.
YOLO-World v2.1 achieves real-time performance (52 FPS) with strong accuracy on open-vocabulary object detection tasks.
- class detection_loader.GroundingDINOLoader(config)[source]
Bases:
DetectionModelLoader
Loader for Grounding DINO 1.5 open-vocabulary detection model.
Grounding DINO 1.5 achieves 52.5 AP on COCO with zero-shot open-world object detection capabilities.
- class detection_loader.OWLv2Loader(config)[source]
Bases:
DetectionModelLoader
Loader for OWLv2 open-vocabulary detection model.
OWLv2 uses scaled training data and achieves strong performance on rare and novel object classes.
- class detection_loader.Florence2Loader(config)[source]
Bases:
DetectionModelLoader
Loader for Florence-2 unified vision model.
Florence-2 is a 230M parameter model that supports multiple vision tasks including object detection, captioning, and grounding.
- detection_loader.create_detection_loader(model_name, config)[source]
Factory function to create appropriate detection loader based on model name.
- Parameters:
model_name (str) – Name of the model to load. Supported values: “yolo-world-v2” or “yoloworld”; “grounding-dino-1-5” or “groundingdino”; “owlv2” or “owl-v2”; “florence-2” or “florence2”.
config (DetectionConfig) – Configuration for model loading and inference.
- Returns:
Appropriate loader instance for the specified model.
- Return type:
DetectionModelLoader
- Raises:
ValueError – If model_name is not recognized.
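Putting the pieces together (the HuggingFace identifier is an assumption; the model name follows the supported values listed above):

    from PIL import Image

    from detection_loader import (DetectionConfig, DetectionFramework,
                                  create_detection_loader)

    config = DetectionConfig(
        model_id="IDEA-Research/grounding-dino-1.5",  # assumed identifier
        framework=DetectionFramework.TRANSFORMERS,
        confidence_threshold=0.3,
    )
    loader = create_detection_loader("grounding-dino-1-5", config)
    loader.load()
    result = loader.detect(Image.open("frame.jpg"), "person. car. dog.")
    for det in result.detections:
        print(det.label, round(det.confidence, 2))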
Object Tracking
Video segmentation and tracking with multiple model architectures.
This module provides a unified interface for loading and running inference with various video segmentation and tracking models including SAMURAI, SAM2Long, SAM2.1, and YOLO11n-seg. Models support temporal consistency across frames, occlusion handling, and mask-based segmentation output.
- class tracking_loader.TrackingFramework(*values)[source]
Bases:
Enum
Supported tracking frameworks for model execution.
- PYTORCH = 'pytorch'
- ULTRALYTICS = 'ultralytics'
- SAM2 = 'sam2'
- class tracking_loader.TrackingConfig(model_id, framework=TrackingFramework.PYTORCH, device='cuda', cache_dir=None, checkpoint_path=None)[source]
Bases:
object
Configuration for video tracking model loading and inference.
- Parameters:
model_id (str) – HuggingFace model identifier or model name.
framework (TrackingFramework) – Framework to use for model execution.
device (str, default="cuda") – Device to load the model on.
cache_dir (Path | None, default=None) – Directory for caching model weights.
checkpoint_path (Path | None, default=None) – Path to model checkpoint file if using local weights.
- framework: TrackingFramework = 'pytorch'
- __init__(model_id, framework=TrackingFramework.PYTORCH, device='cuda', cache_dir=None, checkpoint_path=None)
- class tracking_loader.TrackingMask(mask, confidence, object_id)[source]
Bases:
object
Segmentation mask for a tracked object.
- Parameters:
mask (np.ndarray) – Binary segmentation mask with shape (H, W).
confidence (float) – Mask confidence score (0.0 to 1.0).
object_id (int) – Unique identifier of the tracked object.
- __init__(mask, confidence, object_id)
- class tracking_loader.TrackingFrame(frame_idx, masks, occlusions, processing_time)[source]
Bases:
object
Tracking results for a single video frame.
- Parameters:
frame_idx (int) – Zero-indexed frame number in the video sequence.
masks (list[TrackingMask]) – List of segmentation masks for tracked objects in this frame.
occlusions (dict[int, bool]) – Mapping of object_id to occlusion status (True if occluded).
processing_time (float) – Processing time for this frame in seconds.
- masks: list[TrackingMask]
- __init__(frame_idx, masks, occlusions, processing_time)
- class tracking_loader.TrackingResult(frames, video_width, video_height, total_processing_time, fps)[source]
Bases:
object
Tracking results for a video sequence.
- Parameters:
frames (list[TrackingFrame]) – Tracking results for each frame in the sequence.
video_width (int) – Video frame width in pixels.
video_height (int) – Video frame height in pixels.
total_processing_time (float) – Total processing time for all frames in seconds.
fps (float) – Processing speed in frames per second.
- frames: list[TrackingFrame]
- __init__(frames, video_width, video_height, total_processing_time, fps)
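Per-object occlusion statistics fall out of the frame records directly; for instance (a hypothetical helper, not part of the module):

    from collections import Counter

    from tracking_loader import TrackingResult

    def occluded_frame_counts(result: TrackingResult) -> Counter:
        """Count, per object_id, the frames in which the object was occluded."""
        counts = Counter()
        for frame in result.frames:
            for object_id, occluded in frame.occlusions.items():
                if occluded:
                    counts[object_id] += 1
        return counts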
- class tracking_loader.TrackingModelLoader(config)[source]
Bases:
ABC
Abstract base class for video tracking model loaders.
All tracking loaders must implement the load and track methods.
- __init__(config)[source]
Initialize the tracking model loader with configuration.
- Parameters:
config (TrackingConfig) – Configuration for model loading and inference.
- abstractmethod load()[source]
Load the tracking model into memory with configured settings.
- Raises:
RuntimeError – If model loading fails.
- Return type:
None
- abstractmethod track(frames, initial_masks, object_ids)[source]
Track objects across video frames with mask-based segmentation.
- Parameters:
frames (list[Image.Image]) – List of PIL Images representing consecutive video frames.
initial_masks (list[np.ndarray]) – Initial segmentation masks for objects in the first frame. Each mask is a binary numpy array with shape (H, W).
object_ids (list[int]) – Unique identifiers for each object to track.
- Returns:
Tracking results with segmentation masks for each frame.
- Return type:
TrackingResult
- Raises:
RuntimeError – If tracking fails or model is not loaded.
ValueError – If number of initial_masks does not match object_ids length.
- class tracking_loader.SAMURAILoader(config)[source]
Bases:
TrackingModelLoader
Loader for SAMURAI motion-aware tracking model.
SAMURAI achieves 7.1% better performance than SAM2 baseline with motion-aware tracking and occlusion handling capabilities.
- class tracking_loader.SAM2LongLoader(config)[source]
Bases:
TrackingModelLoader
Loader for SAM2Long long video tracking model.
SAM2Long achieves 5.3% better performance than SAM2 baseline with error accumulation fixes for long video sequences.
- class tracking_loader.SAM2Loader(config)[source]
Bases:
TrackingModelLoader
Loader for SAM2.1 baseline video segmentation model.
SAM2.1 provides baseline performance with proven stability for general video segmentation and tracking tasks.
- class tracking_loader.YOLO11SegLoader(config)[source]
Bases:
TrackingModelLoader
Loader for YOLO11n-seg lightweight segmentation model.
YOLO11n-seg is a 2.7M parameter model optimized for real-time segmentation in speed-critical applications.
- tracking_loader.create_tracking_loader(model_name, config)[source]
Factory function to create appropriate tracking loader based on model name.
- Parameters:
model_name (str) – Name of the model to load. Supported values: “samurai” (default); “sam2long” or “sam2-long”; “sam2” or “sam2.1”; “yolo11n-seg” or “yolo11seg”.
config (TrackingConfig) – Configuration for model loading and inference.
- Returns:
Appropriate loader instance for the specified model.
- Return type:
TrackingModelLoader
- Raises:
ValueError – If model_name is not recognized.
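A typical tracking call, assuming frames were already extracted and a single initial mask is available (the paths, model_id, and mask region are illustrative):

    import numpy as np
    from PIL import Image

    from tracking_loader import TrackingConfig, TrackingFramework, create_tracking_loader

    config = TrackingConfig(model_id="samurai", framework=TrackingFramework.SAM2)  # assumed
    loader = create_tracking_loader("samurai", config)
    loader.load()

    frames = [Image.open(f"frames/{i:04d}.jpg") for i in range(30)]
    h, w = frames[0].height, frames[0].width
    initial_mask = np.zeros((h, w), dtype=bool)
    initial_mask[100:200, 150:300] = True  # region covering the target in frame 0

    result = loader.track(frames, [initial_mask], object_ids=[1])
    print(len(result.frames), "frames at", round(result.fps, 1), "FPS")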
Video Utilities
Video processing utilities for frame extraction and audio processing.
This module provides functions for extracting frames from videos using OpenCV and extracting audio using FFmpeg. It supports various sampling strategies and output formats.
- exception video_utils.VideoProcessingError[source]
Bases:
Exception
Raised when video processing operations fail.
- class video_utils.VideoInfo(path, frame_count, fps, duration, width, height)[source]
Bases:
object
Container for video metadata.
- video_utils.get_video_info(video_path)[source]
Extract metadata from a video file.
- Parameters:
video_path (str) – Path to the video file.
- Returns:
Video metadata object.
- Return type:
VideoInfo
- Raises:
VideoProcessingError – If the video cannot be opened or read.
- video_utils.extract_frame(video_path, frame_number)[source]
Extract a single frame from a video.
- Parameters:
video_path (str) – Path to the video file.
frame_number (int) – Index of the frame to extract.
- Returns:
Frame as numpy array in RGB format.
- Return type:
np.ndarray
- Raises:
VideoProcessingError – If frame extraction fails.
- video_utils.extract_frames_uniform(video_path, num_frames=10, max_dimension=None)[source]
Extract frames uniformly sampled from a video.
- Parameters:
video_path (str) – Path to the video file.
num_frames (int, default=10) – Number of frames to sample uniformly across the video.
max_dimension (int | None, default=None) – If set, resize extracted frames so the larger side does not exceed this value (see resize_frame).
- Returns:
List of tuples containing (frame_number, frame_array).
- Return type:
list[tuple[int, np.ndarray]]
- Raises:
VideoProcessingError – If frame extraction fails.
- video_utils.extract_frames_by_rate(video_path, sample_rate=30, max_dimension=None)[source]
Extract frames at a specified sampling rate.
- Parameters:
video_path (str) – Path to the video file.
sample_rate (int, default=30) – Sampling interval in frames; one frame is kept every sample_rate frames.
max_dimension (int | None, default=None) – If set, resize extracted frames so the larger side does not exceed this value (see resize_frame).
- Returns:
List of tuples containing (frame_number, frame_array).
- Return type:
list[tuple[int, np.ndarray]]
- Raises:
VideoProcessingError – If frame extraction fails.
- video_utils.resize_frame(frame, max_dimension)[source]
Resize a frame maintaining aspect ratio.
- Parameters:
frame (np.ndarray) – Input frame as numpy array.
max_dimension (int) – Maximum width or height in pixels.
- Returns:
Resized frame.
- Return type:
np.ndarray
- async video_utils.extract_audio(video_path, output_path=None, sample_rate=16000, channels=1)[source]
Extract audio from a video file using FFmpeg.
- Parameters:
video_path (str) – Path to the video file.
output_path (Path | None, default=None) – Destination path for the extracted audio. If None, an output path is chosen automatically.
sample_rate (int, default=16000) – Audio sample rate in Hz.
channels (int, default=1) – Number of audio channels.
- Returns:
Path to the extracted audio file.
- Return type:
Path
- Raises:
VideoProcessingError – If audio extraction fails.
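Typical usage combines metadata inspection, frame sampling, and asynchronous audio extraction:

    import asyncio

    from video_utils import extract_audio, extract_frames_uniform, get_video_info

    info = get_video_info("clip.mp4")
    print(info.frame_count, info.fps, info.width, info.height)

    for frame_number, frame_array in extract_frames_uniform(
            "clip.mp4", num_frames=8, max_dimension=768):
        print(frame_number, frame_array.shape)

    audio_path = asyncio.run(extract_audio("clip.mp4", sample_rate=16000, channels=1))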
Model Management
Model management with dynamic loading, memory budget validation, and LRU eviction.
This module provides a ModelManager class that handles loading and unloading of AI models based on available GPU memory. Models are loaded on demand and automatically evicted when memory pressure occurs.
- class model_manager.ModelConfig(config_dict)[source]
Bases:
object
Configuration for a single model variant.
- class model_manager.TaskConfig(task_name, config_dict)[source]
Bases:
object
Configuration for a task type with multiple model options.
- options
Available model options for this task.
- Type:
dict[str, ModelConfig]
- class model_manager.InferenceConfig(config_dict)[source]
Bases:
object
Global inference configuration settings.
- class model_manager.ModelManager(config_path)[source]
Bases:
object
Manages loading, unloading, and memory management of AI models.
This class handles dynamic model loading based on memory availability, implements LRU eviction when memory pressure occurs, and provides utilities for VRAM monitoring.
- config_path
Path to models.yaml configuration file.
- Type:
Path
- tasks
Task configurations.
- Type:
dict[str, TaskConfig]
- inference_config
Global inference settings.
- Type:
InferenceConfig
- __init__(config_path)[source]
Initialize ModelManager with configuration file.
- Parameters:
config_path (str) – Path to models.yaml configuration file.
- get_available_vram()[source]
Get available GPU memory in bytes.
- Return type:
int
- Returns:
Available VRAM in bytes
- get_memory_usage_percentage()[source]
Get current GPU memory usage as percentage.
- Return type:
float
- Returns:
Memory usage percentage (0.0 to 1.0)
- check_memory_available(required_bytes)[source]
Check if sufficient memory is available for model loading.
- async load_model(task_type)[source]
Load a model for the specified task type.
This method loads the selected model for the task, handling memory management and eviction if necessary.
- Parameters:
task_type (str) – Task type to load model for.
- Returns:
Loaded model object
- Raises:
ValueError – If task type is invalid or model cannot be loaded
RuntimeError – If insufficient memory after eviction attempts
- get_model_config(task_type)[source]
Get configuration for a task type.
- Parameters:
task_type (str) – Task type to get configuration for.
- Return type:
TaskConfig | None
- Returns:
Task configuration, or None if task type is invalid
- async set_selected_model(task_type, model_name)[source]
Change the selected model for a task type.
If the task’s model is currently loaded, it will be unloaded and the new model will be loaded.
- Parameters:
task_type (str) – Task type to update.
model_name (str) – Name of the model option to select for the task.
- Raises:
ValueError – If task type or model name is invalid
- Return type:
None
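A sketch of the manager lifecycle (the configuration path, the “detection” task type, and the “owlv2” option name are assumptions that must match your models.yaml):

    import asyncio

    from model_manager import ModelManager

    async def main() -> None:
        manager = ModelManager("config/models.yaml")  # path is illustrative
        print(f"VRAM free: {manager.get_available_vram() / 1e9:.1f} GB "
              f"({manager.get_memory_usage_percentage():.0%} used)")
        model = await manager.load_model("detection")  # may evict LRU models first
        print(type(model).__name__)
        await manager.set_selected_model("detection", "owlv2")  # reloads if loaded

    asyncio.run(main())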