# tracking_loader

Video segmentation and tracking with multiple model architectures.

This module provides a unified interface for loading and running inference with various video segmentation and tracking models, including SAMURAI, SAM2Long, SAM2.1, and YOLO11n-seg. Models support temporal consistency across frames, occlusion handling, and mask-based segmentation output.
Module constants: `OCCLUSION_CONFIDENCE_THRESHOLD`, `IOU_MATCH_THRESHOLD`, `LOW_CONFIDENCE_IOU_THRESHOLD`.
## TrackingFramework Objects

```python
class TrackingFramework(str, Enum)
```

Supported tracking frameworks for model execution.

Members: `PYTORCH`, `ULTRALYTICS`, `SAM2`.
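Because `TrackingFramework` mixes `str` into `Enum`, its members compare equal to plain strings, which makes values read from JSON or YAML configs usable directly. A minimal sketch (the string values assigned below are assumptions; the source lists only the member names):

```python
from enum import Enum

class TrackingFramework(str, Enum):
    # Member values are assumed here; the documentation lists only the names.
    PYTORCH = "pytorch"
    ULTRALYTICS = "ultralytics"
    SAM2 = "sam2"

# A str-backed Enum compares equal to plain strings, so config values
# can be checked without converting them to enum members first.
print(TrackingFramework.SAM2 == "sam2")                           # True
print(TrackingFramework("pytorch") is TrackingFramework.PYTORCH)  # True
```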
TrackingConfig Objects
@dataclass
class TrackingConfig()
Configuration for video tracking model loading and inference.
Parameters
model_id : str HuggingFace model identifier or model name. framework : TrackingFramework Framework to use for model execution. device : str, default="cuda" Device to load the model on. cache_dir : Path | None, default=None Directory for caching model weights. checkpoint_path : Path | None, default=None Path to model checkpoint file if using local weights.
model_id
framework
device
cache_dir
checkpoint_path
## TrackingMask Objects

```python
@dataclass
class TrackingMask()
```

Segmentation mask for a tracked object.

**Parameters**

- `mask` (np.ndarray): Binary segmentation mask with shape (H, W) where values are 0 or 1.
- `confidence` (float): Mask prediction confidence score (0.0 to 1.0).
- `object_id` (int): Unique identifier for the tracked object across frames.
#### to_rle

```python
def to_rle() -> dict[str, Any]
```

Convert the mask to Run-Length Encoding (RLE) format.

**Returns**

- `dict[str, Any]`: RLE-encoded mask with `'size'` and `'counts'` keys.
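The module does not spell out the encoding here, but the `'size'`/`'counts'` keys suggest COCO-style uncompressed RLE, which counts alternating runs of 0s and 1s scanning the mask in column-major order. A sketch of that convention (a hypothetical `mask_to_rle` helper, not the method's actual implementation):

```python
import numpy as np
from typing import Any

def mask_to_rle(mask: np.ndarray) -> dict[str, Any]:
    """Encode a binary (H, W) mask as uncompressed COCO-style RLE.

    'counts' alternates run lengths starting with the number of 0s,
    scanning the mask in column-major (Fortran) order.
    """
    flat = np.asarray(mask, dtype=np.uint8).flatten(order="F")
    counts = []
    prev, run = 0, 0
    for value in flat:
        if value == prev:
            run += 1
        else:
            counts.append(run)  # close the previous run
            prev, run = value, 1
    counts.append(run)
    return {"size": list(mask.shape), "counts": counts}

mask = np.array([[0, 1], [0, 1]], dtype=np.uint8)
print(mask_to_rle(mask))  # {'size': [2, 2], 'counts': [2, 2]}
```

Note that a mask starting with a foreground pixel gets a leading `0` count, so decoders always know the first run is background.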
## TrackingFrame Objects

```python
@dataclass
class TrackingFrame()
```

Tracking results for a single video frame.

**Parameters**

- `frame_idx` (int): Zero-indexed frame number in the video sequence.
- `masks` (list[TrackingMask]): Segmentation masks for tracked objects in this frame.
- `occlusions` (dict[int, bool]): Mapping of `object_id` to occlusion status (`True` if occluded).
- `processing_time` (float): Processing time for this frame in seconds.
## TrackingResult Objects

```python
@dataclass
class TrackingResult()
```

Tracking results for a video sequence.

**Parameters**

- `frames` (list[TrackingFrame]): Tracking results for each frame in the sequence.
- `video_width` (int): Video frame width in pixels.
- `video_height` (int): Video frame height in pixels.
- `total_processing_time` (float): Total processing time for all frames in seconds.
- `fps` (float): Processing speed in frames per second.
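The `fps` field presumably relates the frame count to the total processing time. A sketch of that aggregation, using pared-down stand-ins for the documented dataclasses (fields such as `masks`, `occlusions`, and the video dimensions are omitted, and the fps formula is an assumption):

```python
from dataclasses import dataclass

# Pared-down stand-ins for the documented result types.
@dataclass
class TrackingFrame:
    frame_idx: int
    processing_time: float  # seconds spent on this frame

@dataclass
class TrackingResult:
    frames: list
    total_processing_time: float
    fps: float

def summarize(frames: list[TrackingFrame]) -> TrackingResult:
    # Assumption: fps = frames processed per second of total processing time.
    total = sum(f.processing_time for f in frames)
    fps = len(frames) / total if total > 0 else 0.0
    return TrackingResult(frames=frames, total_processing_time=total, fps=fps)

frames = [TrackingFrame(i, 0.25) for i in range(4)]  # 4 frames, 250 ms each
result = summarize(frames)
print(result.fps)  # 4.0
```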
## TrackingModelLoader Objects

```python
class TrackingModelLoader(ABC)
```

Abstract base class for video tracking model loaders.

All tracking loaders must implement the `load` and `track` methods.

#### \_\_init\_\_

```python
def __init__(config: TrackingConfig) -> None
```

Initialize the tracking model loader with configuration.

**Parameters**

- `config` (TrackingConfig): Configuration for model loading and inference.
#### load

```python
@abstractmethod
def load() -> None
```

Load the tracking model into memory with the configured settings.

**Raises**

- `RuntimeError`: If model loading fails.
#### track

```python
@abstractmethod
def track(frames: list[Image.Image],
          initial_masks: list[np.ndarray[Any, np.dtype[np.uint8]]],
          object_ids: list[int]) -> TrackingResult
```

Track objects across video frames with mask-based segmentation.

**Parameters**

- `frames` (list[Image.Image]): PIL Images representing consecutive video frames.
- `initial_masks` (list[np.ndarray]): Initial segmentation masks for objects in the first frame. Each mask is a binary numpy array with shape (H, W).
- `object_ids` (list[int]): Unique identifiers for each object to track.

**Returns**

- `TrackingResult`: Tracking results with segmentation masks for each frame.

**Raises**

- `RuntimeError`: If tracking fails or the model is not loaded.
- `ValueError`: If the number of `initial_masks` does not match the length of `object_ids`.
#### unload

```python
def unload() -> None
```

Unload the model from memory to free GPU resources.
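The contract above can be illustrated with a stub subclass. Only the documented `ValueError` and not-loaded `RuntimeError` conditions are taken from the source; everything else is placeholder logic, not any real loader's implementation:

```python
from abc import ABC, abstractmethod

class TrackingModelLoader(ABC):
    # Minimal stand-in for the documented ABC.
    def __init__(self, config) -> None:
        self.config = config
        self.model = None

    @abstractmethod
    def load(self) -> None: ...

    @abstractmethod
    def track(self, frames, initial_masks, object_ids): ...

    def unload(self) -> None:
        self.model = None  # drop the reference so GPU memory can be freed

class StubLoader(TrackingModelLoader):
    def load(self) -> None:
        self.model = object()  # placeholder for real model weights

    def track(self, frames, initial_masks, object_ids):
        if self.model is None:
            raise RuntimeError("model is not loaded; call load() first")
        # Documented contract: one initial mask per tracked object.
        if len(initial_masks) != len(object_ids):
            raise ValueError("initial_masks and object_ids must have equal length")
        return {"n_objects": len(object_ids)}  # stand-in for TrackingResult

loader = StubLoader(config=None)
loader.load()
print(loader.track([], ["m1"], [1]))  # {'n_objects': 1}
```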
## SAMURAILoader Objects

```python
class SAMURAILoader(TrackingModelLoader)
```

Loader for the SAMURAI motion-aware tracking model.

SAMURAI achieves 7.1% better performance than the SAM2 baseline, with motion-aware tracking and occlusion handling capabilities.

#### load

```python
def load() -> None
```

Load the SAMURAI model with the configured settings.

#### track

```python
def track(frames: list[Image.Image],
          initial_masks: list[np.ndarray[Any, np.dtype[np.uint8]]],
          object_ids: list[int]) -> TrackingResult
```

Track objects using SAMURAI with motion-aware tracking.
## SAM2LongLoader Objects

```python
class SAM2LongLoader(TrackingModelLoader)
```

Loader for the SAM2Long long-video tracking model.

SAM2Long achieves 5.3% better performance than the SAM2 baseline, with error-accumulation fixes for long video sequences.

#### load

```python
def load() -> None
```

Load the SAM2Long model with the configured settings.

#### track

```python
def track(frames: list[Image.Image],
          initial_masks: list[np.ndarray[Any, np.dtype[np.uint8]]],
          object_ids: list[int]) -> TrackingResult
```

Track objects using SAM2Long with error-accumulation fixes.
## SAM2Loader Objects

```python
class SAM2Loader(TrackingModelLoader)
```

Loader for the SAM2.1 baseline video segmentation model.

SAM2.1 provides baseline performance with proven stability for general video segmentation and tracking tasks.

#### load

```python
def load() -> None
```

Load the SAM2.1 model with the configured settings.

#### track

```python
def track(frames: list[Image.Image],
          initial_masks: list[np.ndarray[Any, np.dtype[np.uint8]]],
          object_ids: list[int]) -> TrackingResult
```

Track objects using the SAM2.1 baseline implementation.
## YOLO11SegLoader Objects

```python
class YOLO11SegLoader(TrackingModelLoader)
```

Loader for the YOLO11n-seg lightweight segmentation model.

YOLO11n-seg is a 2.7M-parameter model optimized for real-time segmentation in speed-critical applications.

#### load

```python
def load() -> None
```

Load the YOLO11n-seg model with the configured settings.

#### track

```python
def track(frames: list[Image.Image],
          initial_masks: list[np.ndarray[Any, np.dtype[np.uint8]]],
          object_ids: list[int]) -> TrackingResult
```

Track objects using YOLO11n-seg with per-frame segmentation.

Note: YOLO11n-seg performs independent segmentation per frame without temporal consistency; object re-identification is based on spatial overlap.
## create\_tracking\_loader

```python
def create_tracking_loader(model_name: str,
                           config: TrackingConfig) -> TrackingModelLoader
```

Factory function that creates the appropriate tracking loader for a model name.

**Parameters**

- `model_name` (str): Name of the model to load. Supported values:
  - `"samurai"` (default)
  - `"sam2long"` or `"sam2-long"`
  - `"sam2"` or `"sam2.1"`
  - `"yolo11n-seg"` or `"yolo11seg"`
- `config` (TrackingConfig): Configuration for model loading and inference.

**Returns**

- `TrackingModelLoader`: Appropriate loader instance for the specified model.

**Raises**

- `ValueError`: If `model_name` is not recognized.
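The name-to-loader dispatch might be sketched as an alias table over the documented name variants. The placeholder classes and the case-insensitive matching are assumptions, and the `config` argument the real factory receives is omitted to keep the sketch self-contained:

```python
# Placeholder loader classes standing in for the documented ones.
class SAMURAILoader: ...
class SAM2LongLoader: ...
class SAM2Loader: ...
class YOLO11SegLoader: ...

# Alias table covering the documented name variants.
_LOADERS = {
    "samurai": SAMURAILoader,
    "sam2long": SAM2LongLoader,
    "sam2-long": SAM2LongLoader,
    "sam2": SAM2Loader,
    "sam2.1": SAM2Loader,
    "yolo11n-seg": YOLO11SegLoader,
    "yolo11seg": YOLO11SegLoader,
}

def create_tracking_loader(model_name: str):
    # Assumption: names are matched case-insensitively.
    try:
        return _LOADERS[model_name.lower()]()
    except KeyError:
        raise ValueError(f"unrecognized model_name: {model_name!r}") from None

print(type(create_tracking_loader("sam2.1")).__name__)  # SAM2Loader
```

A flat dict keeps every accepted alias visible in one place, which makes the `ValueError` path trivially correct: anything not in the table is rejected.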