detection_loader
Open-vocabulary object detection with multiple model architectures.
This module provides a unified interface for loading and running inference with various open-vocabulary object detection models including YOLO-World v2.1, Grounding DINO 1.5, OWLv2, and Florence-2. Models support text-based prompts for detecting objects without pre-defined class vocabularies.
logging
ABC
abstractmethod
dataclass
Enum
Path
Any
np
torch
Image
logger
DetectionFramework Objects
class DetectionFramework(str, Enum)
Supported detection frameworks for model execution.
PYTORCH
ULTRALYTICS
TRANSFORMERS
DetectionConfig Objects
@dataclass
class DetectionConfig()
Configuration for object detection model loading and inference.
Parameters
model_id : str HuggingFace model identifier or Ultralytics model name. framework : DetectionFramework Framework to use for model execution. confidence_threshold : float, default=0.25 Minimum confidence score for detections (0.0 to 1.0). device : str, default="cuda" Device to load the model on. cache_dir : Path | None, default=None Directory for caching model weights.
model_id
framework
confidence_threshold
device
cache_dir
BoundingBox Objects
@dataclass
class BoundingBox()
Bounding box in normalized coordinates.
Parameters
x1 : float Left coordinate (0.0 to 1.0, normalized by image width). y1 : float Top coordinate (0.0 to 1.0, normalized by image height). x2 : float Right coordinate (0.0 to 1.0, normalized by image width). y2 : float Bottom coordinate (0.0 to 1.0, normalized by image height).
x1
y1
x2
y2
to_absolute
def to_absolute(width: int, height: int) -> tuple[int, int, int, int]
Convert normalized coordinates to absolute pixel coordinates.
Parameters
width : int Image width in pixels. height : int Image height in pixels.
Returns
tuple[int, int, int, int] Bounding box in absolute coordinates (x1, y1, x2, y2).
Detection Objects
@dataclass
class Detection()
Single object detection result.
Parameters
bbox : BoundingBox Bounding box in normalized coordinates. confidence : float Detection confidence score (0.0 to 1.0). label : str Detected object class or description.
bbox
confidence
label
DetectionResult Objects
@dataclass
class DetectionResult()
Detection results for a single image.
Parameters
detections : list[Detection] List of detected objects with bounding boxes and scores. image_width : int Original image width in pixels. image_height : int Original image height in pixels. processing_time : float Processing time in seconds.
detections
image_width
image_height
processing_time
DetectionModelLoader Objects
class DetectionModelLoader(ABC)
Abstract base class for object detection model loaders.
All detection loaders must implement the load and detect methods.
__init__
def __init__(config: DetectionConfig) -> None
Initialize the detection model loader with configuration.
Parameters
config : DetectionConfig Configuration for model loading and inference.
load
@abstractmethod
def load() -> None
Load the detection model into memory with configured settings.
Raises
RuntimeError If model loading fails.
detect
@abstractmethod
def detect(image: Image.Image, text_prompt: str) -> DetectionResult
Detect objects in an image based on text prompt.
Parameters
image : Image.Image PIL Image to process. text_prompt : str Text description of objects to detect (e.g., "person. car. dog.").
Returns
DetectionResult Detection results with bounding boxes in normalized coordinates.
Raises
RuntimeError If detection fails or model is not loaded.
unload
def unload() -> None
Unload the model from memory to free GPU resources.
YOLOWorldLoader Objects
class YOLOWorldLoader(DetectionModelLoader)
Loader for YOLO-World v2.1 open-vocabulary detection model.
YOLO-World v2.1 achieves real-time performance (52 FPS) with strong accuracy on open-vocabulary object detection tasks.
load
def load() -> None
Load YOLO-World v2.1 model with configured settings.
detect
def detect(image: Image.Image, text_prompt: str) -> DetectionResult
Detect objects using YOLO-World v2.1 with text prompts.
GroundingDINOLoader Objects
class GroundingDINOLoader(DetectionModelLoader)
Loader for Grounding DINO 1.5 open-vocabulary detection model.
Grounding DINO 1.5 achieves 52.5 AP on COCO with zero-shot open-world object detection capabilities.
load
def load() -> None
Load Grounding DINO 1.5 model with configured settings.
detect
def detect(image: Image.Image, text_prompt: str) -> DetectionResult
Detect objects using Grounding DINO 1.5 with text prompts.
OWLv2Loader Objects
class OWLv2Loader(DetectionModelLoader)
Loader for OWLv2 open-vocabulary detection model.
OWLv2 uses scaled training data and achieves strong performance on rare and novel object classes.
load
def load() -> None
Load OWLv2 model with configured settings.
detect
def detect(image: Image.Image, text_prompt: str) -> DetectionResult
Detect objects using OWLv2 with text prompts.
Florence2Loader Objects
class Florence2Loader(DetectionModelLoader)
Loader for Florence-2 unified vision model.
Florence-2 is a 230M parameter model that supports multiple vision tasks including object detection, captioning, and grounding.
load
def load() -> None
Load Florence-2 model with configured settings.
detect
def detect(image: Image.Image, text_prompt: str) -> DetectionResult
Detect objects using Florence-2 with text prompts.
create_detection_loader
def create_detection_loader(model_name: str,
config: DetectionConfig) -> DetectionModelLoader
Factory function to create appropriate detection loader based on model name.
Parameters
model_name : str Name of the model to load. Supported values:
- "yolo-world-v2" or "yoloworld"
- "grounding-dino-1-5" or "groundingdino"
- "owlv2" or "owl-v2"
- "florence-2" or "florence2" config : DetectionConfig Configuration for model loading and inference.
Returns
DetectionModelLoader Appropriate loader instance for the specified model.
Raises
ValueError If model_name is not recognized.