Model loaders

This page lists every loader in model-service/src/infrastructure/adapters/outbound/models/, grouped by task. Each entry lists the option id used in the YAML configuration, the framework, and the YAML file that defines the option (the Speaker diarization and VAD entries list the task slot they back instead). The list is the union of models.yaml (GPU) and models-cpu.yaml (CPU); the framework column determines the inference path.
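As a rough illustration of how such a union could be computed, the sketch below merges the option ids of two already-parsed YAML documents and records which file each id came from. The document layout (a top-level tasks mapping whose slots each hold an options mapping), the task name audio_transcription, and the function name union_options are assumptions made for this example, not the project's actual schema or code.

```python
from typing import Any, Dict

# Hypothetical shape: {task: {option_id: {"framework": ..., "source": ...}}}
Catalog = Dict[str, Dict[str, Dict[str, Any]]]


def union_options(gpu_doc: Dict[str, Any], cpu_doc: Dict[str, Any]) -> Catalog:
    """Merge option ids from two parsed YAML documents, tagging each with its file."""
    merged: Catalog = {}
    for source, doc in (("models.yaml", gpu_doc), ("models-cpu.yaml", cpu_doc)):
        for task, slot in doc.get("tasks", {}).items():
            task_options = merged.setdefault(task, {})
            for option_id, option in slot.get("options", {}).items():
                # Assumption: if both files define the same id, the first file wins.
                if option_id not in task_options:
                    task_options[option_id] = {
                        "framework": option.get("framework"),
                        "source": source,
                    }
    return merged


# Assumed, simplified stand-ins for the parsed contents of the two files.
gpu_doc = {"tasks": {"audio_transcription": {"options": {
    "whisper-v3-turbo": {"framework": "whisper"}}}}}
cpu_doc = {"tasks": {"audio_transcription": {"options": {
    "faster-whisper-small-cpu": {"framework": "faster_whisper"}}}}}

catalog = union_options(gpu_doc, cpu_doc)
for option_id, row in catalog["audio_transcription"].items():
    print(f"{option_id:<30}{row['framework']:<19}{row['source']}")
```

The first-file-wins rule is only one possible way to handle an id that appears in both files; the source does not say how such a conflict is resolved.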

Video summarization (VLM)

qwen-3-vl-235b-a22b           sglang         models.yaml
qwen-3-vl-30b-a3b             sglang         models.yaml
qwen-3-vl-30b-a3b-thinking    sglang         models.yaml
qwen-3-vl-8b                  sglang         models.yaml
qwen-3-vl-8b-thinking         sglang         models.yaml
qwen-2-5-vl-7b                sglang         models.yaml
qwen2-5-vl-72b                sglang         models.yaml
tarsier2-7b                   transformers   models.yaml
moondream-3                   transformers   models.yaml
moondream-2b                  transformers   models.yaml
moondream-0-5b                transformers   models.yaml
internvl3-78b                 sglang         models.yaml
llama-4-maverick              sglang         models.yaml
llama-4-scout                 vllm           models.yaml
pixtral-large                 sglang         models.yaml
gemma-3-27b                   sglang         models.yaml
smolvlm-2-2b                  transformers   models-cpu.yaml
smolvlm-500m                  transformers   models-cpu.yaml
qwen2-5-vl-3b-gguf            llama_cpp      models-cpu.yaml
gpt-4o                        external_api   models.yaml
gpt-5-4                       external_api   models.yaml
claude-sonnet-4-5             external_api   models.yaml
claude-sonnet-4-6             external_api   models.yaml
claude-opus-4-7               external_api   models.yaml
gemini-2-5-flash              external_api   models.yaml
gemini-3-1-pro                external_api   models.yaml
grok-4                        external_api   models.yaml

Ontology augmentation / claim extraction / claim synthesis (LLM)

qwen-3-8b                     sglang         models.yaml
qwen-3-32b                    sglang         models.yaml
qwen-3-1-7b-cpu               transformers   models-cpu.yaml
qwen-3-5-397b-a17b            sglang         models.yaml
qwen-2-5-7b                   sglang         models.yaml
qwen-2-5-32b                  sglang         models.yaml
qwen2-5-1-5b-cpu              transformers   models-cpu.yaml
qwen2-5-1-5b-gguf             llama_cpp      models-cpu.yaml
deepseek-r1-distill-qwen-14b  sglang         models.yaml
deepseek-r1-distill-qwen-32b  sglang         models.yaml
deepseek-r1-distill-qwen-1-5b-gguf  llama_cpp  models-cpu.yaml
deepseek-v3                   sglang         models.yaml
deepseek-v3-2                 sglang         models.yaml
kimi-k2-6                     sglang         models.yaml
glm-4-7                       sglang         models.yaml
llama-3-3-70b                 sglang         models.yaml
gemma-3-27b-text              sglang         models.yaml
phi-3-mini-4k                 transformers   models.yaml
phi-4-mini                    transformers   models.yaml
claude-sonnet-4-5             external_api   models.yaml
claude-sonnet-4-6             external_api   models.yaml
claude-opus-4-7               external_api   models.yaml
gpt-5-4                       external_api   models.yaml
gemini-3-1-pro                external_api   models.yaml
grok-4                        external_api   models.yaml

Object detection

sam-3-1                       sam3           models.yaml
sam-3                         sam3           models.yaml
yolov12-large                 ultralytics    models.yaml
yoloe-26                      ultralytics    models.yaml
rf-detr-base                  pytorch        models.yaml
yolo-world-v2                 pytorch        models.yaml
yolo-world-s-onnx             onnx           models-cpu.yaml
grounding-dino-1-5            pytorch        models.yaml
grounding-dino-tiny-onnx      onnx           models-cpu.yaml
owlv2                         pytorch        models.yaml
florence-2                    transformers   models.yaml
florence-2-base-onnx          onnx           models-cpu.yaml

Object tracking

sam-3-1-tracking              sam3           models.yaml
sam2-1                        pytorch        models.yaml
sam2long                      pytorch        models.yaml
samurai                       pytorch        models.yaml
yolo11n-seg                   ultralytics    models.yaml

Audio transcription

whisper-v3-turbo              whisper            models.yaml
whisper-large-v3              whisper            models.yaml
faster-whisper-large-v3       faster_whisper     models.yaml
faster-whisper-medium-cpu     faster_whisper     models-cpu.yaml
faster-whisper-small-cpu      faster_whisper     models-cpu.yaml
canary-qwen-2-5b              nemo_canary        models.yaml
parakeet-tdt-1-1b             nemo_parakeet      models.yaml
whisperx-large-v3             whisperx           models.yaml
assemblyai-universal          external_api       models.yaml
deepgram-nova-3               external_api       models.yaml
gladia                        external_api       models.yaml
revai                         external_api       models.yaml
azure-speech                  external_api       models.yaml
google-speech                 external_api       models.yaml
aws-transcribe                external_api       models.yaml

Speaker diarization and VAD

pyannote-3-1                  pyannote       speaker_diarization
silero-vad                    pytorch        voice_activity_detection

Loader implementations

The following entries are first-class loader implementations (not just YAML rows):

SAM 3 / 3.1                   models/sam3/loader.py
                              models/sam3/detection_adapter.py
                              models/sam3/tracking_adapter.py
Canary-Qwen                   models/audio/canary.py
Parakeet TDT                  models/audio/parakeet.py
WhisperX                      models/audio/whisperx.py
YOLOv12 / YOLOE-26 / RF-DETR  models/detection/loader.py
ONNX detection (CPU)          models/onnx/yolo_world.py
                              models/onnx/florence.py
                              models/onnx/grounding_dino.py
llama.cpp LLM / VLM (CPU)     models/llama_cpp/llm.py
                              models/llama_cpp/vlm.py
SmolVLM / Moondream (CPU)     models/vlm/loader.py (transformers branch)

Selection and load failures

ModelManager.load_model loads only the option named by the slot's selected field (resolved via TaskConfig.get_selected_config); when a loader raises, it does not fall back to the slot's other options entries. Any loader exception propagates to the caller. The transcribe and diarize routes surface load failures with a Model load failed detail, and the other per-task routes (detection, summarization, claims, ontology, thumbnails) wrap them with their own per-route detail strings. Loads for the slot keep failing until it is repointed via POST /api/models/select or the underlying option is fixed. GET /api/models/status reports loaded_models plus memory and availability flags; it does not surface a failed-option record for slots that are not currently loaded.
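The selected-only behavior described above can be sketched as follows. This is a minimal illustration, not the real ModelManager: SlotConfig, load_selected, ModelLoadError, the loaders registry, and the exact format of the detail string are all hypothetical names and shapes chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


class ModelLoadError(RuntimeError):
    """Hypothetical error type; the real loaders may raise other exceptions."""


@dataclass
class SlotConfig:
    """Hypothetical stand-in for a task slot: one selected id plus its options."""
    selected: str
    options: Dict[str, Dict[str, Any]]

    def get_selected_config(self) -> Dict[str, Any]:
        # Resolve only the option named by `selected`.
        return self.options[self.selected]


Loader = Callable[[Dict[str, Any]], Any]


def load_selected(slot: SlotConfig, loaders: Dict[str, Loader]) -> Any:
    """Load the selected option only; sibling options are never tried."""
    config = slot.get_selected_config()
    loader = loaders[config["framework"]]
    # No try/except here: a loader exception propagates to the caller.
    return loader(config)


calls: List[str] = []


def failing_whisper(config: Dict[str, Any]) -> Any:
    calls.append("whisper")
    raise ModelLoadError("weights not found")


def working_faster_whisper(config: Dict[str, Any]) -> Any:
    calls.append("faster_whisper")
    return "loaded"


slot = SlotConfig(
    selected="whisper-v3-turbo",
    options={
        "whisper-v3-turbo": {"framework": "whisper"},
        "faster-whisper-large-v3": {"framework": "faster_whisper"},
    },
)
loaders = {"whisper": failing_whisper, "faster_whisper": working_faster_whisper}

try:
    load_selected(slot, loaders)
    detail = None
except ModelLoadError as exc:
    # A route would turn this into its own error detail (format assumed here).
    detail = f"Model load failed: {exc}"

print(detail, calls)
```

In this sketch the working faster_whisper loader is never called, which mirrors the point that recovery requires changing the slot's selected option rather than waiting for an automatic retry with another option.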

The LLM loader layer exposes a separate helper, create_llm_loader_with_fallback (model-service/src/infrastructure/adapters/outbound/models/llm/loader.py), which iterates an explicit fallback_configs list inside a single loader. That helper operates on an in-process list of configurations, not on the YAML options dict.
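For contrast, iterating an explicit in-process fallback list might look like the sketch below. It does not reproduce the signature or internals of create_llm_loader_with_fallback; load_with_fallback, the build callable, and the model_id key are hypothetical and only illustrate the try-in-order pattern.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Config = Dict[str, Any]


def load_with_fallback(
    primary: Config,
    fallback_configs: Sequence[Config],
    build: Callable[[Config], Any],
) -> Tuple[Any, Config]:
    """Try the primary config, then each fallback in order; return the first success."""
    errors: List[str] = []
    for config in [primary, *fallback_configs]:
        try:
            return build(config), config
        except Exception as exc:  # broad on purpose: this is only a sketch
            errors.append(f"{config.get('model_id', '?')}: {exc}")
    # Every configuration failed, so report all collected errors together.
    raise RuntimeError("all configurations failed: " + "; ".join(errors))


def build(config: Config) -> str:
    """Hypothetical builder that fails for one model id."""
    if config["model_id"] == "qwen-3-32b":
        raise MemoryError("not enough GPU memory")
    return f"loader for {config['model_id']}"


loader, used = load_with_fallback(
    {"model_id": "qwen-3-32b"},
    [{"model_id": "qwen-3-8b"}],
    build,
)
print(loader, used)
```

The fallback list here is passed in by the caller, which matches the distinction drawn above: it is an in-process list of configurations, separate from the options entries in the YAML files.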
