Model loaders
This page lists every loader in
model-service/src/infrastructure/adapters/outbound/models/, grouped by
the task slot it backs. Each entry gives the option id used in the YAML
config, the framework, and the config file that declares it: models.yaml
(GPU) or models-cpu.yaml (CPU). The speaker diarization and VAD entries
give the task slot in place of the config file. The framework column
determines which inference path is used.
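As a rough illustration of how the two config files combine and how the framework selects an inference path, here is a minimal sketch. The names `ModelOption`, `merge_options`, `resolve_inference_path`, and the registry values are hypothetical and not taken from the model-service code. The rule that the first declaration of an option id wins is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ModelOption:
    option_id: str   # e.g. "qwen-3-vl-8b"
    framework: str   # e.g. "sglang", "transformers", "onnx"
    source: str      # config file that declares the option


def merge_options(*groups: Iterable[ModelOption]) -> List[ModelOption]:
    """Union of several option lists; the first declaration of an id wins (assumption)."""
    merged: Dict[str, ModelOption] = {}
    for group in groups:
        for option in group:
            merged.setdefault(option.option_id, option)
    return list(merged.values())


def resolve_inference_path(option: ModelOption, registry: Dict[str, str]) -> str:
    """Look up the inference path registered for the option's framework."""
    if option.framework not in registry:
        raise ValueError(f"no inference path for framework {option.framework!r}")
    return registry[option.framework]


gpu = [ModelOption("qwen-3-vl-8b", "sglang", "models.yaml")]
cpu = [ModelOption("smolvlm-500m", "transformers", "models-cpu.yaml")]
# Hypothetical registry; the values are placeholder labels, not real module names.
registry = {"sglang": "sglang-path", "transformers": "transformers-path"}

options = merge_options(gpu, cpu)
paths = {o.option_id: resolve_inference_path(o, registry) for o in options}
```

Keying the lookup on the framework string keeps the option list purely declarative: adding a row with an already-registered framework needs no new code, which matches the distinction drawn later between new YAML rows and new loader implementations.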
Video summarization (VLM)
Option id                           Framework       Config file
qwen-3-vl-235b-a22b                 sglang          models.yaml
qwen-3-vl-30b-a3b                   sglang          models.yaml
qwen-3-vl-30b-a3b-thinking          sglang          models.yaml
qwen-3-vl-8b                        sglang          models.yaml
qwen-3-vl-8b-thinking               sglang          models.yaml
qwen-2-5-vl-7b                      sglang          models.yaml
qwen2-5-vl-72b                      sglang          models.yaml
tarsier2-7b                         transformers    models.yaml
moondream-3                         transformers    models.yaml
moondream-2b                        transformers    models.yaml
moondream-0-5b                      transformers    models.yaml
internvl3-78b                       sglang          models.yaml
llama-4-maverick                    vllm            models.yaml
llama-4-scout                       vllm            models.yaml
pixtral-large                       vllm            models.yaml
gemma-3-27b                         transformers    models.yaml
smolvlm-2-2b                        transformers    models-cpu.yaml
smolvlm-500m                        transformers    models-cpu.yaml
qwen2-5-vl-3b-gguf                  llama_cpp       models-cpu.yaml
florence-2                          transformers    models.yaml
florence-2-base-onnx                onnx            models-cpu.yaml
gpt-4o                              external_api    models.yaml
gpt-5-4                             external_api    models.yaml
claude-sonnet-4-5                   external_api    models.yaml
claude-sonnet-4-6                   external_api    models.yaml
claude-opus-4-7                     external_api    models.yaml
gemini-2-5-flash                    external_api    models.yaml
gemini-3-1-pro                      external_api    models.yaml
grok-4                              external_api    models.yaml
Ontology augmentation / claim extraction / claim synthesis (LLM)
Option id                           Framework       Config file
qwen-3-8b                           sglang          models.yaml
qwen-3-32b                          sglang          models.yaml
qwen-3-1-7b-cpu                     transformers    models-cpu.yaml
qwen-3-5-397b-a17b                  sglang          models.yaml
qwen-2-5-7b                         sglang          models.yaml
qwen-2-5-32b                        sglang          models.yaml
qwen2-5-1-5b-cpu                    transformers    models-cpu.yaml
qwen2-5-1-5b-gguf                   llama_cpp       models-cpu.yaml
deepseek-r1-distill-qwen-14b        sglang          models.yaml
deepseek-r1-distill-qwen-32b        sglang          models.yaml
deepseek-r1-distill-qwen-1-5b-gguf  llama_cpp       models-cpu.yaml
deepseek-v3                         sglang          models.yaml
deepseek-v3-2                       sglang          models.yaml
kimi-k2-6                           sglang          models.yaml
glm-4-7                             sglang          models.yaml
llama-3-3-70b                       vllm            models.yaml
gemma-3-27b-text                    transformers    models.yaml
phi-3-mini-4k                       transformers    models.yaml
phi-4-mini                          transformers    models.yaml
claude-sonnet-4-5                   external_api    models.yaml
claude-sonnet-4-6                   external_api    models.yaml
claude-opus-4-7                     external_api    models.yaml
gpt-5-4                             external_api    models.yaml
gemini-3-1-pro                      external_api    models.yaml
grok-4                              external_api    models.yaml
Object detection
Option id                           Framework       Config file
sam-3-1                             transformers    models.yaml
sam-3                               transformers    models.yaml
yolov12-large                       transformers    models.yaml
yoloe-26                            transformers    models.yaml
rf-detr-base                        transformers    models.yaml
yolo-world-v2                       transformers    models.yaml
yolo-world-s-onnx                   onnx            models-cpu.yaml
grounding-dino-1-5                  transformers    models.yaml
grounding-dino-tiny-onnx            onnx            models-cpu.yaml
owlv2                               transformers    models.yaml
Object tracking
Option id                           Framework       Config file
sam-3-1-tracking                    transformers    models.yaml
sam2-1                              transformers    models.yaml
sam2long                            transformers    models.yaml
samurai                             transformers    models.yaml
yolo11n-seg                         transformers    models.yaml
Audio transcription
Option id                           Framework       Config file
whisper-v3-turbo                    transformers    models.yaml
whisper-large-v3                    transformers    models.yaml
faster-whisper-large-v3             faster_whisper  models.yaml
faster-whisper-medium-cpu           faster_whisper  models-cpu.yaml
faster-whisper-small-cpu            faster_whisper  models-cpu.yaml
canary-qwen-2-5b                    transformers    models.yaml
parakeet-tdt-1-1b                   transformers    models.yaml
whisperx-large-v3                   transformers    models.yaml
assemblyai-universal                external_api    models.yaml
deepgram-nova-3                     external_api    models.yaml
gladia                              external_api    models.yaml
revai                               external_api    models.yaml
azure-speech                        external_api    models.yaml
google-speech                       external_api    models.yaml
aws-transcribe                      external_api    models.yaml
Speaker diarization and VAD
Option id                           Framework       Task slot
pyannote-3-1                        transformers    speaker_diarization
silero-vad                          transformers    voice_activity_detection
Wave 2+3 loaders introduced in v0.3.0
The following entries are new loader implementations (not just new YAML rows):
Loader                        Files
SAM 3 / 3.1                   models/sam3/loader.py
                              models/sam3/detection_adapter.py
                              models/sam3/tracking_adapter.py
Canary-Qwen                   models/audio/canary.py
Parakeet TDT                  models/audio/parakeet.py
WhisperX                      models/audio/whisperx.py
YOLOv12 / YOLOE-26 / RF-DETR  models/detection/loader.py
ONNX detection (CPU)          models/onnx/yolo_world.py
                              models/onnx/florence.py
                              models/onnx/grounding_dino.py
llama.cpp LLM / VLM (CPU)     models/llama_cpp/llm.py
                              models/llama_cpp/vlm.py
SmolVLM / Moondream (CPU)     models/vlm/loader.py (transformers branch)
Fallback chain
If the selected option fails to load, the manager tries the remaining
entries in the slot's options list in declaration order. The first
option that loads becomes the active loader, and the failed option is
recorded in GET /api/models/status. A slot with no working option
returns 503 from every route that needs it until the configuration is
repaired.
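The fallback behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the manager's actual code: `SlotState`, `load_with_fallback`, `status_code`, and `fake_load` are hypothetical names, and "remaining options" is taken to mean every other declared option, not only those declared after the selected one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class SlotState:
    active: Optional[str] = None                           # option id now serving the slot
    failed: Dict[str, str] = field(default_factory=dict)   # option id -> error message


def load_with_fallback(options: List[str], selected: str,
                       load: Callable[[str], object]) -> SlotState:
    """Try the selected option, then the others in declaration order."""
    order = [selected] + [o for o in options if o != selected]
    state = SlotState()
    for option_id in order:
        try:
            load(option_id)
        except Exception as exc:  # any load failure triggers the fallback
            state.failed[option_id] = str(exc)
            continue
        state.active = option_id
        break
    return state


def status_code(state: SlotState) -> int:
    """HTTP status a route depending on this slot would return."""
    return 200 if state.active else 503


def fake_load(option_id: str) -> object:
    # Stand-in loader: pretend the selected model does not fit in memory.
    if option_id == "qwen-3-vl-8b":
        raise RuntimeError("out of memory")
    return object()


declared = ["qwen-3-vl-30b-a3b", "qwen-3-vl-8b", "moondream-2b"]
state = load_with_fallback(declared, "qwen-3-vl-8b", fake_load)
```

In this run the selected option fails, so the first declared alternative becomes active and the failure stays in `state.failed`, which is the kind of record a status endpoint could expose. If every option fails, `state.active` stays `None` and `status_code` yields 503.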