Clean Architecture

v0.3.0 restructured the model service from a flat model-service/src/<module>.py layout into three concentric layers: domain, application, and infrastructure. The motivation is the canonical Clean Architecture argument: business rules should not depend on the framework, either directly or transitively, and every external dependency should sit behind an interface the inner layer owns. This page describes how that lands in the v0.3.x model service.

Layers

src/
  domain/                  pure data, no I/O, no torch
    entities/              ModelConfig, TaskConfig, InferenceConfig,
                           Detection, Summary, Video, Ontology
    value_objects/         BoundingBox, ConfidenceScore, Timestamp
    exceptions.py          ModelError, InferenceError, ConfigError, ...
  application/             use cases and ports, no torch, no FastAPI
    dto/                   ReasonedText, ThinkingTrace, GenerationConfig,
                           DetectionRequest, SummarizationRequest, ...
    ports/
      inbound/             interfaces called by adapters/inbound
      outbound/            interfaces called by use cases (ILanguageModel,
                           IVisionLanguageModel, IDetectionModel,
                           ITrackingModel, IAudioTranscriber,
                           IModelRepository, IFrameSampler, ...)
    services/              ModelManager, AudioProcessingService
    use_cases/             SummarizeVideoUseCase, ExtractClaimsUseCase,
                           SynthesizeSummaryUseCase, AugmentOntologyUseCase,
                           DetectObjectsUseCase, TrackObjectsUseCase,
                           FuseModalitiesUseCase
  infrastructure/          everything that touches torch, the network,
                           the filesystem, or FastAPI
    adapters/
      inbound/fastapi/     routes, schemas, mappers
      outbound/            LLMLoaderAdapter, VLMLoaderAdapter,
                           DetectionAdapter, TrackingAdapter,
                           WhisperTranscriberAdapter, vendor audio
                           clients, frame samplers, persistence
    config/                Container, ContainerConfig, task_factories
    observability/         telemetry helpers

The dependency rule is one-way: domain imports nothing from application or infrastructure; application imports from domain only; infrastructure imports from application and domain to wire concrete adapters. The reverse never happens.
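One way to keep the rule enforced is a test that parses each module and rejects outward imports. The following is a minimal sketch using only the standard library's ast module. It is not the project's actual check: the ALLOWED/FORBIDDEN tables and layer_violations are hypothetical names, and the sketch assumes src/ is on sys.path so the layers import as top-level packages (domain.*, application.*, infrastructure.*).

```python
import ast

# Which layer packages each layer may import, per the one-way rule.
# Assumes src/ is on sys.path, so layers import as domain.*, application.*, ...
ALLOWED = {
    "domain": {"domain"},
    "application": {"domain", "application"},
    "infrastructure": {"domain", "application", "infrastructure"},
}
# Frameworks the inner layers must not touch (per the layer notes on this page).
FORBIDDEN = {
    "domain": {"torch", "fastapi", "yaml"},
    "application": {"torch", "fastapi", "yaml"},
    "infrastructure": set(),
}


def layer_violations(layer: str, source: str) -> list[str]:
    """Return the absolute imports in `source` that break the rule for `layer`."""
    bad: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue  # not an import, or a relative (intra-package) import
        for name in names:
            top = name.split(".")[0]
            if top in ALLOWED and top not in ALLOWED[layer]:
                bad.append(name)  # points outward across a layer boundary
            elif top in FORBIDDEN[layer]:
                bad.append(name)  # framework leaking into an inner layer
    return bad
```

Run over every file under src/domain/ and src/application/, a check like this turns a silent layering leak into a failing test.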

What each layer adds

The domain layer holds the entities, value objects, and the exception hierarchy. Nothing in the domain knows that torch, FastAPI, or YAML exist.
The application layer holds the use cases and the port interfaces (ABCs and Protocols). A use case takes ports in its constructor and DTOs at its method boundary; it does not import torch and does not know which framework is serving the HTTP request.
The infrastructure layer holds the adapters. Each adapter implements one outbound port against one concrete library (e.g. LLMLoaderAdapter implements ILanguageModel against the SGLang / vLLM / Transformers / llama.cpp loader fan-out). The FastAPI routes live here too; they are inbound adapters that translate HTTP into use-case calls.
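The shape of an application-layer use case can be sketched as follows: the port arrives through the constructor, a DTO arrives at execute, and a typed fake stands in for the adapter in tests. This is a simplified illustration, not the service's code. The DTO fields, the detect method, the DetectedBox type, and the min_confidence filtering rule are all hypothetical.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


# Hypothetical, simplified stand-ins for the real DTO, entity, and port.
@dataclass(frozen=True)
class DetectionRequest:
    video_id: str
    classes: tuple[str, ...]


@dataclass(frozen=True)
class DetectedBox:
    label: str
    confidence: float


class IDetectionModel(Protocol):
    async def detect(self, video_id: str, classes: tuple[str, ...]) -> list[DetectedBox]: ...


class DetectObjectsUseCase:
    """Takes the port in its constructor and a DTO at the method boundary."""

    def __init__(self, detector: IDetectionModel, min_confidence: float = 0.5) -> None:
        self._detector = detector
        self._min_confidence = min_confidence

    async def execute(self, request: DetectionRequest) -> list[DetectedBox]:
        boxes = await self._detector.detect(request.video_id, request.classes)
        # The (hypothetical) business rule lives here, not in the adapter.
        return [b for b in boxes if b.confidence >= self._min_confidence]


class FakeDetector:
    """Typed fake: matches IDetectionModel structurally, needs no torch."""

    async def detect(self, video_id: str, classes: tuple[str, ...]) -> list[DetectedBox]:
        return [DetectedBox("person", 0.9), DetectedBox("dog", 0.2)]
```

Nothing in the use case knows whether the detector behind the port is ONNX, SAM3, or a fake.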

Ports versus adapters

A port is an abstract interface in application/ports/. An adapter is a concrete implementation in infrastructure/adapters/. The use case names the port; the container injects the adapter.

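# application/ports/outbound/llm.py
from abc import ABC, abstractmethod
from typing import Any

from application.dto import ReasonedText  # import path assumed

class ILanguageModel(ABC):
    @abstractmethod
    async def generate(
        self, prompt: str, max_tokens: int = 512,
        temperature: float = 0.7, **kwargs: Any,
    ) -> str: ...

    @abstractmethod
    async def generate_reasoned(
        self, prompt: str, *, max_tokens: int = 512,
        temperature: float = 0.7, **kwargs: Any,
    ) -> ReasonedText: ...

# infrastructure/adapters/outbound/llm_adapter.py
# (imports Any, ILanguageModel, ReasonedText, and the _LLMLoaderLike Protocol)
class LLMLoaderAdapter(ILanguageModel):
    def __init__(self, loader: _LLMLoaderLike) -> None:
        self._loader = loader

    async def generate(
        self, prompt: str, max_tokens: int = 512,
        temperature: float = 0.7, **kwargs: Any,
    ) -> str: ...  # delegates to self._loader; body elided

    async def generate_reasoned(
        self, prompt: str, *, max_tokens: int = 512,
        temperature: float = 0.7, **kwargs: Any,
    ) -> ReasonedText: ...  # delegates to self._loader; body elided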

_LLMLoaderLike and _LoaderConfig are structural Protocols that describe the shape of the underlying loader without naming a concrete class; this is what lets the same adapter sit in front of SGLang, vLLM, Transformers, and llama.cpp loaders without a union type.
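The structural-typing idea can be sketched with typing.Protocol. Two unrelated loader classes satisfy the Protocol without inheriting from it, and one adapter accepts either. The Protocol's generate member and both fake loader classes are hypothetical, since this page does not list _LLMLoaderLike's actual members. The ILanguageModel base class is omitted to keep the sketch short.

```python
import asyncio
from typing import Any, Protocol


class _LLMLoaderLike(Protocol):
    # Hypothetical shape; the real Protocol's members are not listed here.
    async def generate(self, prompt: str, max_tokens: int, temperature: float) -> str: ...


class FakeVLLMLoader:
    # No inheritance from _LLMLoaderLike: matching the shape is enough.
    async def generate(self, prompt: str, max_tokens: int, temperature: float) -> str:
        return f"vllm:{prompt}"


class FakeLlamaCppLoader:
    async def generate(self, prompt: str, max_tokens: int, temperature: float) -> str:
        return f"llama.cpp:{prompt}"


class LLMLoaderAdapter:
    """Accepts any object with the loader shape; no Union of concrete types."""

    def __init__(self, loader: _LLMLoaderLike) -> None:
        self._loader = loader

    async def generate(
        self, prompt: str, max_tokens: int = 512, temperature: float = 0.7, **kwargs: Any
    ) -> str:
        return await self._loader.generate(prompt, max_tokens, temperature)
```

A type checker verifies each loader against the Protocol at the call site, so adding a fifth backend does not require editing a Union annotation.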

The same port-adapter pair appears for every external dependency:

ILanguageModel              LLMLoaderAdapter
IVisionLanguageModel        VLMLoaderAdapter
IDetectionModel             DetectionAdapter, SAM3DetectionAdapter,
                            ONNX detection adapters (YOLO-World, Florence-2,
                            Grounding DINO)
ITrackingModel              TrackingAdapter, SAM3TrackingAdapter
IAudioTranscriber           WhisperTranscriberAdapter (modern path),
                            seven vendor audio clients (assemblyai,
                            aws_transcribe, azure_speech, deepgram,
                            gladia, google_speech, revai)
ISpeakerDiarizer            PyannoteDiarizerAdapter
IVoiceActivityDetector      SileroVADAdapter
IFrameSampler               OpenCVFrameSampler
IModelRepository            YamlModelRepository
IModelCapabilityProbe       TorchModelCapabilityProbe
ITranscriber                WhisperTranscriberAdapter (legacy path)
IExternalAPIRouter          ExternalAPIRouterAdapter

YamlModelRepository is the canonical example. The port declares get_all_tasks, get_task, get_model, get_inference_config, and set_selected_model; the adapter loads models.yaml (or models-cpu.yaml), translates each YAML task block into a typed TaskConfig, and exposes them as domain entities. Nothing in application/ or domain/ imports yaml or pathlib.
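The translation step inside such a repository might look like the sketch below: one already-parsed YAML task block becomes a typed, immutable entity, and a bad block fails at load time. The block keys ("selected", "options"), the TaskConfig fields, the task_from_block helper, and this stand-in ConfigError are hypothetical. In an adapter the raw mapping would come from yaml.safe_load(path.read_text()); here it is a literal dict, so the sketch needs no PyYAML.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any


class ConfigError(Exception):
    """Stand-in for the domain's ConfigError."""


@dataclass(frozen=True)
class TaskConfig:
    # Hypothetical fields; the real domain entity may carry more.
    name: str
    selected_model: str
    options: tuple[str, ...]


def task_from_block(name: str, block: Mapping[str, Any]) -> TaskConfig:
    """Translate one parsed YAML task block into a typed domain entity."""
    options = tuple(block.get("options", ()))
    selected = block.get("selected")
    if selected not in options:
        # Reject the file at load time rather than at first inference.
        raise ConfigError(f"task {name!r}: selected model {selected!r} not in options")
    return TaskConfig(name=name, selected_model=selected, options=options)
```

Only the adapter ever sees the raw mapping; get_task and friends hand out TaskConfig instances.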

Dependency injection

infrastructure/config/container.py wires the adapters to the ports. Use-case factory methods construct a fresh use-case instance on each call from cached adapter factories.

from dataclasses import dataclass
from pathlib import Path

@dataclass
class ContainerConfig:
    model_config_path: Path
    enable_telemetry: bool = True
    enable_warmup: bool = False

@dataclass
class Container:
    config: ContainerConfig

    @property
    def model_manager(self) -> ModelManager: ...

    def language_model(
        self, *, model_id: str = "meta-llama/Llama-3.2-3B-Instruct",
    ) -> ILanguageModel: ...

    def audio_transcriber(
        self, *, model_id: str = "openai/whisper-large-v3-turbo",
        framework: str = "whisper", language: str | None = None,
    ) -> IAudioTranscriber: ...

    def detect_objects_use_case(self) -> DetectObjectsUseCase: ...
    def summarize_video_use_case(self) -> SummarizeVideoUseCase: ...
    def extract_claims_use_case(self) -> ExtractClaimsUseCase: ...
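The caching split described above (expensive adapters built once, cheap use cases built per call) can be sketched like this. The _adapters dict, FakeLanguageModel, and the single-argument ExtractClaimsUseCase are hypothetical simplifications. The real container's caching mechanism is not shown on this page.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ContainerConfig:
    model_config_path: Path


class FakeLanguageModel:
    """Stand-in for an expensive adapter (e.g. one that loads weights)."""

    def __init__(self, model_id: str) -> None:
        self.model_id = model_id


class ExtractClaimsUseCase:
    def __init__(self, llm: FakeLanguageModel) -> None:
        self.llm = llm


@dataclass
class Container:
    config: ContainerConfig
    # Adapter cache keyed by model_id; excluded from __init__ and repr.
    _adapters: dict[str, FakeLanguageModel] = field(default_factory=dict, init=False, repr=False)

    def language_model(self, *, model_id: str = "meta-llama/Llama-3.2-3B-Instruct") -> FakeLanguageModel:
        if model_id not in self._adapters:
            self._adapters[model_id] = FakeLanguageModel(model_id)  # built once
        return self._adapters[model_id]

    def extract_claims_use_case(self) -> ExtractClaimsUseCase:
        # Fresh use case on every call, wired to the cached adapter.
        return ExtractClaimsUseCase(self.language_model())
```

A dict keyed by model_id is used instead of functools.lru_cache because a default @dataclass sets __hash__ to None, so its instances cannot serve as lru_cache keys.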

ContainerConfig is the single seam for the model config path; MODEL_CONFIG_PATH from the environment becomes config.model_config_path. The Dockerfile creates a build-time symlink (/app/config/active-models.yaml -> models.yaml or models-cpu.yaml depending on the DEVICE build arg) so the same path works on both CPU and GPU images. See Reference > Model config.

ModelManager.__init__ requires capability_probe since v0.3.0; the lazy default that v0.2.x carried was removed because it hid configuration mistakes until first inference.
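The effect of dropping the default is easy to see in a sketch: a missing probe now raises TypeError when the container wires ModelManager, not when the first request arrives. The supports method on the probe port and the keyword-only signature are hypothetical, and ModelManager's other constructor parameters are omitted.

```python
from typing import Protocol


class IModelCapabilityProbe(Protocol):
    # Hypothetical member; the real port's methods are not listed here.
    def supports(self, model_id: str) -> bool: ...


class ModelManager:
    # No default for capability_probe: forgetting it fails at wiring time
    # instead of surfacing later during inference. Other params omitted.
    def __init__(self, *, capability_probe: IModelCapabilityProbe) -> None:
        self._probe = capability_probe
```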

Observability

Every use case wraps its execute method in an OpenTelemetry span. Every outbound adapter emits a model_inference metric on every call, tagged with model_id, task, and framework. The metrics flow through the OTel collector to Prometheus; the spans flow to the trace pipeline. See Guide > Observability.
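A stdlib-only sketch of the per-call tagging is shown below, so it runs without OpenTelemetry installed. The MetricSink protocol and the instrumented decorator are hypothetical. Treating model_inference as a latency measurement in seconds is also an assumption, because this page does not say what the metric records. In the service the sink would be an OTel instrument (for example a histogram from meter.create_histogram, recorded with attributes=tags), and the use-case span would come from tracer.start_as_current_span.

```python
import asyncio
import functools
import time
from collections.abc import Awaitable, Callable
from typing import Any, Protocol


class MetricSink(Protocol):
    # Hypothetical stand-in for an OpenTelemetry meter instrument.
    def record(self, name: str, value: float, tags: dict[str, str]) -> None: ...


def instrumented(sink: MetricSink, *, model_id: str, task: str, framework: str):
    """Record one model_inference measurement (assumed: latency) per call."""
    tags = {"model_id": model_id, "task": task, "framework": framework}

    def decorate(fn: Callable[..., Awaitable[Any]]) -> Callable[..., Awaitable[Any]]:
        @functools.wraps(fn)
        async def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            try:
                return await fn(*args, **kwargs)
            finally:
                # Runs on success and on failure, so errors are measured too.
                sink.record("model_inference", time.perf_counter() - start, tags)
        return wrapper

    return decorate
```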

Why this shape

Use cases are unit-testable with typed fakes: 234 unit tests at the application boundary use only fakes against the port interfaces. The model-service test suite adds 158 more covering the adapters.
The frontend / backend / model-service contract is the only place infrastructure leaks out, and it is gated by FastAPI schemas in infrastructure/adapters/inbound/fastapi/schemas/.
Adding a new framework (a new VLM runtime, a new audio vendor, a new detector) is one new adapter file plus one new task-factory entry. The use cases do not change.
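Why a new runtime touches only one adapter plus one registry entry can be sketched with a small factory table. The VLM_FACTORIES dict, register_vlm, create_vlm, the describe method, and NewRuntimeVLMAdapter are all hypothetical. The real task_factories module may be keyed differently (for example by task rather than framework).

```python
from collections.abc import Callable
from typing import Protocol


class IVisionLanguageModel(Protocol):
    # Hypothetical member; the real port is richer.
    def describe(self, frame: bytes) -> str: ...


VLMFactory = Callable[[str], IVisionLanguageModel]
VLM_FACTORIES: dict[str, VLMFactory] = {}  # framework name -> factory


def register_vlm(framework: str) -> Callable[[VLMFactory], VLMFactory]:
    def decorate(factory: VLMFactory) -> VLMFactory:
        VLM_FACTORIES[framework] = factory
        return factory
    return decorate


def create_vlm(framework: str, model_id: str) -> IVisionLanguageModel:
    factory = VLM_FACTORIES.get(framework)
    if factory is None:
        raise ValueError(f"unknown VLM framework: {framework!r}")
    return factory(model_id)


# The "one new adapter file": registering it is the "one new entry".
@register_vlm("new-runtime")
class NewRuntimeVLMAdapter:
    def __init__(self, model_id: str) -> None:
        self.model_id = model_id

    def describe(self, frame: bytes) -> str:
        return f"{self.model_id}: {len(frame)} bytes"
```

Use cases depend only on IVisionLanguageModel, so they never see which entry produced the adapter.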

Runtime cyclic-import guards

The shared base modules audio/base.py, detection/base.py, and llm/base.py exist to break runtime cycles between adapters and their factory functions. Adapters depend on the base, the factory depends on the base, the adapter does not depend on the factory. This pattern repeats for each modality that has multiple loader backends.
