Observability

Use the OpenTelemetry collector, Prometheus, and Grafana dashboards bundled in docker-compose.yml to inspect the running stack.

OpenTelemetry collector

docker-compose.yml runs otel/opentelemetry-collector-contrib, configured by otel-collector-config.yaml at the repo root. The collector listens on these ports:

4317   OTLP gRPC
4318   OTLP HTTP
8889   Prometheus metrics scrape endpoint (collector self-metrics)

The backend and the model service both ship traces and metrics to the collector via OTEL_EXPORTER_OTLP_ENDPOINT.
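As a rough illustration of how an OTLP/HTTP exporter typically turns OTEL_EXPORTER_OTLP_ENDPOINT into per-signal URLs, the sketch below appends /v1/<signal> to the base endpoint, following the OpenTelemetry exporter specification. Whether the services here export over HTTP (4318) or gRPC (4317) is not stated above, so the HTTP default and the otlp_http_url helper name are assumptions; with gRPC the endpoint is used as-is.

```python
import os

# Assumption: the OTLP/HTTP default from the OpenTelemetry exporter spec.
DEFAULT_OTLP_HTTP_ENDPOINT = "http://localhost:4318"


def otlp_http_url(signal, env=None):
    """Return the per-signal OTLP/HTTP URL derived from the base endpoint.

    Hypothetical helper for illustration; it is not part of the repo.
    """
    if signal not in ("traces", "metrics", "logs"):
        raise ValueError(f"unknown OTLP signal: {signal!r}")
    env = os.environ if env is None else env
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", DEFAULT_OTLP_HTTP_ENDPOINT)
    # The spec builds signal URLs relative to the base endpoint.
    return base.rstrip("/") + "/v1/" + signal
```

For example, with the variable set to http://otel-collector:4318 (a hypothetical compose service name), traces would be posted to http://otel-collector:4318/v1/traces.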

Model service spans and metrics

Use cases in model-service/src/application/use_cases/ wrap their work in OpenTelemetry spans. Span names follow per-method snake_case conventions, and two patterns are in use: <name>_use_case (for example detect_objects_use_case, track_objects_use_case) and use_case.<name> (for example use_case.extract_claims, use_case.synthesize_summary). Some use cases emit one span per variant rather than a single span per class: use_case.augment_ontology.local and use_case.augment_ontology.external; use_case.fuse_modalities.sequential, .timestamp_aligned, .native_multimodal, and .hybrid; and summarize_video_with_vlm alongside summarize_video_external_api. Span attributes carry the request DTO's identifying fields (video id, persona id, and model id where applicable).
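Because two naming patterns coexist, trace queries that group by use case need to normalize span names first. The following is a minimal sketch of such a normalizer, assuming only the patterns listed above; the classify_span function and its "suffix" / "prefix" / "other" labels are hypothetical and do not exist in the model service.

```python
import re

# <name>_use_case, e.g. detect_objects_use_case
SUFFIX_PATTERN = re.compile(r"^(?P<name>[a-z][a-z0-9_]*)_use_case$")
# use_case.<name> with an optional .<variant>, e.g. use_case.fuse_modalities.hybrid
PREFIX_PATTERN = re.compile(
    r"^use_case\.(?P<name>[a-z][a-z0-9_]*)(?:\.(?P<variant>[a-z][a-z0-9_]*))?$"
)


def classify_span(span_name):
    """Return (convention, use_case_name, variant) for a span name."""
    match = PREFIX_PATTERN.match(span_name)
    if match:
        return ("prefix", match["name"], match["variant"])
    match = SUFFIX_PATTERN.match(span_name)
    if match:
        return ("suffix", match["name"], None)
    # Names such as summarize_video_with_vlm follow neither pattern.
    return ("other", span_name, None)
```

Names that match neither pattern are returned unchanged under "other", so variant spans like summarize_video_external_api still show up in a grouped view rather than being dropped.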

Outbound adapters in model-service/src/infrastructure/adapters/outbound/ record inference metrics through the record_inference helper in infrastructure/observability/telemetry.py:

counter             model.inference.count     (monotonic; "Number of model inference calls")
histogram           model.inference.duration  (unit s; model inference duration in seconds)
labels              task, model
counter-only label  result (success | error)

The counter carries task, model, and result; the duration histogram carries task and model only. The metrics surface in Prometheus through the OTel collector's :8889 exporter. The bundled Grafana dashboards (error and RBAC) do not visualize these metrics yet; query them directly through Prometheus.
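The label split between the two instruments can be sketched with in-memory stand-ins. This is not the record_inference helper from telemetry.py, whose signature is not shown here; the context-manager shape, the record_inference_sketch name, and the dictionaries standing in for the OTel counter and histogram are all assumptions made for illustration.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# In-memory stand-ins for the two instruments (illustration only).
inference_count = defaultdict(int)       # keyed by (task, model, result)
inference_durations = defaultdict(list)  # keyed by (task, model), seconds


@contextmanager
def record_inference_sketch(task, model):
    """Time one inference call and record it under the documented labels."""
    start = time.perf_counter()
    result = "success"
    try:
        yield
    except Exception:
        result = "error"
        raise
    finally:
        elapsed = time.perf_counter() - start
        # The counter carries task, model, and result.
        inference_count[(task, model, result)] += 1
        # The histogram carries task and model only.
        inference_durations[(task, model)].append(elapsed)
```

One consequence of this split is that an error rate per task and model can be derived from the counter alone, while latency cannot be broken down by success versus error.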

Prometheus

Prometheus runs at :9090 with prometheus.yml from the repo root. Scrape targets include the OTel collector's metrics exporter on :8889. Alert rules are defined in prometheus-alerts.yml.
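To check from a script whether a scrape target is healthy, Prometheus's HTTP API can be queried for the built-in up metric. The sketch below builds the instant-query URL and picks out targets reporting 0. It assumes Prometheus is reachable at http://localhost:9090 from where the script runs; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumption: Prometheus is published on the host at this address.
PROMETHEUS_URL = "http://localhost:9090"


def instant_query_url(promql, base_url=PROMETHEUS_URL):
    """Build a GET URL for Prometheus's /api/v1/query endpoint."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def down_targets(response_body):
    """Return the label sets of series whose sample value is "0".

    Expects the JSON body of an instant query for the `up` metric.
    """
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    # Each vector sample is {"metric": {...}, "value": [timestamp, "value"]}.
    return [
        sample["metric"]
        for sample in payload["data"]["result"]
        if sample["value"][1] == "0"
    ]
```

A caller would fetch the body with urllib.request.urlopen(instant_query_url("up")).read() and pass it to down_targets; an empty list means every scraped target answered its last scrape.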

Grafana

Grafana runs at :3002. Dashboards live in grafana-dashboards/ in the repo root.

Frontend telemetry

The frontend posts batched traces to POST /api/telemetry/traces, which the backend forwards to the collector. This is the only way client-side spans reach the trace pipeline; direct OTLP from the browser is not used.

Logs

Each service logs in its own way: the backend emits Pino-style structured JSON, the model service uses Python logging, and the frontend writes to the console. There is no centralized log aggregator in the default stack; to collect logs, pipe the output of docker compose logs -f <service> into your own tooling or attach a sidecar.
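As one possible way to consume piped backend logs, the sketch below parses lines from docker compose logs and flags error-level records. It assumes each line carries the usual "<container> | " prefix followed by a JSON object, and that the backend uses Pino's default numeric levels (50 is error, 60 is fatal); both function names are hypothetical.

```python
import json

# Pino's default numeric level for "error"; "fatal" is 60.
PINO_ERROR_LEVEL = 50


def parse_compose_line(line):
    """Extract the JSON record from one `docker compose logs` line.

    Returns None for lines that do not contain a JSON object.
    """
    start = line.find("{")
    if start == -1:
        return None
    try:
        record = json.loads(line[start:])
    except json.JSONDecodeError:
        return None
    return record if isinstance(record, dict) else None


def is_error(record, threshold=PINO_ERROR_LEVEL):
    """True when the record's numeric level is at or above the threshold."""
    level = record.get("level")
    return isinstance(level, int) and level >= threshold
```

A small script could read sys.stdin line by line, call parse_compose_line on each line, and print only the records for which is_error returns True.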