Metrics Reference

Complete reference for all metrics exported by FOVEA services. All custom metrics use the fovea_ prefix.

API Metrics

fovea_api_requests_total

Type: Counter

Description: Total number of API requests

Labels:

  • method - HTTP method (GET, POST, PUT, DELETE)
  • route - API endpoint path
  • status - HTTP status code (200, 404, 500, etc.)

Example:

fovea_api_requests_total{method="GET",route="/api/videos",status="200"}

Use cases:

  • Calculate request rate: rate(fovea_api_requests_total[5m])
  • Monitor error rate: rate(fovea_api_requests_total{status=~"5.."}[5m])
  • Track endpoint usage by method
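To make the rate() use cases above concrete, here is a simplified Python sketch of what rate() computes from a window of counter samples. It assumes samples are (timestamp_seconds, value) pairs in time order. The name simple_rate is hypothetical, and unlike Prometheus this sketch does not extrapolate to the window boundaries.

```python
def simple_rate(samples):
    """Approximate per-second rate of a counter over a window.

    samples: list of (timestamp_seconds, counter_value) pairs in time order.
    Simplification: no extrapolation to the window edges, unlike Prometheus.
    """
    if len(samples) < 2:
        return None  # a rate needs at least two samples
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop in value means the counter was reset (for example, a service
        # restart); count the new value as growth since the reset.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None
```

For example, two samples taken 60 seconds apart with values 100 and 160 give a rate of 1.0 request per second.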

fovea_api_request_duration_milliseconds

Type: Histogram

Description: API request duration in milliseconds

Labels:

  • method - HTTP method
  • route - API endpoint path
  • status - HTTP status code

Buckets: 0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000, +Inf

Exported values:

  • fovea_api_request_duration_milliseconds_sum - Total duration
  • fovea_api_request_duration_milliseconds_count - Total requests
  • fovea_api_request_duration_milliseconds_bucket - Histogram buckets

Example:

fovea_api_request_duration_milliseconds_sum{method="GET",route="/api/videos",status="200"}

Use cases:

  • Calculate p95 latency across all series: histogram_quantile(0.95, sum by (le) (rate(fovea_api_request_duration_milliseconds_bucket[5m])))
  • Average duration: rate(fovea_api_request_duration_milliseconds_sum[5m]) / rate(fovea_api_request_duration_milliseconds_count[5m])
  • Identify slow endpoints
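The p95 query above estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The following is a minimal Python sketch of that idea, not the Prometheus implementation. The function name bucket_quantile and the input shape (a sorted list of (upper_bound, cumulative_count) pairs ending with +Inf) are assumptions made for illustration.

```python
import math


def bucket_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by upper_bound,
    with the last entry being the +Inf bucket (math.inf).
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    if math.isinf(upper):
        # The rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    if i == 0:
        if upper <= 0:
            return upper
        lower, below = 0.0, 0.0  # the first bucket is assumed to start at 0
    else:
        lower, below = buckets[i - 1]
    in_bucket = count - below
    if in_bucket == 0:
        return lower
    # Linear interpolation between the bucket's lower and upper bounds.
    return lower + (upper - lower) * (rank - below) / in_bucket
```

With cumulative counts of 10, 90, 100, and 100 for the bounds 50, 100, 250, and +Inf, bucket_quantile(0.95, ...) lands halfway through the 100 to 250 bucket and returns 175.0. Because of this interpolation, the estimate can never be more precise than the bucket boundaries allow.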

Queue Metrics

fovea_queue_job_submitted

Type: Counter

Description: Number of jobs submitted to queues

Labels:

  • queue - Queue name (video-summarization, ontology-augmentation, etc.)
  • status - Job status (completed, failed)

Example:

fovea_queue_job_submitted{queue="video-summarization",status="completed"}

Use cases:

  • Job completion rate: rate(fovea_queue_job_submitted{status="completed"}[5m])
  • Job failure rate: rate(fovea_queue_job_submitted{status="failed"}[5m])
  • Queue throughput monitoring

fovea_queue_job_duration

Type: Histogram

Description: Job processing duration in milliseconds

Labels:

  • queue - Queue name
  • status - Job status

Example:

fovea_queue_job_duration_sum{queue="video-summarization",status="completed"}

Use cases:

  • Average job duration: rate(fovea_queue_job_duration_sum[5m]) / rate(fovea_queue_job_duration_count[5m])
  • Job processing time trends
  • Identify slow job types

Model Service Metrics

fovea_model_service_requests

Type: Counter

Description: Number of requests to model service

Labels:

  • endpoint - API endpoint (/api/summarize, /api/detect, etc.)
  • status - HTTP status code

Example:

fovea_model_service_requests{endpoint="/api/summarize",status="200"}

Use cases:

  • Request rate by endpoint: rate(fovea_model_service_requests[5m])
  • Error rate: rate(fovea_model_service_requests{status=~"5.."}[5m])
  • Endpoint usage patterns

fovea_model_service_duration

Type: Histogram

Description: Model service response time in milliseconds

Labels:

  • endpoint - API endpoint
  • status - HTTP status code

Example:

fovea_model_service_duration_sum{endpoint="/api/summarize",status="200"}

Use cases:

  • p95 inference latency across all series: histogram_quantile(0.95, sum by (le) (rate(fovea_model_service_duration_bucket[5m])))
  • Average response time: rate(fovea_model_service_duration_sum[5m]) / rate(fovea_model_service_duration_count[5m])
  • Endpoint performance comparison

Auto-Instrumented Metrics

These metrics are automatically generated by OpenTelemetry instrumentation.

fovea_http_server_duration_milliseconds

Type: Histogram

Description: Inbound HTTP request duration

Labels:

  • http_method - HTTP method
  • http_route - Route pattern
  • http_status_code - Status code

Use cases:

  • Server-side request latency
  • HTTP performance monitoring
  • Route-specific timing

fovea_http_client_duration_milliseconds

Type: Histogram

Description: Outbound HTTP request duration

Labels:

  • http_method - HTTP method
  • http_status_code - Status code
  • net_peer_name - Destination host
  • net_peer_port - Destination port

Use cases:

  • External dependency latency
  • Identify slow external services
  • Network performance monitoring

fovea_db_query_count

Type: Counter

Description: Database query counter

Labels:

  • operation - Query operation (SELECT, INSERT, UPDATE, DELETE)
  • table - Database table name

Note: Available when Prisma instrumentation is enabled.

Use cases:

  • Query frequency by table
  • Operation type distribution
  • Database access patterns

fovea_db_query_duration

Type: Histogram

Description: Database query execution time

Note: Available when Prisma instrumentation is enabled.

Use cases:

  • Slow query detection
  • Database performance monitoring
  • Query optimization

Example PromQL Queries

API Performance

Total request rate:

sum(rate(fovea_api_requests_total[5m]))

Error rate percentage:

100 * sum(rate(fovea_api_requests_total{status=~"5.."}[5m])) / sum(rate(fovea_api_requests_total[5m]))

P50, P95, P99 latency (aggregated across all labels):

histogram_quantile(0.50, sum by (le) (rate(fovea_api_request_duration_milliseconds_bucket[5m])))
histogram_quantile(0.95, sum by (le) (rate(fovea_api_request_duration_milliseconds_bucket[5m])))
histogram_quantile(0.99, sum by (le) (rate(fovea_api_request_duration_milliseconds_bucket[5m])))

Requests per endpoint:

sum by (route) (rate(fovea_api_requests_total[5m]))

Queue Health

Job completion rate:

rate(fovea_queue_job_submitted{status="completed"}[5m])

Job failure rate:

rate(fovea_queue_job_submitted{status="failed"}[5m])

Average job duration by queue:

sum by (queue) (rate(fovea_queue_job_duration_sum[5m])) / sum by (queue) (rate(fovea_queue_job_duration_count[5m]))

Job failure percentage:

100 * sum(rate(fovea_queue_job_submitted{status="failed"}[5m])) / sum(rate(fovea_queue_job_submitted[5m]))

Model Service

Inference request rate:

sum(rate(fovea_model_service_requests[5m]))

Average inference time:

rate(fovea_model_service_duration_sum[5m]) / rate(fovea_model_service_duration_count[5m])

P95 inference latency by endpoint:

histogram_quantile(0.95, sum by (endpoint, le) (rate(fovea_model_service_duration_bucket[5m])))

Recording Rules

Recording rules precompute frequently used queries for better performance.

Example recording rules (in a rules file loaded through rule_files in prometheus.yml):

groups:
  - name: fovea_recording_rules
    interval: 10s
    rules:
      - record: job:fovea_api_request_rate:5m
        expr: sum(rate(fovea_api_requests_total[5m]))

      - record: job:fovea_api_error_rate:5m
        expr: sum(rate(fovea_api_requests_total{status=~"5.."}[5m]))

      - record: job:fovea_api_latency_p95:5m
        expr: histogram_quantile(0.95, sum by (le) (rate(fovea_api_request_duration_milliseconds_bucket[5m])))

      - record: job:fovea_queue_completion_rate:5m
        expr: sum by (queue) (rate(fovea_queue_job_submitted{status="completed"}[5m]))

Use recording rules in queries:

job:fovea_api_request_rate:5m
job:fovea_api_error_rate:5m

Metric Retention

Default Prometheus retention: 15 days

To adjust retention, pass the --storage.tsdb.retention.time flag to the Prometheus service in the Docker Compose file:

prometheus:
  command:
    - '--storage.tsdb.retention.time=30d'

Troubleshooting Metrics

Metrics Not Appearing

Check:

  1. Service is exporting: docker compose logs backend | grep -i otel
  2. OTEL Collector running: docker compose ps otel-collector
  3. Metrics endpoint: curl http://localhost:8889/metrics | grep fovea
  4. Prometheus targets: http://localhost:9090/targets
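Step 3 can also be scripted. The sketch below is a Python equivalent of the curl and grep check: it fetches the collector's metrics endpoint and lists the distinct fovea_ metric names found. The helper names fetch_metrics and fovea_metric_names are hypothetical, and the URL is the one used in the checklist above.

```python
import re
import urllib.request

# Metric name at the start of a sample line, e.g. fovea_api_requests_total{...} 42
_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def fetch_metrics(url="http://localhost:8889/metrics", timeout=5):
    """Download the Prometheus text exposition from the collector."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def fovea_metric_names(exposition_text, prefix="fovea_"):
    """Return the sorted, distinct metric names that start with the prefix."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _NAME.match(line)
        if match and match.group(1).startswith(prefix):
            names.add(match.group(1))
    return sorted(names)
```

Calling fovea_metric_names(fetch_metrics()) with the stack running should print the exported names. An empty list suggests the services are not exporting or the collector is not receiving data.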

Metrics Not Updating

Check:

  1. Export interval: Metrics export every 60 seconds
  2. Generate traffic: Make API requests
  3. Scrape interval: Prometheus scrapes every 15 seconds
  4. Time range: Ensure Grafana time range includes recent data

High Cardinality Warnings

If you see warnings about high cardinality:

  1. Limit label values
  2. Use recording rules for aggregated queries
  3. Increase Prometheus memory limits
  4. Consider sampling for high-volume metrics
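Before limiting label values, it helps to know which metrics contribute the most series. The following Python sketch counts distinct label sets per metric name in a Prometheus text exposition, such as the output of the collector's /metrics endpoint. The function name series_per_metric is hypothetical, and histogram suffixes (_bucket, _sum, _count) are counted as separate names.

```python
import re
from collections import Counter

# Captures the metric name and the optional {label="value",...} block.
_SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s")


def series_per_metric(exposition_text):
    """Return (metric_name, series_count) pairs, highest count first."""
    seen = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match:
            # Each distinct (name, label block) pair is one time series.
            seen.add((match.group(1), match.group(2) or ""))
    counts = Counter(name for name, _ in seen)
    return counts.most_common()
```

Metrics at the top of the result are the first candidates for dropping or coarsening labels, for example a route label that carries raw IDs instead of route patterns.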

Next Steps