Metrics Reference
Complete reference for all metrics exported by FOVEA services. All custom metrics use the fovea_ prefix.
API Metrics
fovea_api_requests_total
Type: Counter
Description: Total number of API requests
Labels:
- method: HTTP method (GET, POST, PUT, DELETE)
- route: API endpoint path
- status: HTTP status code (200, 404, 500, etc.)
Example:
fovea_api_requests_total{method="GET",route="/api/videos",status="200"}
Use cases:
- Calculate request rate:
rate(fovea_api_requests_total[5m])
- Monitor error rate:
rate(fovea_api_requests_total{status=~"5.."}[5m])
- Track endpoint usage by method
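The rate(...) expressions above turn an ever-increasing counter into a per-second rate. The following Python sketch illustrates the idea under simplifying assumptions: it handles counter resets but omits the extrapolation to window edges that Prometheus performs, and the sample values are hypothetical.

```python
def per_second_rate(samples):
    """Average per-second increase of a counter over the sampled window.

    samples: list of (unix_timestamp_seconds, counter_value), oldest first.
    Simplified model: unlike PromQL rate(), it does not extrapolate
    to the edges of the query window.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter reset (for example, a service restart),
        # so the new value itself is the increase since the reset.
        increase += cur - prev if cur >= prev else cur
    return increase / elapsed


# Hypothetical samples taken 60 s apart, with a restart before the third one.
samples = [(0, 100), (60, 160), (120, 10), (180, 70)]
print(per_second_rate(samples))  # (60 + 10 + 60) / 180 requests per second
```

Because resets are absorbed this way, a service restart does not show up as a negative request rate.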
fovea_api_request_duration_milliseconds
Type: Histogram
Description: API request duration in milliseconds
Labels:
- method: HTTP method
- route: API endpoint path
- status: HTTP status code
Buckets: 0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000, +Inf
Exported values:
- fovea_api_request_duration_milliseconds_sum: Total duration
- fovea_api_request_duration_milliseconds_count: Total requests
- fovea_api_request_duration_milliseconds_bucket: Histogram buckets
Example:
fovea_api_request_duration_milliseconds_sum{method="GET",route="/api/videos",status="200"}
Use cases:
- Calculate p95 latency:
histogram_quantile(0.95, rate(fovea_api_request_duration_milliseconds_bucket[5m]))
- Average duration:
rate(fovea_api_request_duration_milliseconds_sum[5m]) / rate(fovea_api_request_duration_milliseconds_count[5m])
- Identify slow endpoints
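To make the p95 query above less of a black box, here is a minimal Python sketch of how a quantile can be estimated from cumulative bucket counts by linear interpolation inside the matching bucket. It is a simplified model of histogram_quantile(), not the Prometheus implementation; the bucket bounds come from the Buckets line above, and the request counts are hypothetical.

```python
import math

# Upper bounds copied from the Buckets line above; math.inf stands for +Inf.
BOUNDS = [0, 5, 10, 25, 50, 75, 100, 250, 500, 750,
          1000, 2500, 5000, 7500, 10000, math.inf]


def estimate_quantile(q, cumulative_counts, bounds=BOUNDS):
    """Estimate the q-quantile (0 < q <= 1) from cumulative bucket counts."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = cumulative_counts[-1]
    if total == 0:
        return math.nan
    rank = q * total
    # Index of the first bucket whose cumulative count reaches the rank.
    i = next(idx for idx, c in enumerate(cumulative_counts) if c >= rank)
    if math.isinf(bounds[i]):
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return bounds[i - 1]
    if i == 0:
        if bounds[0] <= 0:
            return bounds[0]
        lower, below = 0.0, 0
    else:
        lower, below = bounds[i - 1], cumulative_counts[i - 1]
    in_bucket = cumulative_counts[i] - below
    # Assume observations are spread evenly inside the bucket.
    return lower + (bounds[i] - lower) * (rank - below) / in_bucket


# Hypothetical cumulative counts for 100 requests, one entry per bound.
COUNTS = [0, 10, 30, 60, 80, 90, 94, 98, 100, 100, 100, 100, 100, 100, 100, 100]
print(estimate_quantile(0.95, COUNTS))  # about 137.5 (ms), inside the 100-250 bucket
```

The estimate can only be as precise as the bucket layout: any p95 between 100 ms and 250 ms is interpolated within that single bucket.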
Queue Metrics
fovea_queue_job_submitted
Type: Counter
Description: Number of jobs submitted to queues
Labels:
- queue: Queue name (video-summarization, ontology-augmentation, etc.)
- status: Job status (completed, failed)
Example:
fovea_queue_job_submitted{queue="video-summarization",status="completed"}
Use cases:
- Job completion rate:
rate(fovea_queue_job_submitted{status="completed"}[5m])
- Job failure rate:
rate(fovea_queue_job_submitted{status="failed"}[5m])
- Queue throughput monitoring
fovea_queue_job_duration
Type: Histogram
Description: Job processing duration in milliseconds
Labels:
- queue: Queue name
- status: Job status
Example:
fovea_queue_job_duration_sum{queue="video-summarization",status="completed"}
Use cases:
- Average job duration:
rate(fovea_queue_job_duration_sum[5m]) / rate(fovea_queue_job_duration_count[5m])
- Job processing time trends
- Identify slow job types
Model Service Metrics
fovea_model_service_requests
Type: Counter
Description: Number of requests to model service
Labels:
- endpoint: API endpoint (/api/summarize, /api/detect, etc.)
- status: HTTP status code
Example:
fovea_model_service_requests{endpoint="/api/summarize",status="200"}
Use cases:
- Request rate by endpoint:
rate(fovea_model_service_requests[5m])
- Error rate:
rate(fovea_model_service_requests{status=~"5.."}[5m])
- Endpoint usage patterns
fovea_model_service_duration
Type: Histogram
Description: Model service response time in milliseconds
Labels:
- endpoint: API endpoint
- status: HTTP status code
Example:
fovea_model_service_duration_sum{endpoint="/api/summarize",status="200"}
Use cases:
- p95 inference latency:
histogram_quantile(0.95, rate(fovea_model_service_duration_bucket[5m]))
- Average response time:
rate(fovea_model_service_duration_sum[5m]) / rate(fovea_model_service_duration_count[5m])
- Endpoint performance comparison
Auto-Instrumented Metrics
These metrics are automatically generated by OpenTelemetry instrumentation.
fovea_http_server_duration_milliseconds
Type: Histogram
Description: Inbound HTTP request duration
Labels:
- http_method: HTTP method
- http_route: Route pattern
- http_status_code: Status code
Use cases:
- Server-side request latency
- HTTP performance monitoring
- Route-specific timing
fovea_http_client_duration_milliseconds
Type: Histogram
Description: Outbound HTTP request duration
Labels:
- http_method: HTTP method
- http_status_code: Status code
- net_peer_name: Destination host
- net_peer_port: Destination port
Use cases:
- External dependency latency
- Identify slow external services
- Network performance monitoring
fovea_db_query_count
Type: Counter
Description: Database query counter
Labels:
- operation: Query operation (SELECT, INSERT, UPDATE, DELETE)
- table: Database table name
Note: Available when Prisma instrumentation is enabled.
Use cases:
- Query frequency by table
- Operation type distribution
- Database access patterns
fovea_db_query_duration
Type: Histogram
Description: Database query execution time
Note: Available when Prisma instrumentation is enabled.
Use cases:
- Slow query detection
- Database performance monitoring
- Query optimization
Example PromQL Queries
API Performance
Total request rate:
sum(rate(fovea_api_requests_total[5m]))
Error rate percentage:
100 * sum(rate(fovea_api_requests_total{status=~"5.."}[5m])) / sum(rate(fovea_api_requests_total[5m]))
P50, P95, P99 latency:
histogram_quantile(0.50, rate(fovea_api_request_duration_milliseconds_bucket[5m]))
histogram_quantile(0.95, rate(fovea_api_request_duration_milliseconds_bucket[5m]))
histogram_quantile(0.99, rate(fovea_api_request_duration_milliseconds_bucket[5m]))
Requests per endpoint:
sum by (route) (rate(fovea_api_requests_total[5m]))
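The queries above can also be run outside Grafana through the Prometheus HTTP API (/api/v1/query). The following Python sketch assumes Prometheus is reachable at http://localhost:9090, the address used in the Troubleshooting section; the helper names are illustrative and not part of FOVEA.

```python
import json
import urllib.parse
import urllib.request

# Assumed address, taken from the Prometheus targets URL in Troubleshooting.
PROMETHEUS_URL = "http://localhost:9090"


def instant_query_url(promql, base_url=PROMETHEUS_URL):
    """Build the URL for an instant query against /api/v1/query."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(payload):
    """Turn an instant-vector response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    # Each sample carries its value as [timestamp, "value-as-string"].
    return [(s["metric"], float(s["value"][1])) for s in payload["data"]["result"]]


def instant_query(promql, base_url=PROMETHEUS_URL):
    """Run the query and return parsed samples (needs a running Prometheus)."""
    with urllib.request.urlopen(instant_query_url(promql, base_url), timeout=10) as resp:
        return parse_vector(json.load(resp))


if __name__ == "__main__":
    query = "sum by (route) (rate(fovea_api_requests_total[5m]))"
    print(instant_query_url(query))
```

Calling instant_query(query) against a running stack returns one (labels, value) pair per route.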
Queue Health
Job completion rate:
rate(fovea_queue_job_submitted{status="completed"}[5m])
Job failure rate:
rate(fovea_queue_job_submitted{status="failed"}[5m])
Average job duration by queue:
sum by (queue) (rate(fovea_queue_job_duration_sum[5m])) / sum by (queue) (rate(fovea_queue_job_duration_count[5m]))
Job failure percentage:
100 * sum(rate(fovea_queue_job_submitted{status="failed"}[5m])) / sum(rate(fovea_queue_job_submitted[5m]))
Model Service
Inference request rate:
sum(rate(fovea_model_service_requests[5m]))
Average inference time:
rate(fovea_model_service_duration_sum[5m]) / rate(fovea_model_service_duration_count[5m])
P95 inference latency by endpoint:
histogram_quantile(0.95, sum by (endpoint, le) (rate(fovea_model_service_duration_bucket[5m])))
Recording Rules
Recording rules precompute frequently used queries for better performance.
Example recording rules (in a rules file listed under rule_files in prometheus.yml):
groups:
  - name: fovea_recording_rules
    interval: 10s
    rules:
      - record: job:fovea_api_request_rate:5m
        expr: sum(rate(fovea_api_requests_total[5m]))
      - record: job:fovea_api_error_rate:5m
        expr: sum(rate(fovea_api_requests_total{status=~"5.."}[5m]))
      - record: job:fovea_api_latency_p95:5m
        expr: histogram_quantile(0.95, rate(fovea_api_request_duration_milliseconds_bucket[5m]))
      - record: job:fovea_queue_completion_rate:5m
        expr: sum by (queue) (rate(fovea_queue_job_submitted{status="completed"}[5m]))
Use recording rules in queries:
job:fovea_api_request_rate:5m
job:fovea_api_error_rate:5m
Metric Retention
Default Prometheus retention: 15 days
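To adjust retention, set the --storage.tsdb.retention.time flag in the command of the prometheus service in your Docker Compose file (retention is a command-line flag, not a prometheus.yml setting):
prometheus:
  command:
    - '--storage.tsdb.retention.time=30d'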
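Before raising retention, it helps to estimate the disk space it implies. The sketch below applies the rule of thumb from the Prometheus storage documentation (retention seconds × ingested samples per second × bytes per sample, at roughly 1 to 2 bytes per sample). The series count is a hypothetical input, not a measured FOVEA figure.

```python
SECONDS_PER_DAY = 86_400


def estimate_disk_bytes(retention_days, active_series, scrape_interval_s,
                        bytes_per_sample=2.0):
    """Rough TSDB size estimate; every argument is an assumption to adjust."""
    # Each active series contributes one sample per scrape.
    samples_per_second = active_series / scrape_interval_s
    return retention_days * SECONDS_PER_DAY * samples_per_second * bytes_per_sample


# Hypothetical: 10,000 active series, the 15 s scrape interval, 30 days kept.
size = estimate_disk_bytes(retention_days=30, active_series=10_000, scrape_interval_s=15)
print(f"{size / 1e9:.2f} GB")  # about 3.46 GB at 2 bytes per sample
```

Treat the result as an order-of-magnitude figure; actual usage depends on how well the samples compress.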
Troubleshooting Metrics
Metrics Not Appearing
Check:
- Service is exporting:
docker compose logs backend | grep -i otel
- OTEL Collector running:
docker compose ps otel-collector
- Metrics endpoint:
curl http://localhost:8889/metrics | grep fovea
- Prometheus targets: http://localhost:9090/targets
Metrics Not Updating
Check:
- Export interval: Metrics export every 60 seconds
- Generate traffic: Make API requests
- Scrape interval: Prometheus scrapes every 15 seconds
- Time range: Ensure Grafana time range includes recent data
High Cardinality Warnings
If you see warnings about high cardinality:
- Limit label values
- Use recording rules for aggregated queries
- Increase Prometheus memory limits
- Consider sampling for high-volume metrics
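To judge whether a label is worth its cost, it can help to estimate how many time series a histogram produces. The Python sketch below multiplies the number of distinct values per label and accounts for the 16 buckets listed for fovea_api_request_duration_milliseconds, plus its _sum and _count series. The label value counts are hypothetical.

```python
import math

# 15 finite bounds plus +Inf, as listed for the API duration histogram.
BUCKET_COUNT = 16


def histogram_series_upper_bound(label_value_counts, bucket_count=BUCKET_COUNT):
    """Upper bound on series for one histogram metric.

    label_value_counts maps label name -> number of distinct values.
    Each label combination yields one series per bucket, plus _sum and _count.
    """
    combinations = math.prod(label_value_counts.values())
    return combinations * (bucket_count + 2)


# Hypothetical label cardinalities for the API duration histogram.
labels = {"method": 4, "route": 50, "status": 10}
print(histogram_series_upper_bound(labels))  # 4 * 50 * 10 * 18 = 36000
```

This is an upper bound because not every label combination occurs in practice. It also shows why unbounded values (for example, raw URLs with IDs in the route label) multiply the series count quickly.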
Next Steps
- Grafana Dashboards: Visualize metrics
- Overview: Monitoring stack overview
- Common Tasks: Daily operations