
Telemetry

DeepIntShield provides built-in telemetry and monitoring capabilities through Prometheus metrics collection. The telemetry system tracks both HTTP-level performance metrics and upstream provider interactions, giving you complete visibility into your AI gateway’s performance and usage patterns.

Key Features:

  • Prometheus Integration - Native metrics collection at /metrics endpoint
  • Comprehensive Tracking - Success/error rates, token usage, costs, and cache performance
  • Custom Labels - Configurable dimensions for detailed analysis
  • Dynamic Headers - Runtime label injection via x-bf-prom-* headers
  • Cost Monitoring - Real-time tracking of AI provider costs in USD
  • Cache Analytics - Direct and semantic cache hit tracking
  • Async Collection - Metrics are recorded asynchronously, off the request path
  • Multi-Level Tracking - HTTP transport + upstream provider metrics

The telemetry plugin operates asynchronously to ensure metrics collection doesn’t impact request latency or connection performance.


These metrics track all incoming HTTP requests to DeepIntShield:

Metric | Type | Description
------ | ---- | -----------
http_requests_total | Counter | Total number of HTTP requests
http_request_duration_seconds | Histogram | Duration of HTTP requests, in seconds
http_request_size_bytes | Histogram | Size of incoming HTTP requests, in bytes
http_response_size_bytes | Histogram | Size of outgoing HTTP responses, in bytes

Labels:

  • path: HTTP endpoint path
  • method: HTTP verb (e.g., GET, POST, PUT, DELETE)
  • status: HTTP status code
  • custom labels: Custom labels configured in the DeepIntShield configuration
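
To inspect these series without a Prometheus server, the text served at /metrics can be parsed directly. The following is a minimal sketch and not part of DeepIntShield: it assumes the standard Prometheus text exposition format, handles only simple `name{labels} value` lines (no escaped quotes inside label values, no timestamps), and the sample text and its values are invented for illustration.

```python
import re
from typing import Dict, List, Tuple

# Matches `name{label="value",...} number` or `name number`.
# Simplification: label values must not contain escaped quotes.
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_metrics(text: str) -> List[Sample]:
    """Parse exposition text into (metric name, labels, value) tuples."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and `# HELP` / `# TYPE` comments.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # Line shape not covered by this sketch.
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def total_by_label(samples: List[Sample], metric: str, label: str) -> Dict[str, float]:
    """Sum the values of one metric, grouped by a single label."""
    totals: Dict[str, float] = {}
    for name, labels, value in samples:
        if name == metric:
            key = labels.get(label, "")
            totals[key] = totals.get(key, 0.0) + value
    return totals


# Invented sample data in the shape of the HTTP metrics above.
EXAMPLE = """
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{path="/v1/chat/completions",method="POST",status="200"} 42
http_requests_total{path="/v1/chat/completions",method="POST",status="500"} 3
"""

if __name__ == "__main__":
    print(total_by_label(parse_metrics(EXAMPLE), "http_requests_total", "status"))
```

For anything beyond quick inspection, a maintained parser or Prometheus itself is the safer choice, since the exposition format has more cases than this sketch covers.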

These metrics track requests forwarded to AI providers:

Metric | Type | Description | Labels
------ | ---- | ----------- | ------
bifrost_upstream_requests_total | Counter | Total requests forwarded to upstream providers | Base Labels, custom labels
bifrost_success_requests_total | Counter | Total successful requests to upstream providers | Base Labels, custom labels
bifrost_error_requests_total | Counter | Total failed requests to upstream providers | Base Labels, reason, custom labels
bifrost_upstream_latency_seconds | Histogram | Latency of upstream provider requests, in seconds | Base Labels, is_success, custom labels
bifrost_input_tokens_total | Counter | Total input tokens sent to upstream providers | Base Labels, custom labels
bifrost_output_tokens_total | Counter | Total output tokens received from upstream providers | Base Labels, custom labels
bifrost_cache_hits_total | Counter | Total cache hits by type (direct/semantic) | Base Labels, cache_type, custom labels
bifrost_cost_total | Counter | Total cost in USD for upstream provider requests | Base Labels, custom labels

Base Labels:

  • provider: AI provider name (e.g., openai, anthropic, azure)
  • model: Model name (e.g., gpt-4o-mini, claude-3-sonnet)
  • method: Request type (chat, text, embedding, speech, transcription)
  • virtual_key_id: Virtual key ID
  • virtual_key_name: Virtual key name
  • routing_engines_used: Comma-separated routing engines used (“routing-rule”, “governance”, “loadbalancing”)
  • routing_rule_id: Routing rule ID that matched the request
  • routing_rule_name: Routing rule name that matched the request
  • selected_key_id: Selected key ID
  • selected_key_name: Selected key name
  • number_of_retries: Number of retries
  • fallback_index: Fallback index (0 for first attempt, 1 for second attempt, etc.)
  • custom labels: Custom labels configured in the DeepIntShield configuration

These metrics capture latency characteristics specific to streaming responses:

Metric | Type | Description | Labels
------ | ---- | ----------- | ------
bifrost_stream_first_token_latency_seconds | Histogram | Time from request start to first streamed token, in seconds | Base Labels
bifrost_stream_inter_token_latency_seconds | Histogram | Latency between subsequent streamed tokens, in seconds | Base Labels
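
Because these are histograms, percentiles such as a p95 time to first token are estimated from cumulative bucket counts rather than read directly. The sketch below illustrates that estimation with linear interpolation inside the matching bucket, which is a simplified version of what Prometheus's `histogram_quantile` function does. The bucket boundaries and counts are invented for illustration and are not DeepIntShield's actual bucket layout.

```python
import math
from typing import List, Tuple

# Each bucket is (upper bound in seconds, cumulative count), sorted by bound.
# The last bucket is expected to be the +Inf bucket.
Bucket = Tuple[float, float]


def estimate_quantile(q: float, buckets: List[Bucket]) -> float:
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not buckets:
        raise ValueError("at least one bucket is required")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # No observations recorded.

    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Invented first-token latency buckets.
EXAMPLE_BUCKETS: List[Bucket] = [
    (0.1, 10), (0.25, 40), (0.5, 90), (1.0, 100), (math.inf, 100),
]

if __name__ == "__main__":
    print("p50:", estimate_quantile(0.50, EXAMPLE_BUCKETS))
    print("p95:", estimate_quantile(0.95, EXAMPLE_BUCKETS))
```

The estimate is only as precise as the bucket boundaries allow, so a quantile that lands in a wide bucket should be read as approximate.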

Track the success rate of requests to different providers:

# Success rate (%) by provider
sum by (provider) (rate(bifrost_success_requests_total[5m])) /
sum by (provider) (rate(bifrost_upstream_requests_total[5m])) * 100
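
The arithmetic behind the success-rate query can be sketched in a few lines. This is an illustration under stated assumptions, not how Prometheus evaluates the query internally: it takes two hypothetical scrapes of the counters (start and end of a window), keyed by provider, and ignores counter resets. The counter values are invented.

```python
from typing import Dict

Counters = Dict[str, float]  # provider name -> counter value at scrape time


def success_rate_percent(
    success_start: Counters,
    success_end: Counters,
    total_start: Counters,
    total_end: Counters,
) -> Dict[str, float]:
    """Return the success rate (%) per provider over one window.

    Both rates would be divided by the same window length, so it cancels
    out and only the counter increases matter.
    """
    result: Dict[str, float] = {}
    for provider, end_total in total_end.items():
        delta_total = end_total - total_start.get(provider, 0.0)
        if delta_total <= 0:
            continue  # No traffic in the window: the ratio is undefined.
        delta_success = success_end.get(provider, 0.0) - success_start.get(provider, 0.0)
        result[provider] = 100.0 * delta_success / delta_total
    return result


if __name__ == "__main__":
    rates = success_rate_percent(
        success_start={"openai": 990, "anthropic": 500},
        success_end={"openai": 1180, "anthropic": 500},
        total_start={"openai": 1000, "anthropic": 500},
        total_end={"openai": 1200, "anthropic": 500},
    )
    print(rates)
```

Providers with no traffic in the window are skipped rather than reported as 0%, which mirrors how a division by zero yields no usable value in a dashboard.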

Monitor token consumption across different models:

# Input tokens per minute by model
sum by (model) (increase(bifrost_input_tokens_total[1m]))
# Output tokens per minute by model
sum by (model) (increase(bifrost_output_tokens_total[1m]))
# Token efficiency (output/input ratio) by model
sum by (model) (rate(bifrost_output_tokens_total[5m])) /
sum by (model) (rate(bifrost_input_tokens_total[5m]))

Monitor spending across providers and models:

# Cost per second by provider
sum by (provider) (rate(bifrost_cost_total[1m]))
# Daily cost estimate
sum by (provider) (increase(bifrost_cost_total[1d]))
# Cost per request by provider and model
sum by (provider, model) (rate(bifrost_cost_total[5m])) /
sum by (provider, model) (rate(bifrost_upstream_requests_total[5m]))
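
The cost queries above report three related quantities with different units. The sketch below shows how they relate, assuming you already have the increase in bifrost_cost_total and in bifrost_upstream_requests_total over one window. The function name, the dataclass, and the numbers are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class CostSummary:
    cost_per_second: float       # USD/s, comparable to rate(bifrost_cost_total[...])
    projected_daily_cost: float  # USD/day if the current rate held for 24 hours
    cost_per_request: float      # USD per upstream request in the window


def summarize_cost(cost_delta_usd: float, request_delta: float, window_seconds: float) -> CostSummary:
    """Derive per-second, projected daily, and per-request cost from counter increases."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    cost_per_second = cost_delta_usd / window_seconds
    # Per-request cost is undefined when no requests were made in the window.
    cost_per_request = cost_delta_usd / request_delta if request_delta > 0 else math.nan
    return CostSummary(
        cost_per_second=cost_per_second,
        projected_daily_cost=cost_per_second * SECONDS_PER_DAY,
        cost_per_request=cost_per_request,
    )


if __name__ == "__main__":
    # Invented numbers: $0.60 spent on 120 requests over a 5-minute window.
    print(summarize_cost(cost_delta_usd=0.60, request_delta=120, window_seconds=300))
```

Note that the projection extrapolates a short window to a full day, whereas `increase(bifrost_cost_total[1d])` measures what was actually spent over the past 24 hours; the two can differ widely when traffic is bursty.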

Track cache effectiveness:

# Cache hit rate (%) by cache type, relative to all upstream requests
sum by (cache_type) (rate(bifrost_cache_hits_total[5m])) /
on() group_left() sum(rate(bifrost_upstream_requests_total[5m])) * 100
# Direct vs semantic cache hits
sum by (cache_type) (rate(bifrost_cache_hits_total[5m]))

Monitor error patterns:

# Error rate (%) by provider
sum by (provider) (rate(bifrost_error_requests_total[5m])) /
sum by (provider) (rate(bifrost_upstream_requests_total[5m])) * 100
# Errors by model
sum by (model) (rate(bifrost_error_requests_total[5m]))

Configure custom Prometheus labels to add dimensions for filtering and analysis:

Prometheus Labels

  1. Navigate to Configuration
    • Open the DeepIntShield UI at http://localhost:8080
    • Go to the Config tab
  2. Prometheus Labels
    • Set Custom Labels to the label names you want, for example: team, environment, organization, project

Add custom label values at runtime using x-bf-prom-* headers:

# Add custom labels to specific requests
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-prom-team: engineering" \
  -H "x-bf-prom-environment: production" \
  -H "x-bf-prom-organization: my-org" \
  -H "x-bf-prom-project: my-project" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Header Format:

  • Prefix: x-bf-prom-
  • Label name: Any string after the prefix
  • Value: String value for the label
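
The header format above can also be applied from client code. The sketch below builds the same request as the curl example using only the Python standard library. The helper names `prom_label_headers` and `build_chat_request` are hypothetical, and the request is constructed but not sent, so it runs without a gateway.

```python
import json
import urllib.request
from typing import Any, Dict

PROM_HEADER_PREFIX = "x-bf-prom-"


def prom_label_headers(labels: Dict[str, str]) -> Dict[str, str]:
    """Turn {label name: value} into x-bf-prom-* request headers."""
    headers: Dict[str, str] = {}
    for name, value in labels.items():
        if not name:
            raise ValueError("label names must be non-empty")
        headers[PROM_HEADER_PREFIX + name] = str(value)
    return headers


def build_chat_request(
    base_url: str, labels: Dict[str, str], payload: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) a chat completion request with label headers."""
    headers = {"Content-Type": "application/json"}
    headers.update(prom_label_headers(labels))
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    request = build_chat_request(
        "http://localhost:8080",
        {"team": "engineering", "environment": "production"},
        {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}]},
    )
    # urllib normalizes header capitalization; HTTP header names are case-insensitive.
    print(request.get_method(), request.full_url)
    print(request.header_items())
```

To send the request against a running gateway, pass the returned object to `urllib.request.urlopen`.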

For local development and testing, use the provided Docker Compose setup:

# Navigate to telemetry plugin directory
cd plugins/telemetry
# Start Prometheus and Grafana
docker-compose up -d
# Access endpoints
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000 (admin/admin)
# DeepIntShield metrics: http://localhost:8080/metrics

You can use the Prometheus scraping endpoint to build your own Grafana dashboards. Below are a few examples created with the Docker Compose setup.

Grafana Dashboard

For production environments:

  1. Deploy Prometheus with proper persistence, retention, and security
  2. Configure scraping to target your DeepIntShield instances at /metrics
  3. Set up Grafana with authentication and dashboards
  4. Configure alerts based on your SLA requirements

Prometheus Scrape Configuration:

scrape_configs:
  - job_name: "deepintshield-gateway"
    static_configs:
      - targets: ["deepintshield-instance-1:8080", "deepintshield-instance-2:8080"]
    scrape_interval: 30s
    metrics_path: /metrics
    # If DeepIntShield auth is enabled, add:
    # basic_auth:
    #   username: '<admin_username>'
    #   password: '<admin_password>'

Configure alerts for critical scenarios using the metrics described above:

High Error Rate Alert:

- alert: DeepIntShieldHighErrorRate
  expr: sum by (provider) (rate(bifrost_error_requests_total[5m])) / sum by (provider) (rate(bifrost_upstream_requests_total[5m])) > 0.05
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "High error rate detected for provider {{ $labels.provider }} ({{ $value | humanizePercentage }})"

High Cost Alert:

- alert: DeepIntShieldHighCosts
  expr: sum by (provider) (increase(bifrost_cost_total[1d])) > 100 # $100/day threshold
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Daily cost for provider {{ $labels.provider }} exceeds $100 ({{ $value | printf \"%.2f\" }})"

Cache Performance Alert:

- alert: DeepIntShieldLowCacheHitRate
  expr: sum by (provider) (rate(bifrost_cache_hits_total[15m])) / sum by (provider) (rate(bifrost_upstream_requests_total[15m])) < 0.1
  for: 5m
  labels:
    severity: info
  annotations:
    summary: "Cache hit rate for provider {{ $labels.provider }} below 10% ({{ $value | humanizePercentage }})"