
Provider Routing

DeepIntShield offers two powerful methods for routing requests across AI providers, each serving different use cases:

  1. Governance-based Routing: Explicit, user-defined routing rules configured via Virtual Keys
  2. Adaptive Load Balancing: Automatic, performance-based routing powered by real-time metrics (Enterprise feature)

When both methods are available, governance takes precedence because users have explicitly defined their routing preferences through provider configurations on Virtual Keys.


The Model Catalog is DeepIntShield’s central registry that tracks which models are available from which providers. It powers both governance-based routing and adaptive load balancing by maintaining an up-to-date mapping of models to providers.

The Model Catalog combines two data sources to maintain a comprehensive and up-to-date model registry:

  1. Pricing Data (Primary source)

    • Downloaded from a remote URL (configurable, defaults to https://getbifrost.ai/datasheet)
    • Contains model names, pricing tiers, and provider mappings
    • Synced to database on startup and refreshed periodically (default: every 24 hours)
    • Used for cost calculation and initial model-to-provider mapping
    • Stored as: In-memory map pricingData[model|provider|mode] for O(1) lookups
  2. Provider List Models API (Secondary source)

    • Calls each provider’s /v1/models endpoint during startup
    • Enriches the catalog with provider-specific models and aliases
    • Re-fetched when providers are added/updated via API or dashboard
    • Adds models that may not be in pricing data yet (e.g., newly released models)
    • Stored as: In-memory map modelPool[provider][]models

DeepIntShield uses a sophisticated multi-step process to determine if a model is available for a provider:

GetModelsForProvider(provider)

Purpose: Find all models available for a specific provider

Lookup Process:

  1. Check modelPool[provider] for direct matches
  2. Return all models in that provider’s slice

Example:

models := GetModelsForProvider("openai")
// Returns: ["gpt-4o", "gpt-4o-mini", "gpt-4-turbo", "gpt-3.5-turbo", ...]

Used by:

  • Routing methods, to validate allowed_models
  • Dashboard model selector dropdowns
  • API responses for /v1/models?provider=openai
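A minimal sketch of the two in-memory stores and the GetModelsForProvider lookup, assuming plain Go maps. The `catalog` type, the `pricingKey` helper, and the float64 price value are hypothetical placeholders. Only the `model|provider|mode` key shape and the `modelPool[provider]` layout come from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// catalog is a hypothetical stand-in for the model catalog's in-memory state.
type catalog struct {
	pricingData map[string]float64  // key: "model|provider|mode"; value: placeholder price
	modelPool   map[string][]string // key: provider; value: models served by that provider
}

// pricingKey builds the "model|provider|mode" key used for O(1) pricing lookups.
func pricingKey(model, provider, mode string) string {
	return strings.Join([]string{model, provider, mode}, "|")
}

// GetModelsForProvider returns a copy of modelPool[provider] so callers
// cannot mutate the shared catalog slice.
func (c *catalog) GetModelsForProvider(provider string) []string {
	models := c.modelPool[provider]
	out := make([]string, len(models))
	copy(out, models)
	return out
}

func main() {
	c := &catalog{
		pricingData: map[string]float64{pricingKey("gpt-4o", "openai", "chat"): 2.5},
		modelPool:   map[string][]string{"openai": {"gpt-4o", "gpt-4o-mini"}},
	}
	fmt.Println(c.GetModelsForProvider("openai"))         // [gpt-4o gpt-4o-mini]
	fmt.Println(len(c.GetModelsForProvider("anthropic"))) // 0
}
```

Returning a copy is a design choice of this sketch. It keeps callers such as the dashboard from mutating the shared pool.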
GetProvidersForModel(model)

Purpose: Find all providers that support a specific model

Lookup Process:

  1. Direct lookup: Check each provider’s model list in modelPool
  2. Cross-provider resolution: Apply special handling for proxy providers

Special Cross-Provider Rules:

If model is not found directly, check if provider/model exists in OpenRouter

// Request: claude-3-5-sonnet
// Checks: openrouter models for "anthropic/claude-3-5-sonnet"
// Result: Adds "openrouter" to providers list

If model is not found directly, check if provider/model exists in Vertex

// Request: claude-3-5-sonnet
// Checks: vertex models for "anthropic/claude-3-5-sonnet"
// Result: Adds "vertex" to providers list

For GPT models, check if openai/model exists in Groq

// Request: gpt-3.5-turbo
// Checks: groq models for "openai/gpt-3.5-turbo"
// Result: Adds "groq" to providers list

For Claude models, check Bedrock with flexible matching

// Request: claude-3-5-sonnet
// Checks: bedrock models containing "claude-3-5-sonnet"
// Matches: "anthropic.claude-3-5-sonnet-20240620-v1:0"
// Result: Adds "bedrock" to providers list

Example:

providers := GetProvidersForModel("claude-3-5-sonnet")
// Returns: ["anthropic", "vertex", "bedrock", "openrouter"]
// Even though the request was just "claude-3-5-sonnet"!

Used by:

  • Load balancing to find candidate providers
  • Fallback generation
  • Model validation in requests
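The direct lookup and the cross-provider rules above can be sketched as follows. This is a hypothetical `providersForModel` function, not DeepIntShield's code. It assumes that the vendor prefix used for OpenRouter and Vertex (e.g. `anthropic/`) is any provider that matched the model directly. The Groq `openai/` prefix and the Bedrock substring match follow the rules as stated.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// contains reports whether name appears in models.
func contains(models []string, name string) bool {
	for _, m := range models {
		if m == name {
			return true
		}
	}
	return false
}

// providersForModel sketches GetProvidersForModel over a modelPool-like map.
func providersForModel(pool map[string][]string, model string) []string {
	found := map[string]bool{}
	var direct []string
	// 1. Direct lookup in every provider's model list.
	for provider, models := range pool {
		if contains(models, model) {
			found[provider] = true
			direct = append(direct, provider)
		}
	}
	// 2a. Proxy providers list "vendor/model"; vendor = a direct match (assumption).
	for _, vendor := range direct {
		for _, proxy := range []string{"openrouter", "vertex"} {
			if contains(pool[proxy], vendor+"/"+model) {
				found[proxy] = true
			}
		}
	}
	// 2b. GPT models may be served by Groq as "openai/<model>".
	if strings.HasPrefix(model, "gpt-") && contains(pool["groq"], "openai/"+model) {
		found["groq"] = true
	}
	// 2c. Claude models on Bedrock use versioned IDs, so match by substring.
	if strings.HasPrefix(model, "claude-") {
		for _, m := range pool["bedrock"] {
			if strings.Contains(m, model) {
				found["bedrock"] = true
				break
			}
		}
	}
	providers := make([]string, 0, len(found))
	for p := range found {
		providers = append(providers, p)
	}
	sort.Strings(providers) // map iteration order is random; sort for stable output
	return providers
}

func main() {
	pool := map[string][]string{
		"anthropic":  {"claude-3-5-sonnet"},
		"openrouter": {"anthropic/claude-3-5-sonnet"},
		"vertex":     {"anthropic/claude-3-5-sonnet"},
		"bedrock":    {"anthropic.claude-3-5-sonnet-20240620-v1:0"},
		"openai":     {"gpt-3.5-turbo"},
		"groq":       {"openai/gpt-3.5-turbo"},
	}
	fmt.Println(providersForModel(pool, "claude-3-5-sonnet")) // [anthropic bedrock openrouter vertex]
	fmt.Println(providersForModel(pool, "gpt-3.5-turbo"))     // [groq openai]
}
```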
Pricing Lookup with Fallbacks

Purpose: Get pricing data for cost calculation and model validation

Lookup Key: model|provider|mode (e.g., gpt-4o|openai|chat)

Fallback Chain:

  1. Primary lookup: model|provider|requestType
  2. Gemini → Vertex: If Gemini not found, try Vertex with same model
  3. Vertex format stripping: For provider/model, strip prefix and retry
  4. Bedrock prefix handling: For Claude models, try with anthropic. prefix
  5. Responses → Chat: If Responses mode not found, try Chat mode

Example Flow:

// Request: claude-3-5-sonnet on Gemini (Responses API)
// 1. Try: claude-3-5-sonnet|gemini|responses → Not found
// 2. Try: claude-3-5-sonnet|vertex|responses → Not found
// 3. Try: claude-3-5-sonnet|vertex|chat → ✅ Found!
// Pricing returned from vertex/chat mode

Used by:

  • Cost calculation for billing
  • Model validation during routing
  • Budget enforcement
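The fallback chain can be sketched as a hypothetical `lookupPricing` function over a `model|provider|mode` map. Prices are placeholder float64 values. One assumption: steps 1 to 4 are re-run in chat mode when responses mode misses. For the Gemini example, this also tries `claude-3-5-sonnet|gemini|chat` before reaching `claude-3-5-sonnet|vertex|chat`. The exact ordering inside DeepIntShield may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupPricing sketches the pricing fallback chain.
// table is keyed by "model|provider|mode"; values are placeholder prices.
func lookupPricing(table map[string]float64, model, provider, mode string) (float64, bool) {
	get := func(m, p, md string) (float64, bool) {
		v, ok := table[m+"|"+p+"|"+md]
		return v, ok
	}
	// attempt runs fallback steps 1-4 for a single mode.
	attempt := func(md string) (float64, bool) {
		// 1. Primary lookup.
		if v, ok := get(model, provider, md); ok {
			return v, true
		}
		// 2. Gemini -> Vertex with the same model.
		if provider == "gemini" {
			if v, ok := get(model, "vertex", md); ok {
				return v, true
			}
		}
		// 3. Vertex "vendor/model" format: strip the prefix and retry.
		if provider == "vertex" {
			if i := strings.Index(model, "/"); i >= 0 {
				if v, ok := get(model[i+1:], "vertex", md); ok {
					return v, true
				}
			}
		}
		// 4. Bedrock Claude models: retry with the "anthropic." prefix.
		if provider == "bedrock" && strings.HasPrefix(model, "claude") {
			if v, ok := get("anthropic."+model, "bedrock", md); ok {
				return v, true
			}
		}
		return 0, false
	}
	if v, ok := attempt(mode); ok {
		return v, true
	}
	// 5. Responses -> Chat: rerun the chain in chat mode.
	if mode == "responses" {
		return attempt("chat")
	}
	return 0, false
}

func main() {
	table := map[string]float64{"claude-3-5-sonnet|vertex|chat": 3.0}
	price, ok := lookupPricing(table, "claude-3-5-sonnet", "gemini", "responses")
	fmt.Println(price, ok) // 3 true
}
```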
Initial Sync (Startup)

When DeepIntShield starts, it performs a complete model catalog initialization:

Step-by-step process (from server.go:Bootstrap()):

// 1. Download from URL
pricingData := loadPricingFromURL(ctx)
// 2. Store in database (if configStore available)
configStore.CreateModelPrices(ctx, pricingData)
// 3. Load into memory cache
mc.pricingData = map[string]TableModelPricing{...}
// Build modelPool from pricing data
mc.populateModelPoolFromPricingData()
// Result: modelPool[provider] = [models from pricing]
// Call ListAllModels for all configured providers
modelData, err := client.ListAllModels(ctx, nil)
// Add results to model pool
mc.AddModelDataToPool(modelData)
// Result: modelPool enriched with provider-specific models

If list models API fails for a provider:

{"level":"warn","message":"failed to list models for provider ollama: connection refused"}
  • Logged as warning, does not stop startup
  • Provider remains usable with models from pricing data
  • Can be manually refreshed later via API

Result: DeepIntShield is ready with a comprehensive model catalog combining both sources.

Ongoing Sync (Background)

While DeepIntShield is running, the catalog stays up-to-date through background workers:

Pricing Data Sync:

  • Background worker runs every 1 hour (ticker interval)
  • Checks if 24 hours have elapsed since last sync (configurable)
  • If yes, downloads fresh pricing data and updates database + memory cache
  • Timer resets after successful sync

List Models API Sync: Triggered by these events:

  1. Provider Added: When a new provider is configured

    POST /api/v1/providers
    # Automatically calls ListModels for the new provider
  2. Provider Updated: When provider config changes (keys, endpoints, etc.)

    PUT /api/v1/providers/{provider}
    # Refetches models to detect changes
  3. Manual Refresh: Via API endpoint

    POST /api/v1/providers/{provider}/models/refetch
    # Explicitly refetches models for a provider
  4. Manual Delete + Refetch: Clear and reload models

    DELETE /api/v1/providers/{provider}/models
    POST /api/v1/providers/{provider}/models/refetch
    # Useful when models are out of sync

Failure Handling:

  • Pricing URL fails but database has data → Use cached database records
  • Pricing URL fails and no database data → Error logged, existing memory cache retained
  • List models API fails → Log warning, retain existing model pool entries
Fallback Strategy

DeepIntShield’s multi-layered approach ensures high availability:

Layer 1: Pricing Data Persistence

URL fails → Database → Memory cache → Continue operation

Layer 2: Model Pool Redundancy

ListModels fails → Pricing data models → Continue with reduced catalog

Layer 3: Runtime Validation

Model not in catalog → Special cross-provider rules → May still work

Example Scenario:

Situation:
- Pricing URL is down
- OpenAI ListModels API is down
- User requests gpt-4o on OpenAI
DeepIntShield's Response:
1. ✅ Pricing data available from database (last sync 12h ago)
2. ✅ Model pool has gpt-4o from previous ListModels call
3. ✅ Request proceeds normally
4. 📊 Cost calculated from cached pricing data

This design ensures requests never fail due to sync issues as long as one data source is available.

The allowed_models field in provider configs controls which models can be used with that provider. Understanding its behavior is crucial for governance routing.

Configuration:

{
"provider_configs": [
{
"provider": "openai",
"allowed_models": [], // Empty = defer to catalog
"weight": 1.0
}
]
}

Behavior:

  • DeepIntShield calls GetModelsForProvider("openai")
  • Returns all models in modelPool["openai"]
  • Request validated against catalog
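A minimal sketch of this validation, assuming `catalogModels` is what GetModelsForProvider returned. `isModelAllowed` is a hypothetical name. Treating a non-empty `allowed_models` as an exclusive list is an assumption based on the governance filtering described later on this page.

```go
package main

import "fmt"

// isModelAllowed: an empty allowedModels list defers to the catalog;
// otherwise the model must appear in the explicit list.
func isModelAllowed(allowedModels, catalogModels []string, model string) bool {
	candidates := allowedModels
	if len(candidates) == 0 {
		candidates = catalogModels // empty = defer to catalog
	}
	for _, m := range candidates {
		if m == model {
			return true
		}
	}
	return false
}

func main() {
	catalog := []string{"gpt-4o", "gpt-4o-mini", "gpt-3.5-turbo"}
	fmt.Println(isModelAllowed(nil, catalog, "gpt-4o"))                     // true
	fmt.Println(isModelAllowed(nil, catalog, "claude-3-5-sonnet"))          // false
	fmt.Println(isModelAllowed([]string{"gpt-4o"}, catalog, "gpt-4o-mini")) // false
}
```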

Examples:

# ✅ Allowed (in catalog)
curl -H "x-bf-vk: vk-123" -d '{"model": "gpt-4o"}'
# ✅ Allowed (in catalog)
curl -H "x-bf-vk: vk-123" -d '{"model": "gpt-3.5-turbo"}'
# ❌ Rejected (not in OpenAI catalog)
curl -H "x-bf-vk: vk-123" -d '{"model": "claude-3-5-sonnet"}'

Use Cases:

  • Default behavior for most deployments
  • Automatically stays up-to-date with provider’s model offerings
  • No manual model list maintenance required

When a Virtual Key has provider_configs with an empty allowed_models list, governance validates the requested model against the model catalog:

Empty allowed_models Example:

{
"provider_configs": [
{
"provider": "openai",
"allowed_models": [], // Use catalog
"weight": 0.5
}
]
}

Request Flow:

curl -H "x-bf-vk: vk-123" -d '{"model": "gpt-4o"}'
# 1. Governance checks: Is "gpt-4o" in GetModelsForProvider("openai")?
# 2. Catalog lookup: modelPool["openai"] contains "gpt-4o" ✅
# 3. Validation passes, provider selected
# 4. Model becomes: "openai/gpt-4o"

Rejection Example:

curl -H "x-bf-vk: vk-123" -d '{"model": "claude-3-5-sonnet"}'
# 1. Governance checks: Is "claude-3-5-sonnet" in GetModelsForProvider("openai")?
# 2. Catalog lookup: modelPool["openai"] does NOT contain "claude-3-5-sonnet" ❌
# 3. Validation fails, request rejected
# 4. Error: "model not allowed for any configured provider"

Governance-based routing allows you to explicitly define which providers and models should handle requests for a specific Virtual Key. This method provides precise control over routing decisions.

When a Virtual Key has provider_configs defined:

  1. Request arrives with a Virtual Key (e.g., x-bf-vk: vk-prod-main)
  2. Model validation: DeepIntShield checks if the requested model is allowed for any configured provider
  3. Provider filtering: Providers are filtered based on:
    • Model availability in allowed_models
    • Budget limits (current usage vs max limit)
    • Rate limits (tokens/requests per time window)
  4. Weighted selection: A provider is selected using weighted random distribution
  5. Provider prefix added: Model string becomes provider/model (e.g., openai/gpt-4o)
  6. Fallbacks created: Remaining providers sorted by weight (descending) are added as fallbacks
Example configuration:

{
"provider_configs": [
{
"provider": "openai",
"allowed_models": ["gpt-4o", "gpt-4o-mini"],
"weight": 0.3,
"budget": {
"max_limit": 100.0,
"current_usage": 45.0
}
},
{
"provider": "azure",
"allowed_models": ["gpt-4o"],
"weight": 0.7,
"rate_limit": {
"token_max_limit": 100000,
"token_reset_duration": "1m"
}
}
]
}
Request:

curl -X POST http://localhost:8080/v1/chat/completions \
-H "x-bf-vk: vk-prod-main" \
-d '{"model": "gpt-4o", "messages": [...]}'
Provider filtering:

  • OpenAI: ✅ Has gpt-4o in allowed_models, budget OK (45.0 of 100.0 used), weight 0.3
  • Azure: ✅ Has gpt-4o in allowed_models, rate limit OK, weight 0.7

Weighted selection:

  • 70% chance → Azure
  • 30% chance → OpenAI
Resulting request (when Azure is selected):

{
"model": "azure/gpt-4o",
"messages": [...],
"fallbacks": ["openai/gpt-4o"]
}
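The selection steps and the worked example above can be sketched as a hypothetical `route` function. Several simplifications apply. The `providerConfig` struct is an assumption. The rate limit is reduced to a boolean. The empty-`allowed_models` catalog fallback is omitted. The random number is passed in as `r` so the choice is reproducible.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"sort"
)

// providerConfig is a hypothetical struct mirroring the example's JSON fields.
type providerConfig struct {
	Provider      string
	AllowedModels []string
	Weight        float64
	BudgetUsed    float64 // budget.current_usage
	BudgetMax     float64 // budget.max_limit; 0 means no budget configured
	RateLimited   bool    // true once the rate-limit window is exhausted
}

// route filters eligible providers, picks one by weighted random using r in
// [0, 1), and returns "provider/model" plus fallbacks sorted by weight.
func route(configs []providerConfig, model string, r float64) (string, []string, error) {
	var eligible []providerConfig
	for _, c := range configs {
		allowed := false
		for _, m := range c.AllowedModels {
			if m == model {
				allowed = true
				break
			}
		}
		overBudget := c.BudgetMax > 0 && c.BudgetUsed >= c.BudgetMax
		if allowed && !overBudget && !c.RateLimited && c.Weight > 0 {
			eligible = append(eligible, c)
		}
	}
	if len(eligible) == 0 {
		return "", nil, errors.New("model not allowed for any configured provider")
	}
	// Weighted random: walk cumulative weights until r*total is covered.
	total := 0.0
	for _, c := range eligible {
		total += c.Weight
	}
	chosen := len(eligible) - 1 // guards against floating-point round-off
	target, cumulative := r*total, 0.0
	for i, c := range eligible {
		cumulative += c.Weight
		if target < cumulative {
			chosen = i
			break
		}
	}
	// Remaining providers become fallbacks, highest weight first.
	var rest []providerConfig
	for i, c := range eligible {
		if i != chosen {
			rest = append(rest, c)
		}
	}
	sort.SliceStable(rest, func(i, j int) bool { return rest[i].Weight > rest[j].Weight })
	fallbacks := make([]string, 0, len(rest))
	for _, c := range rest {
		fallbacks = append(fallbacks, c.Provider+"/"+model)
	}
	return eligible[chosen].Provider + "/" + model, fallbacks, nil
}

func main() {
	configs := []providerConfig{
		{Provider: "openai", AllowedModels: []string{"gpt-4o", "gpt-4o-mini"}, Weight: 0.3, BudgetUsed: 45, BudgetMax: 100},
		{Provider: "azure", AllowedModels: []string{"gpt-4o"}, Weight: 0.7},
	}
	model, fallbacks, err := route(configs, "gpt-4o", rand.Float64())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(model, fallbacks) // azure/gpt-4o [openai/gpt-4o] roughly 70% of the time
}
```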
| Feature | Description |
|---|---|
| Explicit Control | Define exactly which providers and models are accessible |
| Budget Enforcement | Automatically exclude providers exceeding budget limits |
| Rate Limit Protection | Skip providers that have hit rate limits |
| Weighted Distribution | Control traffic distribution with custom weights |
| Automatic Fallbacks | Failed requests automatically retry on the provider with the next-highest weight |
Cost Optimization

Assign higher weights to cheaper providers for cost-sensitive workloads:

{
"provider_configs": [
{"provider": "groq", "weight": 0.7},
{"provider": "openai", "weight": 0.3}
]
}
Environment Separation

Create different Virtual Keys for dev/staging/prod with different provider access:

{
"virtual_keys": [
{
"id": "vk-dev",
"provider_configs": [{"provider": "ollama"}]
},
{
"id": "vk-prod",
"provider_configs": [{"provider": "openai"}, {"provider": "azure"}]
}
]
}
Compliance & Data Residency

Restrict specific Virtual Keys to compliant providers:

{
"provider_configs": [
{"provider": "azure", "allowed_models": ["gpt-4o"]},
{"provider": "bedrock", "allowed_models": ["claude-3-sonnet-20240229"]}
]
}

Adaptive Load Balancing automatically optimizes routing based on real-time performance metrics. It operates at two levels to provide both macro-level provider selection and micro-level key optimization.

Why Two Levels?

Separating provider selection (direction) from key selection (route) enables:

  • Provider-level optimization: Choose the best provider for a model based on aggregate performance
  • Key-level optimization: Within that provider, choose the best API key based on individual key performance
  • Resilience: Even when the provider is specified (by governance or the user), key-level load balancing still optimizes which API key to use
flowchart TB
Request["Request: gpt-4o"]
subgraph Level1["Level 1: Direction (Provider Selection)"]
Cat["Model Catalog Lookup"]
Providers["Candidate Providers:<br/>openai, azure, groq"]
Filter["Filter by allowed_models<br/>and key availability"]
Score["Score by performance:<br/>error rate, latency, utilization"]
Select["Select: openai"]
end
subgraph Level2["Level 2: Route (Key Selection)"]
Keys["Available OpenAI Keys:<br/>key-1, key-2, key-3"]
KeyScore["Score each key:<br/>error rate, latency, TPM hits"]
KeySelect["Select: key-2<br/>(best performing)"]
end
Request --> Cat --> Providers --> Filter --> Score --> Select
Select --> Keys --> KeyScore --> KeySelect --> Response["Execute with<br/>openai/gpt-4o + key-2"]

Level 1: Direction (Provider Selection)

When it runs: Only when the model string has no provider prefix (e.g., gpt-4o)

How it works:

  1. Model catalog lookup: Find all configured providers that support the requested model
  2. Provider filtering: Filter based on:
    • Allowed models from keys configuration
    • Keys availability for the provider
  3. Performance scoring: Calculate scores for each provider based on:
    • Error rates (50% weight)
    • Latency (20% weight, using MV-TACOS algorithm)
    • Utilization (5% weight)
    • Momentum bias (recovery acceleration)
  4. Smart selection: Choose provider using weighted random with jitter and exploration
  5. Fallbacks created: Remaining providers sorted by performance score (descending) are added as fallbacks

Level 2: Route (Key Selection)

When it runs: Always, even when the provider is already specified (by governance, the user, or Level 1)

How it works:

  1. Get available keys: Fetch all keys for the selected provider
  2. Filter by configuration: Apply model restrictions from key configuration
  3. Performance scoring: Calculate score for each key based on:
    • Error rates (recent failures)
    • Latency (response time)
    • TPM hits (rate limit violations)
    • Current state (Healthy, Degraded, Failed, Recovering)
  4. Weighted random selection: Choose key with exploration (25% chance to probe recovering keys)
  5. Circuit breaker: Skip keys with zero weight (TPM hits, repeated failures)

The load balancer computes a performance score for each provider-model combination:

$$\text{Score} = (P_{\text{error}} \times 0.5) + (P_{\text{latency}} \times 0.2) + (P_{\text{util}} \times 0.05) - M_{\text{momentum}}$$
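The formula above can be evaluated directly as a weighted sum. `routeScore` is a hypothetical name. The inputs are assumed to be already-normalized terms. How DeepIntShield derives them (for example, MV-TACOS for latency) and how the score maps to a routing weight are not modeled here.

```go
package main

import "fmt"

// Weights taken from the scoring formula.
const (
	errorWeight   = 0.5
	latencyWeight = 0.2
	utilWeight    = 0.05
)

// routeScore computes (P_error*0.5) + (P_latency*0.2) + (P_util*0.05) - M_momentum.
func routeScore(pError, pLatency, pUtil, momentum float64) float64 {
	return pError*errorWeight + pLatency*latencyWeight + pUtil*utilWeight - momentum
}

func main() {
	// Hypothetical inputs: P_error 0.1, P_latency 0.5, P_util 0.2, no momentum.
	fmt.Printf("%.2f\n", routeScore(0.1, 0.5, 0.2, 0)) // 0.16
}
```

Working the example by hand gives 0.05 + 0.10 + 0.01 - 0 = 0.16. The error term dominates because it carries the largest weight.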
Example request:

curl -X POST http://localhost:8080/v1/chat/completions \
-d '{"model": "gpt-4o", "messages": [...]}'

Providers supporting gpt-4o: [openai, azure, groq]

  • OpenAI: Score 0.92 (low latency, 99% success rate)
  • Azure: Score 0.85 (medium latency, 98% success rate)
  • Groq: Score 0.65 (high latency recently)

OpenAI selected (highest score within jitter band)

Resulting request:

{
"model": "openai/gpt-4o",
"messages": [...],
"fallbacks": ["azure/gpt-4o", "groq/gpt-4o"]
}
| Feature | Description |
|---|---|
| Automatic Optimization | No manual weight tuning required |
| Real-time Adaptation | Weights recomputed every 5 seconds based on live metrics |
| Circuit Breakers | Failing routes automatically removed from rotation |
| Fast Recovery | 90% penalty reduction in 30 seconds after issues resolve |
| Health States | Routes transition between Healthy, Degraded, Failed, and Recovering |
| Smart Exploration | 25% chance to probe potentially recovered routes |

Monitor load balancing performance in real-time:

Adaptive Load Balancing Dashboard

The dashboard shows:

  • Weight distribution across provider-model-key routes
  • Performance metrics (error rates, latency, success rates)
  • State transitions (Healthy → Degraded → Failed → Recovering)
  • Actual vs expected traffic distribution

How Governance and Load Balancing Interact


When both methods are available in your DeepIntShield deployment, they work together in a complementary way across two levels.

flowchart TD
Start["Request: gpt-4o"]
subgraph Governance["Governance Plugin (HTTPTransportIntercept)"]
HasVK{"Has VK with<br/>provider_configs?"}
GovRoute["Provider Selection:<br/>Weighted random"]
AddPrefix["Add prefix:<br/>azure/gpt-4o"]
end
subgraph LB1["Load Balancer Level 1 (Middleware)"]
PrefixCheck{"Has provider<br/>prefix?"}
LBProvider["Provider Selection:<br/>Performance-based"]
AddLBPrefix["Add prefix:<br/>openai/gpt-4o"]
end
subgraph LB2["Load Balancer Level 2 (Key Selector)"]
GetKeys["Get available keys<br/>for selected provider"]
ScoreKeys["Score keys by<br/>performance metrics"]
SelectKey["Select best key"]
end
Start --> HasVK
HasVK -->|Yes| GovRoute --> AddPrefix
HasVK -->|No| PrefixCheck
AddPrefix --> PrefixCheck
PrefixCheck -->|Yes, skip Level 1| GetKeys
PrefixCheck -->|No| LBProvider --> AddLBPrefix --> GetKeys
GetKeys --> ScoreKeys --> SelectKey --> Execute["Execute request<br/>with selected provider + key"]
  1. HTTPTransportIntercept (Governance Plugin - Provider Level)

    • Runs first in the request pipeline
    • Checks if Virtual Key has provider_configs
    • If yes: adds provider prefix (e.g., azure/gpt-4o)
    • Result: Provider is selected by governance rules
  2. Middleware (Load Balancing Plugin - Provider Level / Direction)

    • Runs after HTTPTransportIntercept
    • Checks if the model string contains "/"
    • If yes: skips provider selection (already determined by governance or user)
    • If no: performs performance-based provider selection
    • Result: Provider prefix added if not already present
  3. KeySelector (Load Balancing - Key Level / Route)

    • Always runs during request execution in DeepIntShield core
    • Gets all keys for the selected provider
    • Filters keys based on model restrictions
    • Scores each key by performance metrics
    • Selects best key using weighted random + exploration
    • Result: Optimal key selected within the provider

Example: governance without adaptive load balancing

Setup:

  • Virtual Key has provider_configs defined
  • No adaptive load balancing enabled

Request:

curl -X POST http://localhost:8080/v1/chat/completions \
-H "x-bf-vk: vk-prod-main" \
-d '{"model": "gpt-4o", "messages": [...]}'

Behavior:

  1. Governance applies weighted provider routing → selects Azure (70% weight)
  2. Model becomes azure/gpt-4o
  3. Standard key selection (non-adaptive) chooses an Azure key based on static weights
  4. Request forwarded to Azure with selected key
| Scenario | Provider Selection | Key Selection |
|---|---|---|
| VK with provider_configs | Governance (weighted random) | Standard or Adaptive (if enabled) |
| VK without provider_configs + LB | Load Balancing Level 1 (performance) | Load Balancing Level 2 (performance) |
| No VK + LB | Load Balancing Level 1 (performance) | Load Balancing Level 2 (performance) |
| Model with provider prefix + LB | Skip (already specified) | Load Balancing Level 2 (performance) ✅ |
| No Load Balancing enabled | Governance or User or Model Catalog | Standard (static weights) |
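The table can be encoded as a small decision function. `routingPlan` and its string labels are hypothetical. The sketch only mirrors the rows above, including the "/" prefix check that decides whether Level 1 runs.

```go
package main

import (
	"fmt"
	"strings"
)

// routingPlan reports which component selects the provider and which
// selects the key, following the scenario table.
func routingPlan(vkHasProviderConfigs, loadBalancing bool, model string) (provider, key string) {
	switch {
	case vkHasProviderConfigs:
		provider = "governance (weighted random)"
	case strings.Contains(model, "/"):
		provider = "skip (already specified)"
	case loadBalancing:
		provider = "load balancing level 1 (performance)"
	default:
		provider = "model catalog"
	}
	if loadBalancing {
		key = "load balancing level 2 (performance)"
	} else {
		key = "standard (static weights)"
	}
	return provider, key
}

func main() {
	p, k := routingPlan(false, true, "openai/gpt-4o")
	fmt.Println(p, "|", k) // skip (already specified) | load balancing level 2 (performance)
}
```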

Routing Rules (Dynamic Expression-Based Routing)


Routing Rules provide sophisticated, expression-based control over request routing using CEL expressions. Unlike governance routing (static weights), routing rules evaluate conditions dynamically at request time.

flowchart TD
Start["Request: model + provider"]
subgraph Rules["1. Routing Rules Layer (Evaluated First)"]
RuleMatch{"CEL Expression<br/>Matches?"}
RuleDecision["Override:<br/>New provider/model/fallbacks"]
NoMatch["No match:<br/>Continue to Governance"]
end
subgraph Gov["2. Governance Layer (if no routing rule matched)"]
VKValidation["Virtual Key Validation"]
GovRouting["Provider Governance Routing<br/>(weighted random)"]
end
subgraph LB["3. Load Balancing Layer"]
LB1["Level 1: Provider Selection"]
LB2["Level 2: Key Selection"]
end
Start --> RuleMatch
RuleMatch -->|Yes| RuleDecision --> LB1
RuleMatch -->|No| NoMatch --> VKValidation --> GovRouting --> LB1
LB1 --> LB2 --> Execute["Execute with<br/>selected provider + key"]
  1. Routing rules evaluate first in scope precedence order (VirtualKey → Team → Customer → Global)
  2. If a routing rule matches: provider/model/fallbacks are overridden, governance provider_configs are skipped
  3. If no routing rule matches: governance provider selection runs (weighted random)
  4. Load balancing Level 1: skipped if the provider is already determined (the model string contains a "/" prefix)
  5. Load balancing Level 2 (key selection): always runs to select the best key within the determined provider

Routing rules access request context through CEL variables:

// Request context
model // Requested model
provider // Current provider
// Headers and parameters (case-insensitive)
headers["x-tier"] // Request header
params["region"] // Query parameter
// Organization context
virtual_key_id // VirtualKey ID
team_name // Team name
customer_id // Customer ID
// Capacity metrics (0-100 percentage)
budget_used // Budget usage %
tokens_used // Token rate limit usage %
request // Request rate limit usage %
// Example expressions (condition → routing target)
headers["x-tier"] == "premium" // → openai/gpt-4o
budget_used > 85 // → groq/llama-2 (cheaper)
team_name == "ml-research" // → anthropic/claude-3-opus
headers["x-environment"] == "production" &&
tokens_used < 75 &&
team_name == "ai-platform" // → openai/gpt-4o

Rules are evaluated in organizational precedence order (first-match-wins):

1. VirtualKey scope (highest priority)
2. Team scope
3. Customer scope
4. Global scope (lowest priority)

Within each scope, rules are sorted by priority (ascending: 0 before 10).
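The first-match-wins ordering can be sketched in Go. One substitution applies: each rule's CEL expression is replaced by a plain Go predicate, so this shows only the ordering, not CEL evaluation. `rule`, `requestContext`, and `firstMatch` are hypothetical names. Header case-insensitivity is not modeled. When `firstMatch` reports no match, governance provider selection would run next.

```go
package main

import (
	"fmt"
	"sort"
)

// scope precedence: lower values are evaluated first.
type scope int

const (
	scopeVirtualKey scope = iota
	scopeTeam
	scopeCustomer
	scopeGlobal
)

// requestContext holds a few of the CEL variables listed above.
type requestContext struct {
	Headers    map[string]string
	TeamName   string
	BudgetUsed float64
}

// rule uses a Go predicate in place of a compiled CEL expression.
type rule struct {
	Scope    scope
	Priority int
	Match    func(requestContext) bool
	Target   string // provider/model override
}

// firstMatch orders rules by (scope, priority) and returns the first match.
func firstMatch(rules []rule, ctx requestContext) (string, bool) {
	ordered := append([]rule(nil), rules...)
	sort.SliceStable(ordered, func(i, j int) bool {
		if ordered[i].Scope != ordered[j].Scope {
			return ordered[i].Scope < ordered[j].Scope
		}
		return ordered[i].Priority < ordered[j].Priority
	})
	for _, r := range ordered {
		if r.Match(ctx) {
			return r.Target, true
		}
	}
	return "", false
}

func main() {
	rules := []rule{
		{Scope: scopeGlobal, Priority: 0, Target: "groq/llama-2",
			Match: func(c requestContext) bool { return c.BudgetUsed > 85 }},
		{Scope: scopeTeam, Priority: 10, Target: "anthropic/claude-3-opus",
			Match: func(c requestContext) bool { return c.TeamName == "ml-research" }},
		{Scope: scopeVirtualKey, Priority: 5, Target: "openai/gpt-4o",
			Match: func(c requestContext) bool { return c.Headers["x-tier"] == "premium" }},
	}
	target, ok := firstMatch(rules, requestContext{TeamName: "ml-research", BudgetUsed: 90})
	fmt.Println(target, ok) // anthropic/claude-3-opus true
}
```

In the example, the team-scope rule wins over the global budget rule even though the global rule has a lower priority number. Scope is compared before priority.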

| Feature | Description |
|---|---|
| CEL Expressions | Powerful, composable condition language with multiple operators |
| Scope Hierarchy | Rules at VirtualKey/Team/Customer/Global levels with proper precedence |
| Dynamic Override | Override provider and/or model based on runtime conditions |
| Fallback Chains | Define multiple fallback providers for automatic failover |
| Priority Ordering | Lower priority values are evaluated first within the same scope |
| Capacity Awareness | Access real-time budget and rate limit usage percentages |

Routing Rules execute before governance provider selection and can override it:

If a routing rule matches:

Routing Rules evaluate
Rule matches: budget_used > 85
Override: groq/llama-2 (cheaper provider)
Governance provider_configs SKIPPED
Load Balancing selects best key

If no routing rule matches:

Routing Rules evaluate
No matching rule
Governance decides: azure/gpt-4o (70% weight)
Load Balancing selects best key

Key Insight: Routing rules have higher precedence than governance provider_configs. If a routing rule matches, governance provider_configs are bypassed entirely.

Routing Rules work before load balancing:

Routing Rules decide: openai/gpt-4o
Load Balancing Level 1: Skipped (provider already determined)
Load Balancing Level 2: Selects best OpenAI key based on performance

Even when routing rules determine the provider, load balancing Level 2 still optimizes which API key to use within that provider.

Common use cases:

  • Tier-based routing: Premium users → fast providers
  • Capacity failover: High budget usage → cheaper providers
  • Team preferences: Different teams → different providers
  • A/B testing: Route subset of traffic to test models
  • Regional routing: EU users → EU providers (data residency)
  • Complex logic: Combine multiple conditions for sophisticated routing

Routing rules can be configured through:

  • Dashboard: Visual rule builder with CEL expression editor
  • API: POST /api/governance/routing-rules and related endpoints
  • Scope: Create rules at global, customer, team, or virtual key levels
  • Priority: Order rules within scope with numeric priority

For complete documentation, see Routing Rules Documentation.


  1. Use Governance When:

    ✅ Compliance requirements: Need to ensure data stays in specific regions or providers
    ✅ Cost optimization: Want explicit control over traffic distribution to cheaper providers
    ✅ Budget enforcement: Need hard limits on spending per provider
    ✅ Environment separation: Different teams/apps need different provider access
    ✅ Rate limit management: Need to respect provider-specific rate limits

  2. Use Routing Rules When:

    ✅ Dynamic routing: Route based on runtime request context (headers, parameters)
    ✅ Capacity-aware routing: Switch to a fallback when budget or rate-limit usage is high
    ✅ Organization-based routing: Different rules for teams/customers
    ✅ A/B testing: Route a subset of traffic to test new models
    ✅ Complex conditions: Multiple criteria (e.g., tier + capacity + team)

  3. Use Load Balancing When:

    ✅ Performance optimization: Want automatic routing to best-performing providers
    ✅ Minimal configuration: Prefer hands-off operation with intelligent defaults
    ✅ Dynamic workloads: Traffic patterns change frequently
    ✅ Automatic failover: Need instant adaptation to provider issues
    ✅ Multi-provider redundancy: Want seamless provider switching based on availability

  4. Use All Three Together:

    ✅ Complete solution: Governance provides base routing, routing rules add dynamic overrides, load balancing optimizes keys
    ✅ Maximum flexibility: Different Virtual Keys use different strategies (governance vs routing rules vs load balancing)
    ✅ Enterprise deployments: Complex organizations with multiple requirements per layer


Governance Routing

Configuration instructions for setting up governance routing via Virtual Keys (Web UI, API, config.json)


Routing Rules

Dynamic, expression-based routing using CEL expressions for runtime conditions


Adaptive Load Balancing

Technical implementation details: scoring algorithms, weight calculations, and performance characteristics


Virtual Keys

Learn how to create and configure Virtual Keys


Fallbacks

Understand how automatic fallbacks work across providers
