Model Catalog
The Model Catalog is a foundational component of DeepIntShield that provides a unified interface for managing AI models, including their pricing, capabilities, and availability. It serves as a centralized repository for all model-related information, enabling dynamic cost calculation, intelligent model routing, and efficient resource management.
Core Features
1. Automatic Pricing Synchronization
The Model Catalog manages pricing data through a two-phase approach:
Startup Behavior:
- With ConfigStore: Downloads a pricing sheet from Maxim’s datasheet, persists it to the config store, and then loads it into memory for fast lookups.
- Without ConfigStore: Downloads the pricing sheet directly into memory on every startup.
Ongoing Synchronization:
- When ConfigStore is available, an automatic sync runs every 24 hours by default (configurable through PricingSyncInterval) to keep pricing data current.
- All pricing data is cached in memory for O(1) lookup performance during cost calculations.
This ensures that cost calculations always use the latest pricing information from AI providers while maintaining optimal performance.
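The load-then-sync pattern described above can be sketched roughly as follows. This is a minimal illustration, not the catalog's actual code: the pricingCache type, the refresh and syncLoop helpers, the key format, and the price are all hypothetical, and the fetch function stands in for the real download.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// pricingCache is a hypothetical stand-in for the catalog's in-memory store.
type pricingCache struct {
	mu   sync.RWMutex
	data map[string]float64 // key -> input cost per token (simplified)
}

// lookup is a single map read under the read lock.
func (c *pricingCache) lookup(key string) (float64, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.data[key]
	return v, ok
}

// refresh downloads a sheet via fetch and swaps it in under the write lock.
// On failure the previously loaded sheet keeps being served.
func refresh(c *pricingCache, fetch func() (map[string]float64, error)) error {
	sheet, err := fetch()
	if err != nil {
		return err
	}
	c.mu.Lock()
	c.data = sheet
	c.mu.Unlock()
	return nil
}

// syncLoop refreshes on every tick until ctx is cancelled.
func syncLoop(ctx context.Context, c *pricingCache, every time.Duration, fetch func() (map[string]float64, error)) {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			_ = refresh(c, fetch) // a real worker would log the error
		}
	}
}

func main() {
	// Made-up sheet: the key format and the price are illustrative only.
	fetch := func() (map[string]float64, error) {
		return map[string]float64{"gpt-4o|openai|chat": 0.0000025}, nil
	}

	cache := &pricingCache{}
	if err := refresh(cache, fetch); err != nil { // startup load
		fmt.Println("initial load failed:", err)
		return
	}

	ctx, cancel := context.WithCancel(context.Background())
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		syncLoop(ctx, cache, 24*time.Hour, fetch)
	}()

	price, ok := cache.lookup("gpt-4o|openai|chat")
	fmt.Println(price, ok)

	cancel() // stop the background worker
	wg.Wait()
}
```

Swapping the whole map under the write lock keeps readers from ever seeing a half-updated sheet, at the cost of holding two copies briefly during a sync.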
2. Multi-Modal Cost Calculation
It supports diverse pricing models across different AI operation types:
- Text Operations: Token-based pricing for chat completions, text completions, responses, and embeddings. Cache-read/cache-write pricing applies to chat/text/responses when providers surface prompt cache token details.
- Audio Processing: Character-based, token-based, and duration-based pricing for speech synthesis and transcription, with audio token detail breakdown. Speech responses populate usage.input_chars so speech can be billed by input characters in addition to tokens/duration.
- Image Processing: Per-image (input_cost_per_image/output_cost_per_image), per-pixel (input_cost_per_pixel/output_cost_per_pixel), or token-based pricing with text/image token breakdown.
- Video Processing: Token-based or duration-based pricing. Input can use prompt tokens or input_cost_per_video_per_second; output can use completion tokens or fall back to output_cost_per_video_per_second/output_cost_per_second.
- Reranking: Input/output token pricing with search query cost support.
- Prompt Caching: Separate rates for cache-read tokens (cached_read_tokens) and cache-creation tokens (cached_write_tokens), both surfaced under prompt_tokens_details (see Prompt Cache Cost Calculation).
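To make the token-based case with prompt caching concrete, here is a rough sketch. It is built on assumptions that the text above does not confirm: that cached tokens are counted inside the prompt token total, and that a missing cache rate falls back to the regular input rate. The textRates and textUsage types and the rates used are hypothetical.

```go
package main

import "fmt"

// textRates is a hypothetical, trimmed-down view of a pricing entry.
type textRates struct {
	InputPerToken      float64
	OutputPerToken     float64
	CacheReadPerToken  *float64 // nil when no cache-read rate is configured
	CacheWritePerToken *float64 // nil when no cache-write rate is configured
}

// textUsage holds the token counts named in the list above.
type textUsage struct {
	PromptTokens      int
	CompletionTokens  int
	CachedReadTokens  int
	CachedWriteTokens int
}

// rateOr returns *p when set and fallback otherwise.
func rateOr(p *float64, fallback float64) float64 {
	if p != nil {
		return *p
	}
	return fallback
}

// textCost bills uncached prompt tokens, cache reads, cache writes, and
// completion tokens at their own rates.
// ASSUMPTION: cached tokens are a subset of PromptTokens.
func textCost(r textRates, u textUsage) float64 {
	uncached := u.PromptTokens - u.CachedReadTokens - u.CachedWriteTokens
	if uncached < 0 {
		uncached = 0
	}
	return float64(uncached)*r.InputPerToken +
		float64(u.CachedReadTokens)*rateOr(r.CacheReadPerToken, r.InputPerToken) +
		float64(u.CachedWriteTokens)*rateOr(r.CacheWritePerToken, r.InputPerToken) +
		float64(u.CompletionTokens)*r.OutputPerToken
}

func main() {
	// Unrealistic round rates, chosen so the arithmetic is easy to follow.
	read, write := 0.125, 0.75
	r := textRates{InputPerToken: 0.5, OutputPerToken: 1, CacheReadPerToken: &read, CacheWritePerToken: &write}
	u := textUsage{PromptTokens: 100, CompletionTokens: 10, CachedReadTokens: 40, CachedWriteTokens: 20}

	// 40*0.5 + 40*0.125 + 20*0.75 + 10*1 = 50
	fmt.Println(textCost(r, u))
}
```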
3. Model Information Management
The Model Catalog maintains a pool of available models for each provider, populated from both pricing data and provider list models APIs. This enables:
- Model Discovery: Listing all available models for a given provider
- Provider Discovery: Finding all providers that support a specific model with intelligent cross-provider resolution (OpenRouter, Vertex, Groq, Bedrock)
- Model Validation: Checking if a model is allowed for a provider based on allowed models lists (supports provider-prefixed entries)
4. Intelligent Cache Cost Handling
It integrates with semantic caching to provide accurate cost calculations:
- Cache Hits: Zero cost for direct cache hits, and embedding cost only for semantic matches.
- Cache Misses: Combined cost of the base model usage plus the embedding generation cost for cache storage.
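The three cache outcomes above reduce to a small decision, sketched here. The cacheOutcome type and requestCost function are hypothetical names for illustration; in the catalog itself this logic sits behind CalculateCost.

```go
package main

import "fmt"

// cacheOutcome is a hypothetical enum for how a request was served.
type cacheOutcome int

const (
	directHit   cacheOutcome = iota // exact cache hit, no model or embedding call
	semanticHit                     // matched via embedding similarity
	cacheMiss                       // model was called, response embedded for storage
)

// requestCost applies the rules from the list above:
// direct hit -> 0, semantic hit -> embedding only, miss -> base + embedding.
func requestCost(outcome cacheOutcome, baseCost, embeddingCost float64) float64 {
	switch outcome {
	case directHit:
		return 0
	case semanticHit:
		return embeddingCost
	default:
		return baseCost + embeddingCost
	}
}

func main() {
	// Illustrative costs only.
	base, embed := 2.0, 0.5
	fmt.Println(requestCost(directHit, base, embed))
	fmt.Println(requestCost(semanticHit, base, embed))
	fmt.Println(requestCost(cacheMiss, base, embed))
}
```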
5. Tiered Pricing Support
The system automatically applies different pricing rates for high-token contexts, reflecting real provider pricing models. Two tiers are supported: above 128k tokens and above 200k tokens, with the higher tier taking precedence when both are configured.
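A minimal sketch of the tier precedence rule follows. The tieredRate type is hypothetical, and two details are assumptions rather than documented behavior: that "128k" and "200k" mean 128,000 and 200,000 tokens, and that a tier applies only when the count strictly exceeds the threshold.

```go
package main

import "fmt"

// tieredRate is a hypothetical per-token rate with optional high-context tiers.
type tieredRate struct {
	Base      float64
	Above128k *float64 // nil when the tier is not configured
	Above200k *float64 // nil when the tier is not configured
}

// rateFor picks the rate for a request with the given token count.
// The 200k tier is checked first so it wins when both tiers are configured.
// ASSUMPTION: thresholds are 128_000 / 200_000 and the comparison is strict.
func (r tieredRate) rateFor(tokens int) float64 {
	if r.Above200k != nil && tokens > 200_000 {
		return *r.Above200k
	}
	if r.Above128k != nil && tokens > 128_000 {
		return *r.Above128k
	}
	return r.Base
}

func main() {
	// Illustrative rates only.
	mid, high := 2.0, 4.0
	r := tieredRate{Base: 1.0, Above128k: &mid, Above200k: &high}
	for _, n := range []int{100_000, 150_000, 250_000} {
		fmt.Println(n, r.rateFor(n))
	}
}
```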
Configuration
The ModelCatalog can be configured during initialization by passing a Config struct.
    type Config struct {
        PricingURL          *string        `json:"pricing_url,omitempty"`
        PricingSyncInterval *time.Duration `json:"pricing_sync_interval,omitempty"`
    }

- PricingURL: Overrides the default URL (https://getbifrost.ai/datasheet) for downloading the pricing sheet.
- PricingSyncInterval: Customizes the interval for periodic pricing data synchronization. The default is 24 hours.
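Because both fields are pointers, a nil value can mean "use the default". The sketch below shows one way such defaults could be resolved. The resolve function and the two constants are hypothetical helpers written for this example; only the Config fields, the default URL, and the 24-hour default come from the description above.

```go
package main

import (
	"fmt"
	"time"
)

// Config mirrors the struct shown above.
type Config struct {
	PricingURL          *string        `json:"pricing_url,omitempty"`
	PricingSyncInterval *time.Duration `json:"pricing_sync_interval,omitempty"`
}

// Defaults as stated in the field descriptions above.
const (
	defaultPricingURL   = "https://getbifrost.ai/datasheet"
	defaultSyncInterval = 24 * time.Hour
)

// resolve is a hypothetical helper: nil config or nil fields mean "use the default".
func resolve(cfg *Config) (string, time.Duration) {
	url, interval := defaultPricingURL, defaultSyncInterval
	if cfg == nil {
		return url, interval
	}
	if cfg.PricingURL != nil {
		url = *cfg.PricingURL
	}
	if cfg.PricingSyncInterval != nil {
		interval = *cfg.PricingSyncInterval
	}
	return url, interval
}

func main() {
	fmt.Println(resolve(nil))

	// Pointer fields need an addressable value, so assign to a variable first.
	interval := 12 * time.Hour
	fmt.Println(resolve(&Config{PricingSyncInterval: &interval}))
}
```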
This configuration is passed during the initialization of the ModelCatalog:
    // PricingURL is a *string, so take the address of a variable.
    pricingURL := "https://my-custom-url.com/pricing.json"
    config := &modelcatalog.Config{
        PricingURL: &pricingURL,
    }
    modelCatalog, err := modelcatalog.Init(context.Background(), config, configStore, logger)

Architecture
ModelCatalog
The ModelCatalog is the central component that handles all model and pricing operations:
    type ModelCatalog struct {
        configStore configstore.ConfigStore
        logger      schemas.Logger

        pricingURL          string
        pricingSyncInterval time.Duration

        // In-memory cache for fast access
        pricingData map[string]configstoreTables.TableModelPricing
        mu          sync.RWMutex

        modelPool map[schemas.ModelProvider][]string

        // Background sync worker
        syncTicker *time.Ticker
        done       chan struct{}
        wg         sync.WaitGroup
        syncCtx    context.Context
        syncCancel context.CancelFunc
    }

Pricing Data Structure
Each model’s pricing information includes comprehensive cost metrics, supporting various modalities and tiered pricing:
    // PricingEntry represents a single model's pricing information.
    // The fields below are an excerpt; see framework/modelcatalog/main.go for the full definition.
    type PricingEntry struct {
        BaseModel string `json:"base_model,omitempty"`
        Provider  string `json:"provider"`
        Mode      string `json:"mode"`

        // Costs - Text
        InputCostPerToken                 float64  `json:"input_cost_per_token"`
        OutputCostPerToken                float64  `json:"output_cost_per_token"`
        InputCostPerTokenBatches          *float64 `json:"input_cost_per_token_batches,omitempty"`
        OutputCostPerTokenBatches         *float64 `json:"output_cost_per_token_batches,omitempty"`
        InputCostPerTokenPriority         *float64 `json:"input_cost_per_token_priority,omitempty"`
        OutputCostPerTokenPriority        *float64 `json:"output_cost_per_token_priority,omitempty"`
        InputCostPerTokenAbove200kTokens  *float64 `json:"input_cost_per_token_above_200k_tokens,omitempty"`
        OutputCostPerTokenAbove200kTokens *float64 `json:"output_cost_per_token_above_200k_tokens,omitempty"`

        // Costs - Cache
        CacheCreationInputTokenCost                        *float64 `json:"cache_creation_input_token_cost,omitempty"`
        CacheReadInputTokenCost                            *float64 `json:"cache_read_input_token_cost,omitempty"`
        CacheCreationInputTokenCostAbove200kTokens         *float64 `json:"cache_creation_input_token_cost_above_200k_tokens,omitempty"`
        CacheReadInputTokenCostAbove200kTokens             *float64 `json:"cache_read_input_token_cost_above_200k_tokens,omitempty"`
        CacheCreationInputTokenCostAbove1hr                *float64 `json:"cache_creation_input_token_cost_above_1hr,omitempty"`
        CacheCreationInputTokenCostAbove1hrAbove200kTokens *float64 `json:"cache_creation_input_token_cost_above_1hr_above_200k_tokens,omitempty"`
        CacheCreationInputAudioTokenCost                   *float64 `json:"cache_creation_input_audio_token_cost,omitempty"`
        CacheReadInputTokenCostPriority                    *float64 `json:"cache_read_input_token_cost_priority,omitempty"`

        // Costs - Image
        InputCostPerImage                             *float64 `json:"input_cost_per_image,omitempty"`
        InputCostPerPixel                             *float64 `json:"input_cost_per_pixel,omitempty"`
        OutputCostPerImage                            *float64 `json:"output_cost_per_image,omitempty"`
        OutputCostPerPixel                            *float64 `json:"output_cost_per_pixel,omitempty"`
        OutputCostPerImagePremiumImage                *float64 `json:"output_cost_per_image_premium_image,omitempty"`
        OutputCostPerImageAbove512x512Pixels          *float64 `json:"output_cost_per_image_above_512_and_512_pixels,omitempty"`
        OutputCostPerImageAbove512x512PixelsPremium   *float64 `json:"output_cost_per_image_above_512_and_512_pixels_and_premium_image,omitempty"`
        OutputCostPerImageAbove1024x1024Pixels        *float64 `json:"output_cost_per_image_above_1024_and_1024_pixels,omitempty"`
        OutputCostPerImageAbove1024x1024PixelsPremium *float64 `json:"output_cost_per_image_above_1024_and_1024_pixels_and_premium_image,omitempty"`
        OutputCostPerImageAbove2048x2048Pixels        *float64 `json:"output_cost_per_image_above_2048_and_2048_pixels,omitempty"`
        OutputCostPerImageAbove4096x4096Pixels        *float64 `json:"output_cost_per_image_above_4096_and_4096_pixels,omitempty"`
        OutputCostPerImageLowQuality                  *float64 `json:"output_cost_per_image_low_quality,omitempty"`
        OutputCostPerImageMediumQuality               *float64 `json:"output_cost_per_image_medium_quality,omitempty"`
        OutputCostPerImageHighQuality                 *float64 `json:"output_cost_per_image_high_quality,omitempty"`
        OutputCostPerImageAutoQuality                 *float64 `json:"output_cost_per_image_auto_quality,omitempty"`

        // Costs - Audio/Video
        InputCostPerAudioToken      *float64 `json:"input_cost_per_audio_token,omitempty"`
        InputCostPerAudioPerSecond  *float64 `json:"input_cost_per_audio_per_second,omitempty"`
        InputCostPerSecond          *float64 `json:"input_cost_per_second,omitempty"`
        InputCostPerVideoPerSecond  *float64 `json:"input_cost_per_video_per_second,omitempty"`
        OutputCostPerAudioToken     *float64 `json:"output_cost_per_audio_token,omitempty"`
        OutputCostPerVideoPerSecond *float64 `json:"output_cost_per_video_per_second,omitempty"`
        OutputCostPerSecond         *float64 `json:"output_cost_per_second,omitempty"`

        // Costs - Other
        SearchContextCostPerQuery     *float64 `json:"search_context_cost_per_query,omitempty"`
        CodeInterpreterCostPerSession *float64 `json:"code_interpreter_cost_per_session,omitempty"`
    }

Usage in Plugins
The Model Catalog is designed to be shared across all DeepIntShield plugins, providing consistent model information and validation logic for governance, load balancing, and other routing mechanisms.
Initialization
In DeepIntShield’s gateway, the ModelCatalog is initialized once at the start and shared across all plugins:
    import "github.com/maximhq/deepintshield/framework/modelcatalog"

    // Initialize model catalog with config store and logger
    modelCatalog, err := modelcatalog.Init(context.Background(), &modelcatalog.Config{}, configStore, logger)
    if err != nil {
        return fmt.Errorf("failed to initialize model catalog: %w", err)
    }

Basic Cost Calculation
Calculate costs from a DeepIntShield response:
    // Calculate cost for a completed request
    cost := modelCatalog.CalculateCost(
        result, // *schemas.DeepIntShieldResponse
    )

    logger.Info("Request cost: $%.6f", cost)

Unified Cost Calculation
CalculateCost is the single entry point for all cost calculations. It handles all request types, semantic cache billing, and tiered pricing automatically:
    // CalculateCost handles all cost scenarios including cache-aware pricing
    cost := modelCatalog.CalculateCost(result) // *schemas.DeepIntShieldResponse

    // Cache hits return 0 for direct hits, embedding cost for semantic matches
    // Cache misses return base model cost + embedding generation cost
    // Returns 0.0 if pricing data is not found (logs a debug message)

Model Discovery
The ModelCatalog provides several methods to query for model and provider information.
Get Models for a Provider
Retrieve a list of all models supported by a specific provider.
    openaiModels := modelCatalog.GetModelsForProvider(schemas.OpenAI)
    for _, model := range openaiModels {
        logger.Info("Found OpenAI model: %s", model)
    }

Thread-safe: Uses read lock for concurrent access.
Get Providers for a Model
Find all providers that offer a specific model, including cross-provider resolution.
    gpt4Providers := modelCatalog.GetProvidersForModel("gpt-4o")
    for _, provider := range gpt4Providers {
        logger.Info("gpt-4o is available from: %s", provider)
    }
    // Result: [openai, azure, groq] (includes cross-provider mappings)

Cross-Provider Resolution:
This method implements intelligent cross-provider routing logic to discover all providers that can serve a model:
- Direct Match: Checks each provider’s model list in modelPool for the exact model name
- OpenRouter Format: For models found in other providers, checks if provider/model exists in OpenRouter
  - Example: claude-3-5-sonnet found in Anthropic → checks OpenRouter for anthropic/claude-3-5-sonnet
- Vertex Format: Similar check for Vertex with provider/model format
- Groq OpenAI Compatibility: For GPT models, checks if openai/model exists in Groq’s catalog
- Bedrock Claude Models: For Claude models, flexible matching against Bedrock’s full ARN format
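The first three resolution steps can be approximated as shown below. This is a simplified sketch, not the catalog's implementation: the providersFor function, the string-keyed pool, and the aggregators list are hypothetical, and the Groq and Bedrock rules are left out.

```go
package main

import (
	"fmt"
	"sort"
)

// contains reports whether want appears in list.
func contains(list []string, want string) bool {
	for _, s := range list {
		if s == want {
			return true
		}
	}
	return false
}

// aggregators lists providers assumed to expose models as "provider/model".
var aggregators = []string{"openrouter", "vertex"}

// providersFor returns every provider that can serve model, sorted by name.
func providersFor(model string, pool map[string][]string) []string {
	found := map[string]bool{}

	// Step 1: direct match on the exact model name.
	for provider, models := range pool {
		if contains(models, model) {
			found[provider] = true
		}
	}

	// Steps 2 and 3: for each direct provider p, look for "p/model"
	// in the aggregator catalogs.
	direct := make([]string, 0, len(found))
	for p := range found {
		direct = append(direct, p)
	}
	for _, agg := range aggregators {
		for _, p := range direct {
			if contains(pool[agg], p+"/"+model) {
				found[agg] = true
			}
		}
	}

	out := make([]string, 0, len(found))
	for p := range found {
		out = append(out, p)
	}
	sort.Strings(out) // map iteration order is random, so sort for stable output
	return out
}

func main() {
	// Made-up pool for illustration.
	pool := map[string][]string{
		"anthropic":  {"claude-3-5-sonnet"},
		"openai":     {"gpt-4o"},
		"openrouter": {"anthropic/claude-3-5-sonnet"},
	}
	fmt.Println(providersFor("claude-3-5-sonnet", pool))
}
```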
Example:
    providers := modelCatalog.GetProvidersForModel("claude-3-5-sonnet")
    // Returns: [anthropic, vertex, bedrock, openrouter]
    // Even though request was just "claude-3-5-sonnet" without provider prefix!

Check Model Allowance for Provider
Validate if a model is allowed for a specific provider based on an allowed models list. This method is used internally by governance and load balancing plugins.
    // Empty allowedModels - uses catalog to determine support
    isAllowed := modelCatalog.IsModelAllowedForProvider(
        schemas.OpenRouter,
        "gpt-4o",
        []string{}, // empty = check catalog
    )
    // Returns: true (catalog knows OpenRouter supports openai/gpt-4o)

    // Explicit allowedModels with provider prefix
    // (plain assignment: isAllowed is already declared above)
    isAllowed = modelCatalog.IsModelAllowedForProvider(
        schemas.OpenRouter,
        "gpt-4o",
        []string{"openai/gpt-4o", "anthropic/claude-3-5-sonnet"},
    )
    // Returns: true (strips "openai/" prefix and matches "gpt-4o")

    // Explicit allowedModels without prefix
    isAllowed = modelCatalog.IsModelAllowedForProvider(
        schemas.OpenAI,
        "gpt-4o",
        []string{"gpt-4o", "gpt-4o-mini"},
    )
    // Returns: true (direct match)

Behavior:
- Empty allowedModels: Delegates to GetProvidersForModel (includes cross-provider logic)
- Non-empty allowedModels: Checks for both direct matches and provider-prefixed entries
  - Direct: "gpt-4o" matches "gpt-4o"
  - Prefixed: "openai/gpt-4o" matches request for "gpt-4o" (prefix stripped)
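The non-empty case can be sketched as a small matcher. The matchesAllowed function is a hypothetical illustration, and it assumes the provider prefix is everything up to the first "/" in an entry; the real method may strip prefixes differently.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesAllowed reports whether model matches any entry in allowed,
// either directly or after stripping a "provider/" prefix from the entry.
// ASSUMPTION: the prefix ends at the first "/" in the entry.
func matchesAllowed(model string, allowed []string) bool {
	for _, entry := range allowed {
		if entry == model {
			return true // direct match
		}
		if _, rest, ok := strings.Cut(entry, "/"); ok && rest == model {
			return true // prefixed entry, prefix stripped
		}
	}
	return false
}

func main() {
	fmt.Println(matchesAllowed("gpt-4o", []string{"gpt-4o", "gpt-4o-mini"}))
	fmt.Println(matchesAllowed("gpt-4o", []string{"openai/gpt-4o", "anthropic/claude-3-5-sonnet"}))
	fmt.Println(matchesAllowed("gpt-4o", []string{"anthropic/claude-3-5-sonnet"}))
}
```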
Use Cases:
- Governance Routing: Validate if a model request is allowed for a provider configuration
- Load Balancing: Filter providers based on allowed models before performance scoring
- Virtual Key Validation: Check if a model can be used with a specific virtual key’s provider configs
Dynamically Add Models
You can dynamically add models to the catalog’s pool from a v1/models compatible response structure. This is useful for providers that expose a model list endpoint.
    // response is *schemas.DeepIntShieldListModelsResponse
    modelCatalog.AddModelDataToPool(response)

The DeepIntShield gateway does this automatically during initialization for every provider it supports.
When to use:
- After fetching models from a provider’s /v1/models endpoint
- When a new provider is dynamically added at runtime
- For testing with custom model lists
Reloading Configuration
You can reload the pricing configuration at runtime if you need to change the pricing URL or sync interval.
    // PricingSyncInterval is a *time.Duration, so take the address of a variable.
    syncInterval := 12 * time.Hour
    newConfig := &modelcatalog.Config{
        PricingSyncInterval: &syncInterval,
    }
    err := modelCatalog.ReloadPricing(ctx, newConfig)

Error Handling and Fallbacks
The Model Catalog handles missing pricing data gracefully with intelligent fallbacks:
    // resolvePricing resolves the pricing entry for a model, trying deployment as fallback.
    func (mc *ModelCatalog) resolvePricing(provider, model, deployment string, requestType schemas.RequestType) *configstoreTables.TableModelPricing {
        pricing, exists := mc.getPricing(model, provider, requestType)
        if exists {
            return pricing
        }
        // If pricing not found for model, try the deployment name
        if deployment != "" {
            pricing, exists = mc.getPricing(deployment, provider, requestType)
            if exists {
                return pricing
            }
        }
        return nil
    }

    // getPricing returns pricing information for a model (thread-safe).
    // It implements a multi-step fallback chain:
    // 1. Direct lookup by model + provider + mode
    // 2. Gemini → Vertex provider fallback
    // 3. Vertex "provider/model" prefix stripping
    // 4. Bedrock "anthropic." prefix addition for Claude models
    // 5. Responses → Chat mode fallback (at each step)
    // 6. ImageEdit / ImageVariation → ImageGeneration mode fallback
    func (mc *ModelCatalog) getPricing(model, provider string, requestType schemas.RequestType) (*configstoreTables.TableModelPricing, bool) {
        mc.mu.RLock()
        defer mc.mu.RUnlock()

        mode := normalizeRequestType(requestType)

        pricing, ok := mc.pricingData[makeKey(model, provider, mode)]
        if ok {
            return &pricing, true
        }

        // Provider-specific fallbacks (Gemini→Vertex, Vertex prefix strip, Bedrock anthropic. prefix)
        // Each fallback also tries Responses→Chat mode if applicable
        // ...

        // Final fallback: Responses → Chat mode for any provider
        if requestType == schemas.ResponsesRequest || requestType == schemas.ResponsesStreamRequest {
            pricing, ok = mc.pricingData[makeKey(model, provider, normalizeRequestType(schemas.ChatCompletionRequest))]
            if ok {
                return &pricing, true
            }
        }

        return nil, false
    }

    // When pricing is not found, CalculateCost returns 0.0 and logs a debug message.
    // This ensures operations continue smoothly without billing failures.

Cleanup and Lifecycle Management
Properly clean up resources when shutting down:
    // Cleanup model catalog resources
    defer func() {
        if err := modelCatalog.Cleanup(); err != nil {
            logger.Error("Failed to cleanup model catalog: %v", err)
        }
    }()

Thread Safety
All ModelCatalog operations are thread-safe, making it suitable for concurrent usage across multiple plugins and goroutines. The internal pricing data cache uses read-write mutexes for optimal performance during frequent lookups.
Best Practices
- Shared Instance: Use a single ModelCatalog instance across all plugins to avoid redundant data synchronization.
- Error Handling: Always handle the case where CalculateCost returns 0.0 because pricing data for the model is missing.
- Logging: Monitor pricing sync failures and missing model warnings in production.
- Cache Awareness: Use CalculateCost, which automatically handles cache hits/misses and embedding costs.
- Resource Cleanup: Always call Cleanup() during application shutdown to prevent resource leaks.
The Model Catalog provides a robust, production-ready foundation for implementing billing, budgeting, and cost monitoring features in DeepIntShield plugins.