Skip to content

Vertex AI

Vertex AI is Google’s unified ML platform providing access to Google’s Gemini models, Anthropic Claude models, and other third-party LLMs through a single API. DeepIntShield performs conversions including:

  • Multi-model support - Unified interface for Gemini, Anthropic, and third-party models
  • OAuth2 authentication - Service account credentials with automatic token refresh
  • Project and region management - Automatic endpoint construction from GCP project/region
  • Model routing - Automatic provider detection (Gemini vs Anthropic) based on model name
  • Request conversion - Conversion to underlying provider format (Gemini or Anthropic)
  • Embeddings support - Vector generation with task type and truncation options
  • Model discovery - Paginated model listing with deployment information
OperationNon-StreamingStreamingEndpoint
Chat Completions/generate
Responses API/messages
Embeddings-/embeddings
Image Generation-/generateContent or /predict (Imagen)
Image Edit-/generateContent or /predict (Imagen)
Video Generation-/predictLongRunning (Veo models only)
Image Variation-Not supported
List Models-/models
Text Completions-
Speech (TTS)-
Transcriptions (STT)-
Files-
Batch-

ParameterVertex HandlingNotes
modelMaps to Vertex model IDRegion-specific endpoint constructed automatically
All other paramsModel-specific conversionConverted per underlying provider (Gemini/Anthropic)

The key configuration for Vertex requires Google Cloud credentials:

{
"vertex_key_config": {
"project_id": "my-gcp-project",
"region": "us-central1",
"auth_credentials": "{service-account-json}"
}
}

Configuration Details:

  • project_id - GCP project ID (required)
  • region - GCP region for API endpoints (required)
    • Examples: us-central1, us-west1, eu-west1, global
  • auth_credentials - Service account JSON credentials (optional if using default credentials)
  1. Service Account JSON (recommended for production)

    {"auth_credentials": "{full-service-account-json}"}
  2. Application Default Credentials (for local development)

    • Requires GOOGLE_APPLICATION_CREDENTIALS environment variable
    • Leave auth_credentials empty

When using Google’s Gemini models, DeepIntShield converts requests to Gemini’s API format.

All Gemini-compatible parameters are supported. Special handling includes:

  • System prompts: Converted to Gemini’s system message format
  • Tool usage: Mapped to Gemini’s function calling format
  • Streaming: Uses Gemini’s streaming protocol

Refer to Gemini documentation for detailed conversion details.

When using Anthropic models through Vertex AI, DeepIntShield converts requests to Anthropic’s message format.

All Anthropic-standard parameters are supported:

  • Reasoning/Thinking: reasoning parameters converted to thinking structure
  • System messages: Extracted and placed in separate system field
  • Tool message grouping: Consecutive tool messages merged
  • API version: Automatically set to vertex-2023-10-16 for Anthropic models

Refer to Anthropic documentation for detailed conversion details.

  • Responses API uses special /v1/messages endpoint
  • anthropic_version automatically set to vertex-2023-10-16
  • Minimum reasoning budget: 1024 tokens
  • Model field removed from request (Vertex uses different identification)

The region determines the API endpoint:

RegionEndpointPurpose
us-central1us-central1-aiplatform.googleapis.comUS Central
us-west1us-west1-aiplatform.googleapis.comUS West
eu-west1eu-west1-aiplatform.googleapis.comEurope West
globalaiplatform.googleapis.comGlobal (no region prefix)

Availability varies by region. Check GCP documentation for model availability.

Streaming format depends on model type:

  • Gemini models: Standard Gemini streaming with server-sent events
  • Anthropic models: Anthropic message streaming format

The Responses API is available for both Anthropic (Claude) and Gemini models on Vertex AI.

ParameterVertex HandlingNotes
instructionsBecomes system messageModel-specific conversion
inputConverted to messagesString or array support
max_output_tokensModel-specific field mappingGemini vs Anthropic conversion
All other paramsModel-specific conversionConverted per underlying provider

For Gemini models, conversion follows Gemini’s Responses API format.

For Anthropic models, conversion follows Anthropic’s message format:

  • instructions becomes system message
  • reasoning mapped to thinking structure
Terminal window
curl -X POST http://localhost:8080/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "vertex/claude-3-5-sonnet",
"input": "What is AI?",
"instructions": "You are a helpful assistant",
"project_id": "my-gcp-project",
"region": "us-central1"
}' \
-H "X-Goog-Authorization: Bearer {token}"
  • Endpoint: /v1/messages (Anthropic format)
  • anthropic_version set to vertex-2023-10-16 automatically
  • Model and region fields removed from request
  • Raw request body passthrough supported

Refer to Anthropic Responses API for parameter details.


Embeddings are supported for Gemini and other models that support embedding generation.

ParameterVertex MappingNotes
inputinstances[].contentText to embed
dimensionsparameters.outputDimensionalityOptional output size

Use extra_params for embedding-specific options:

Terminal window
curl -X POST http://localhost:8080/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "text-embedding-004",
"input": ["text to embed"],
"dimensions": 256,
"task_type": "RETRIEVAL_DOCUMENT",
"title": "Document title",
"project_id": "my-gcp-project",
"region": "us-central1",
"autoTruncate": true
}'
ParameterTypeDescription
task_typestringTask type hint: RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING (optional)
titlestringOptional title to help model produce better embeddings (used with task_type)
autoTruncatebooleanAuto-truncate input to max tokens (defaults to true)

Different task types optimize embeddings for specific use cases:

  • RETRIEVAL_DOCUMENT - Optimized for documents in retrieval systems
  • RETRIEVAL_QUERY - Optimized for queries searching documents
  • SEMANTIC_SIMILARITY - Optimized for semantic similarity tasks
  • CLASSIFICATION - For classification tasks
  • CLUSTERING - For clustering tasks

Embeddings response includes vectors and truncation information:

{
"embeddings": [
{
"values": [0.1234, -0.5678, ...],
"statistics": {
"token_count": 15,
"truncated": false
}
}
]
}

Response Fields:

  • values - Embedding vector as floats
  • statistics.token_count - Input token count
  • statistics.truncated - Whether input was truncated due to length

Image Generation is supported for Gemini and Imagen on Vertex AI. The provider automatically routes to the appropriate format based on the model type.

ParameterVertex HandlingNotes
modelMapped to deployment/model identifierModel type detected automatically
promptModel-specific conversionConverted per underlying provider (Gemini/Imagen)
All other paramsModel-specific conversionConverted per underlying provider

Vertex automatically detects the model type and uses the appropriate conversion:

  1. Gemini Models: Uses Gemini format (same as Gemini Image Generation)
  2. Imagen Models: Uses Imagen format (detected via IsImagenModel())
Terminal window
curl -X POST http://localhost:8080/v1/images/generations \
-H "Content-Type: application/json" \
-d '{
"model": "vertex/imagen-4.0-generate-001",
"prompt": "A sunset over the mountains",
"size": "1024x1024",
"n": 2,
"project_id": "my-gcp-project",
"region": "us-central1"
}' \
-H "X-Goog-Authorization: Bearer {token}"

Vertex converts requests based on model type:

  • Gemini Models: Uses gemini.ToGeminiImageGenerationRequest() - same conversion as standard Gemini (see Gemini Image Generation)
  • Imagen Models: Uses gemini.ToImagenImageGenerationRequest() - Imagen-specific format with size/aspect ratio conversion

All request bodies are converted to map[string]interface{} and the region field is removed before sending to Vertex API.

  • Gemini Models: Responses converted using GenerateContentResponse.ToDeepIntShieldImageGenerationResponse() - same as standard Gemini
  • Imagen Models: Responses converted using GeminiImagenResponse.ToDeepIntShieldImageGenerationResponse() - Imagen-specific format

The provider automatically selects the endpoint based on model type:

  • Fine-tuned models: /v1beta1/projects/{projectNumber}/locations/{region}/endpoints/{deployment}:generateContent
  • Imagen models: /v1/projects/{projectID}/locations/{region}/publishers/google/models/{model}:predict
  • Gemini models: /v1/projects/{projectID}/locations/{region}/publishers/google/models/{model}:generateContent

Image generation streaming is not supported by Vertex AI.


Image Edit is supported for Gemini and Imagen models on Vertex AI. The provider automatically routes to the appropriate format based on the model type.

Request Parameters

ParameterTypeRequiredNotes
modelstringModel identifier (must be Gemini or Imagen model)
promptstringText description of the edit
image[]binaryImage file(s) to edit (supports multiple images)
maskbinaryMask image file
typestringEdit type: "inpainting", "outpainting", "inpaint_removal", "bgswap" (Imagen only)
nintNumber of images to generate (1-10)
output_formatstringOutput format: "png", "webp", "jpeg"
output_compressionintCompression level (0-100%)
seedintSeed for reproducibility (via ExtraParams["seed"])
negative_promptstringNegative prompt (via ExtraParams["negativePrompt"])
maskModestringMask mode (via ExtraParams["maskMode"], Imagen only): "MASK_MODE_USER_PROVIDED", "MASK_MODE_BACKGROUND", "MASK_MODE_FOREGROUND", "MASK_MODE_SEMANTIC"
dilationfloatMask dilation (via ExtraParams["dilation"], Imagen only): Range [0, 1]
maskClassesint[]Mask classes (via ExtraParams["maskClasses"], Imagen only): For MASK_MODE_SEMANTIC

Request Conversion

Vertex uses the same conversion functions as Gemini:

  1. Gemini Models: Uses gemini.ToGeminiImageEditRequest() - same conversion as standard Gemini (see Gemini Image Edit)
  2. Imagen Models: Uses gemini.ToImagenImageEditRequest() - Imagen-specific format with edit mode mapping and mask configuration (see Gemini Image Edit)

Model Validation: Only Gemini and Imagen models are supported. Other models return ConfigurationError.

Request Body Processing:

  • All request bodies are converted to map[string]interface{} for Vertex API compatibility
  • The region field is removed before sending to Vertex API
  • For Gemini models, unsupported fields are stripped via stripVertexGeminiUnsupportedFields() (removes id from function_call and function_response)

Response Conversion

  • Gemini Models: Responses converted using GenerateContentResponse.ToDeepIntShieldImageGenerationResponse() - same as standard Gemini
  • Imagen Models: Responses converted using GeminiImagenResponse.ToDeepIntShieldImageGenerationResponse() - Imagen-specific format

Endpoint Selection

The provider automatically selects the endpoint based on model type:

  • Gemini models: /v1/projects/{projectID}/locations/{region}/publishers/google/models/{model}:generateContent
  • Imagen models: /v1/projects/{projectID}/locations/{region}/publishers/google/models/{model}:predict

Streaming

Image edit streaming is not supported by Vertex AI.

Image Variation

Image variation is not supported by Vertex AI.


None required. Automatically uses project_id and region from key config.

Lists models available in the specified project and region with metadata and deployment information:

{
"models": [
{
"name": "projects/{project}/locations/{region}/models/gemini-2.0-flash",
"display_name": "Gemini 2.0 Flash",
"description": "Fast multimodal model",
"version_id": "1",
"version_aliases": ["latest", "stable"],
"capabilities": [...],
"deployed_models": [...]
}
],
"next_page_token": "..."
}

To provide a complete model listing experience, DeepIntShield performs multi-pass model discovery:

  1. First Pass - Custom Models from API Response

    • Queries Vertex AI’s List Models API
    • Returns only custom fine-tuned models deployed to your project
    • Custom models are identified by having deployment values that contain only digits
    • Example: "deployment": "1234567890"
  2. Second Pass - Non-Custom Models from Deployments

    • Adds standard foundation models from your deployments configuration
    • Non-custom models have alphanumeric deployment values (e.g., gemini-pro, claude-3-5-sonnet)
    • Filters by allowedModels if specified
    • Example: "deployment": "gemini-2.0-flash"
  3. Third Pass - Allowed Models Not in Deployments

    • Adds models specified in allowedModels that weren’t in the deployments map
    • Ensures all explicitly allowed models appear in the list
    • Uses the model name itself as the deployment value
    • Skips digit-only model IDs (reserved for custom models)
  • If allowedModels is empty: All models from all three passes are included
  • If allowedModels is non-empty: Only models/deployments with keys in allowedModels are included
  • Duplicate Prevention: Each model ID is tracked to prevent duplicates across passes

Non-custom models from deployments and allowed models are automatically formatted for display:

  • gemini-pro → “Gemini Pro”
  • claude-3-5-sonnet → “Claude 3 5 Sonnet”
  • gemini_2_flash → “Gemini 2 Flash”

Formatting uses title case and converts hyphens/underscores to spaces.

{
"vertex_key_config": {
"project_id": "my-project",
"region": "us-central1",
"deployments": {
"my-gemini-ft": "1234567890",
"my-claude-ft": "9876543210"
}
}
}

This returns only your custom fine-tuned models from the API.

Model listing is paginated automatically. If more than 100 models exist, next_page_token will be present. DeepIntShield handles pagination internally.


Project ID and Region Required

Severity: High Behavior: Both project_id and region required for all operations Impact: Request fails without valid GCP project/region configuration Code: vertex.go:127-138

OAuth2 Token Management

Severity: Medium Behavior: Tokens cached and automatically refreshed when expired Impact: First request slightly slower due to auth; cached for subsequent requests Code: vertex.go:34-55

Anthropic Model Detection

Severity: Medium Behavior: Automatic detection of Anthropic vs Gemini models Impact: Different conversion logic applied transparently Code: vertex.go chat/responses endpoints

Model-Specific Responses API Handling

Severity: Low Behavior: Responses API automatically routes to Anthropic or Gemini implementation based on model Impact: Different conversion logic applied transparently per model Code: vertex.go:836-1080

Anthropic Version Lock

Severity: Low Behavior: anthropic_version always set to vertex-2023-10-16 for Claude Impact: Cannot override Anthropic version for Claude on Vertex Code: utils.go:33, 71

Embeddings Float64 Conversion

Severity: Low Behavior: Vertex returns float64 embeddings, converted to float32 for DeepIntShield Impact: Minor precision loss (expected for embeddings) Code: embedding.go:84-87

List Models API Returns Only Custom Models

Severity: High Behavior: Vertex AI’s List Models API only returns custom fine-tuned models, NOT foundation models Impact: DeepIntShield performs three-pass discovery to include foundation models from deployments and allowedModels configuration Why: This is a Vertex AI API limitation - foundation models must be explicitly configured Code: models.go:76-217


HTTP Settings: OAuth2 authentication with automatic token refresh | Region-specific endpoints | Max Connections 5000 | Max Idle 60 seconds

Scope: https://www.googleapis.com/auth/cloud-platform

Endpoint Format: https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}/{resource}

Note: For global region, endpoint is https://aiplatform.googleapis.com/v1/projects/{project}/locations/global/{resource}

Vertex AI requires project configuration, region selection, and Google Cloud authentication. For detailed instructions on setting up Vertex AI, see the quickstart guides:

See Provider-Specific Authentication - Google Vertex in the Gateway Quickstart for configuration steps using Web UI, API, or config.json.


Vertex AI routes video generation through Gemini’s Veo models using the predictLongRunning endpoint. All parameters are identical to Gemini Video Generation.

Supported Operations

OperationSupportedNotes
GeneratePOST /v1/videos
RetrieveGET /v1/videos/{id}
DownloadGET /v1/videos/{id}/content
DeleteNot supported
ListNot supported
RemixNot supported