Google Gemini

Overview

Google Gemini’s API has a different structure from OpenAI’s. DeepIntShield performs extensive conversion, including:

Role remapping - “assistant” → “model”, system messages integrated into main flow
Message grouping - Consecutive tool responses merged into single user message
Parameter renaming - e.g., max_completion_tokens → maxOutputTokens, stop → stopSequences
Function call handling - Tool call ID preservation and thought signature support
Content modality - Support for text, images, video, code execution, and thought content
Thinking/Reasoning - Thinking configuration mapped to DeepIntShield reasoning structure

Supported Operations

Operation	Non-Streaming	Streaming	Endpoint
Chat Completions	✅	✅	`/v1beta/models/{model}:generateContent`
Responses API	✅	✅	`/v1beta/models/{model}:generateContent`
Speech (TTS)	✅	✅	`/v1beta/models/{model}:generateContent`
Transcriptions (STT)	✅	✅	`/v1beta/models/{model}:generateContent`
Image Generation	✅	-	`/v1beta/models/{model}:generateContent` or `/v1beta/models/{model}:predict` (Imagen)
Image Edit	✅	-	`/v1beta/models/{model}:generateContent` or `/v1beta/models/{model}:predict` (Imagen)
Video Generation	✅	-	`/v1beta/models/{model}:predictLongRunning`
Image Variation	❌	-	Not supported
Embeddings	✅	-	`/v1beta/models/{model}:embedContent`
Files	✅	-	`/upload/storage/v1beta/files`
Batch	✅	-	`/v1beta/batchJobs`
List Models	✅	-	`/v1beta/models`

Authentication

Gemini supports API key authentication in addition to OAuth2 Bearer token authentication. The implementation conditionally uses the appropriate method based on the endpoint type.

API Key Authentication

API key authentication is supported via two methods:

Header Method (standard Gemini endpoints):
- Format: x-goog-api-key: YOUR_API_KEY header
- Used for: Standard Gemini endpoints (e.g., /v1beta/models/{model}:generateContent)
Query Parameter Method (Imagen and custom endpoints):
- Format: ?key=YOUR_API_KEY appended to request URLs
- Used for: Imagen models and custom endpoints
- Example: https://generativelanguage.googleapis.com/v1beta/models/imagen-4.0-generate-001:predict?key=YOUR_API_KEY

DeepIntShield automatically selects the appropriate authentication method based on the endpoint type.
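
The two methods above can be sketched as a small helper. This is a minimal illustration, not DeepIntShield's actual selection logic: `applyAPIKey` and `newRequest` are hypothetical names, and treating any path that ends in `:predict` as a query-parameter endpoint is an assumption made for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// applyAPIKey attaches the key using one of the two methods described above.
// Assumption: paths ending in ":predict" (Imagen-style) take the query parameter;
// everything else takes the x-goog-api-key header.
func applyAPIKey(req *http.Request, apiKey string) {
	if strings.HasSuffix(req.URL.Path, ":predict") {
		q := req.URL.Query()
		q.Set("key", apiKey)
		req.URL.RawQuery = q.Encode()
		return
	}
	req.Header.Set("x-goog-api-key", apiKey)
}

// newRequest builds a POST request without a body and applies the key.
func newRequest(url, apiKey string) *http.Request {
	req, err := http.NewRequest(http.MethodPost, url, nil)
	if err != nil {
		panic(err)
	}
	applyAPIKey(req, apiKey)
	return req
}

func main() {
	base := "https://generativelanguage.googleapis.com/v1beta/models/"
	chat := newRequest(base+"gemini-2.0-flash:generateContent", "YOUR_API_KEY")
	img := newRequest(base+"imagen-4.0-generate-001:predict", "YOUR_API_KEY")
	fmt.Println("header:", chat.Header.Get("x-goog-api-key"))
	fmt.Println("query: ", img.URL.RawQuery)
}
```

No request is sent; the program only shows where the key ends up for each endpoint type.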

1. Chat Completions

Request Parameters

Parameter Mapping

Parameter	Transformation
`max_completion_tokens`	Renamed to `maxOutputTokens`
`temperature`, `top_p`	Direct pass-through
`stop`	Renamed to `stopSequences`
`response_format`	Converted to `responseMimeType` and `responseJsonSchema`
`tools`	Schema restructured (see Tool Conversion)
`tool_choice`	Mapped to `functionCallingConfig` (see Tool Conversion)
`reasoning`	Mapped to `thinkingConfig` (see Reasoning / Thinking)
`top_k`	Via `extra_params` (Gemini-specific)
`presence_penalty`, `frequency_penalty`	Via `extra_params`
`seed`	Via `extra_params`

Dropped Parameters

The following parameters are silently ignored: logit_bias, logprobs, top_logprobs, parallel_tool_calls, service_tier

Extra Parameters

Use extra_params (SDK) or pass directly in request body (Gateway) for Gemini-specific fields:

Gateway:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/gemini-2.0-flash",
    "messages": [{"role": "user", "content": "Hello"}],
    "top_k": 40,
    "stop_sequences": ["###"]
  }'

Go SDK:

resp, err := client.ChatCompletionRequest(schemas.NewDeepIntShieldContext(ctx, schemas.NoDeadline), &schemas.DeepIntShieldChatRequest{
    Provider: schemas.Gemini,
    Model:    "gemini-2.0-flash",
    Input:    messages,
    Params: &schemas.ChatParameters{
        ExtraParams: map[string]interface{}{
            "top_k": 40,
            "stop_sequences": []string{"###"},
        },
    },
})

Reasoning / Thinking

Documentation: See DeepIntShield Reasoning Reference

Parameter Mapping

reasoning.effort → thinkingConfig.thinkingLevel (“low” → LOW, “high” → HIGH)
reasoning.max_tokens → thinkingConfig.thinkingBudget (token budget for thinking)
reasoning parameter triggers thinkingConfig.includeThoughts = true

Supported Thinking Levels

"low" / "minimal" → LOW
"medium" / "high" → HIGH
null or unspecified → Based on max_tokens: -1 (dynamic), 0 (disabled), or specific budget

Example

// Request
{"reasoning": {"effort": "high", "max_tokens": 10000}}

// Gemini conversion
{"thinkingConfig": {"includeThoughts": true, "thinkingLevel": "HIGH", "thinkingBudget": 10000}}
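
The mapping rules in this section can be expressed as one small function. The sketch below uses illustrative types (`Reasoning`, `ThinkingConfig`) rather than the SDK's own, and it assumes the budget is passed through unchanged whenever it is set.

```go
package main

import "fmt"

// Reasoning is an illustrative stand-in for the request-side reasoning object.
type Reasoning struct {
	Effort    string // "minimal", "low", "medium", "high", or "" when unspecified
	MaxTokens *int   // nil when unspecified
}

// ThinkingConfig is an illustrative stand-in for Gemini's thinkingConfig.
type ThinkingConfig struct {
	IncludeThoughts bool
	ThinkingLevel   string // "LOW", "HIGH", or "" when no effort was given
	ThinkingBudget  *int   // -1 dynamic, 0 disabled, >0 explicit budget
}

// toThinkingConfig applies the documented rules: any reasoning object turns on
// includeThoughts, effort selects the level, and max_tokens becomes the budget.
func toThinkingConfig(r *Reasoning) *ThinkingConfig {
	if r == nil {
		return nil
	}
	cfg := &ThinkingConfig{IncludeThoughts: true, ThinkingBudget: r.MaxTokens}
	switch r.Effort {
	case "minimal", "low":
		cfg.ThinkingLevel = "LOW"
	case "medium", "high":
		cfg.ThinkingLevel = "HIGH"
	}
	return cfg
}

func main() {
	budget := 10000
	cfg := toThinkingConfig(&Reasoning{Effort: "high", MaxTokens: &budget})
	fmt.Println(cfg.IncludeThoughts, cfg.ThinkingLevel, *cfg.ThinkingBudget)
}
```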

Message Conversion

Critical Caveats

Role remapping: “assistant” → “model”, “system” → part of user/model content flow
Consecutive tool responses: Tool response messages merged into single user message with function response parts
Content flattening: Multi-part content in single message preserved as parts array
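
To make the role remapping and tool-response grouping concrete, here is a minimal sketch. All types are simplified stand-ins invented for the example, and system messages are deliberately skipped because their exact placement is not specified here.

```go
package main

import "fmt"

// Message is a simplified OpenAI-style chat message (illustrative, not the SDK type).
type Message struct {
	Role     string // "system", "user", "assistant", or "tool"
	Content  string
	ToolName string // function that a tool message answers
}

// FunctionResponse, Part, and Content loosely mirror Gemini's contents[].parts[] shape.
type FunctionResponse struct {
	Name     string
	Response string
}

type Part struct {
	Text             string
	FunctionResponse *FunctionResponse
}

type Content struct {
	Role  string
	Parts []Part
}

func toolPart(m Message) Part {
	return Part{FunctionResponse: &FunctionResponse{Name: m.ToolName, Response: m.Content}}
}

// toContents renames "assistant" to "model" and merges each run of consecutive
// tool messages into a single user content with one part per response.
func toContents(msgs []Message) []Content {
	var out []Content
	prevTool := false
	for _, m := range msgs {
		isTool := m.Role == "tool"
		switch {
		case m.Role == "system":
			// System handling is left out of this sketch.
		case isTool && prevTool:
			last := &out[len(out)-1]
			last.Parts = append(last.Parts, toolPart(m))
		case isTool:
			out = append(out, Content{Role: "user", Parts: []Part{toolPart(m)}})
		case m.Role == "assistant":
			out = append(out, Content{Role: "model", Parts: []Part{{Text: m.Content}}})
		default:
			out = append(out, Content{Role: "user", Parts: []Part{{Text: m.Content}}})
		}
		prevTool = isTool
	}
	return out
}

func main() {
	contents := toContents([]Message{
		{Role: "user", Content: "Weather in Paris and Rome?"},
		{Role: "assistant", Content: "Checking both cities."},
		{Role: "tool", ToolName: "get_weather", Content: `{"city":"Paris"}`},
		{Role: "tool", ToolName: "get_weather", Content: `{"city":"Rome"}`},
	})
	for _, c := range contents {
		fmt.Println(c.Role, len(c.Parts))
	}
}
```

Four input messages become three contents here, which is the change in message count noted under Caveats.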

Image Conversion

URL images: {type: "image_url", image_url: {url: "..."}} → {type: "image", source: {type: "url", url: "..."}}
Base64 images: Data URL → {type: "image", source: {type: "base64", media_type: "image/png", ...}}
Video content: Preserved with metadata (fps, start/end offset)

Tool Conversion

Tool definitions are restructured with these mappings:

function.name → functionDeclarations.name (preserved)
function.parameters → functionDeclarations.parameters (Schema format)
function.description → functionDeclarations.description
function.strict → Dropped (not supported by Gemini)

Tool Choice Mapping

OpenAI	Gemini
`"auto"`	`AUTO` (default)
`"none"`	`NONE`
`"required"`	`ANY`
Specific tool	`ANY` with `allowedFunctionNames`
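
The table above translates directly into a switch. In this sketch the `ToolChoice` and `FunctionCallingConfig` types are hypothetical simplifications, and falling back to `AUTO` for unrecognized values is an assumption.

```go
package main

import "fmt"

// ToolChoice is a simplified view of the OpenAI-style tool_choice value.
type ToolChoice struct {
	Mode         string // "auto", "none", or "required"
	FunctionName string // non-empty when a specific tool is requested
}

// FunctionCallingConfig is a simplified view of Gemini's functionCallingConfig.
type FunctionCallingConfig struct {
	Mode                 string
	AllowedFunctionNames []string
}

// toFunctionCallingConfig follows the mapping table; unknown modes fall back
// to AUTO, which is an assumption of this sketch.
func toFunctionCallingConfig(tc ToolChoice) FunctionCallingConfig {
	if tc.FunctionName != "" {
		return FunctionCallingConfig{Mode: "ANY", AllowedFunctionNames: []string{tc.FunctionName}}
	}
	switch tc.Mode {
	case "none":
		return FunctionCallingConfig{Mode: "NONE"}
	case "required":
		return FunctionCallingConfig{Mode: "ANY"}
	default:
		return FunctionCallingConfig{Mode: "AUTO"}
	}
}

func main() {
	fmt.Printf("%+v\n", toFunctionCallingConfig(ToolChoice{Mode: "required"}))
	fmt.Printf("%+v\n", toFunctionCallingConfig(ToolChoice{FunctionName: "get_weather"}))
}
```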

Response Conversion

Field Mapping

finishReason → finish_reason:
- STOP → stop
- MAX_TOKENS → length
- SAFETY, RECITATION, LANGUAGE, BLOCKLIST, PROHIBITED_CONTENT, SPII, IMAGE_SAFETY → content_filter
- MALFORMED_FUNCTION_CALL, UNEXPECTED_TOOL_CALL → tool_calls
candidates[0].content.parts[0].text → choices[0].message.content (if single text block)
candidates[0].content.parts[].functionCall → choices[0].message.tool_calls
promptTokenCount → usage.prompt_tokens
candidatesTokenCount → usage.completion_tokens
totalTokenCount → usage.total_tokens
cachedContentTokenCount → usage.prompt_tokens_details.cached_tokens
thoughtsTokenCount → usage.completion_tokens_details.reasoning_tokens
Thought content (from text parts with thought: true) → reasoning field in stream deltas
Function call args (map) → JSON string arguments
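
Two of the mappings above, the finish reason and the argument serialization, are easy to show in isolation. The function names are hypothetical, and returning unlisted finish reasons unchanged is an assumption, since the list does not say how they are handled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mapFinishReason converts a Gemini finishReason to the OpenAI-style value
// using the documented groups. Unlisted values are returned unchanged (assumption).
func mapFinishReason(reason string) string {
	switch reason {
	case "STOP":
		return "stop"
	case "MAX_TOKENS":
		return "length"
	case "SAFETY", "RECITATION", "LANGUAGE", "BLOCKLIST",
		"PROHIBITED_CONTENT", "SPII", "IMAGE_SAFETY":
		return "content_filter"
	case "MALFORMED_FUNCTION_CALL", "UNEXPECTED_TOOL_CALL":
		return "tool_calls"
	}
	return reason
}

// argsToJSON serializes a functionCall args map into the JSON string that
// tool_calls[].function.arguments carries.
func argsToJSON(args map[string]interface{}) (string, error) {
	b, err := json.Marshal(args)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	fmt.Println(mapFinishReason("MAX_TOKENS"))
	s, err := argsToJSON(map[string]interface{}{"city": "Paris", "days": 3})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Callers then need `json.Unmarshal` on the `arguments` string to get structured values back, which is the cost noted under Function Call Arguments Serialization.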

Streaming

Event structure:

Streaming responses contain deltas in delta.content (text), delta.reasoning (thoughts), and delta.tool_calls (function calls)
Function responses appear as text content in the delta
finish_reason is set only on the final chunk
Usage metadata is included only in the final chunk

2. Responses API

The Responses API uses the same underlying /generateContent endpoint but converts between OpenAI’s Responses format and Gemini’s generateContent request and response format.

Request Parameters

Parameter Mapping

Parameter	Transformation
`max_output_tokens`	Renamed to `maxOutputTokens`
`temperature`, `top_p`	Direct pass-through
`instructions`	Converted to system instruction text
`input` (string or array)	Converted to messages
`tools`	Schema restructured (see Chat Completions)
`tool_choice`	Type mapped (see Chat Completions)
`reasoning`	Mapped to `thinkingConfig` (see Reasoning / Thinking)
`text`	Maps to `responseMimeType` and `responseJsonSchema`
`stop`	Via `extra_params`, renamed to `stopSequences`
`top_k`	Via `extra_params`

Extra Parameters

Use extra_params (SDK) or pass directly in request body (Gateway):

Gateway:

curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/gemini-2.0-flash",
    "input": "Hello, how are you?",
    "instructions": "You are a helpful assistant.",
    "top_k": 40
  }'

Go SDK:

resp, err := client.ResponsesRequest(schemas.NewDeepIntShieldContext(ctx, schemas.NoDeadline), &schemas.DeepIntShieldResponsesRequest{
    Provider: schemas.Gemini,
    Model:    "gemini-2.0-flash",
    Input:    messages,
    Params: &schemas.ResponsesParameters{
        Instructions: schemas.Ptr("You are a helpful assistant."),
        ExtraParams: map[string]interface{}{
            "top_k": 40,
        },
    },
})

Input & Instructions

Input: String wrapped as user message or array converted to messages
Instructions: Becomes system instruction (single text block)

Tool Support

Supported types: function, computer_use_preview, web_search, mcp

Tool conversions same as Chat Completions with:

Computer tools auto-configured (if specified in DeepIntShield request)
Function-based tools always enabled

Response Conversion

finishReason → status:
- STOP, MAX_TOKENS, and other values → completed
- SAFETY → incomplete
Output items conversion:
- Text parts → message field
- Function calls → function_call field
- Thought content → reasoning field
Usage fields preserved with cache tokens mapped to *_tokens_details.cached_tokens

Streaming

Event structure: Similar to Chat Completions streaming

content_part.added emitted for text and reasoning parts
Item IDs generated as msg_{responseID}_item_{outputIndex}

3. Speech (Text-to-Speech)

Speech synthesis uses the underlying chat generation endpoint with audio response modality.

Request Parameters

Parameter	Transformation
`input`	Text to synthesize → `contents[0].parts[0].text`
`voice`	Voice name → `generationConfig.speechConfig.voiceConfig.prebuiltVoiceConfig.voiceName`
`response_format`	Only “wav” supported (default); auto-converted from PCM

Voice Configuration

Single Voice:

{
  "generationConfig": {
    "responseModalities": ["AUDIO"],
    "speechConfig": {
      "voiceConfig": {
        "prebuiltVoiceConfig": {
          "voiceName": "Chant-Female"
        }
      }
    }
  }
}

Multi-Speaker:

{
  "generationConfig": {
    "responseModalities": ["AUDIO"],
    "speechConfig": {
      "multiSpeakerVoiceConfig": {
        "speakerVoiceConfigs": [
          {
            "speaker": "Character A",
            "voiceConfig": {
              "prebuiltVoiceConfig": {
                "voiceName": "Chant-Female"
              }
            }
          }
        ]
      }
    }
  }
}

Response Conversion

Audio data extracted from candidates[0].content.parts[].inlineData
Format conversion: Gemini returns PCM audio (s16le, 24kHz, mono)
Auto-conversion: PCM → WAV when response_format: "wav" (default)
Raw audio returned if response_format is omitted or empty string
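
The PCM to WAV step amounts to prepending a 44-byte RIFF header to the raw samples. The sketch below shows one way to do that for the stated format (s16le, 24 kHz, mono); `pcmToWAV` is a hypothetical helper and not necessarily how DeepIntShield implements the conversion.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// pcmToWAV wraps raw little-endian PCM samples in a canonical 44-byte
// RIFF/WAVE header and returns the complete file contents.
func pcmToWAV(pcm []byte, sampleRate, channels, bitsPerSample int) []byte {
	blockAlign := channels * bitsPerSample / 8
	byteRate := sampleRate * blockAlign

	var buf bytes.Buffer
	// Writes of fixed-size integers to a bytes.Buffer cannot fail.
	le := func(v interface{}) { _ = binary.Write(&buf, binary.LittleEndian, v) }

	buf.WriteString("RIFF")
	le(uint32(36 + len(pcm))) // size of everything after this field
	buf.WriteString("WAVE")
	buf.WriteString("fmt ")
	le(uint32(16)) // size of the fmt chunk body
	le(uint16(1))  // audio format 1 = uncompressed PCM
	le(uint16(channels))
	le(uint32(sampleRate))
	le(uint32(byteRate))
	le(uint16(blockAlign))
	le(uint16(bitsPerSample))
	buf.WriteString("data")
	le(uint32(len(pcm)))
	buf.Write(pcm)
	return buf.Bytes()
}

func main() {
	pcm := make([]byte, 4800) // 0.1 s of silence at 24 kHz, 16-bit mono
	wav := pcmToWAV(pcm, 24000, 1, 16)
	fmt.Println(len(wav), string(wav[:4]), string(wav[8:12]))
}
```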

Supported Voices

Common Gemini voices include:

Chant-Female - Female voice
Chant-Male - Male voice
Additional voices depend on model capabilities

Check model documentation for complete list of supported voices.

4. Transcriptions (Speech-to-Text)

Transcriptions are implemented as chat completions with audio content and text prompts.

Request Parameters

Parameter	Transformation
`file`	Audio bytes → `contents[].parts[].inlineData`
`prompt`	Instructions → `contents[0].parts[0].text` (defaults to “Generate a transcript of the speech.”)
`language`	Via `extra_params` (if supported by model)

Audio Input Handling

Audio is sent as inline data with auto-detected MIME type:

{
  "contents": [
    {
      "parts": [
        {
          "text": "<prompt text>"
        },
        {
          "inlineData": {
            "mimeType": "audio/wav",
            "data": "<base64-encoded-audio>"
          }
        }
      ]
    }
  ]
}

Extra Parameters

Safety settings and caching can be configured:

Gateway:

curl -X POST http://localhost:8080/v1/audio/transcriptions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/gemini-2.0-flash",
    "file": "<binary-audio-data>",
    "prompt": "Transcribe this audio in the original language."
  }'

Go SDK:

resp, err := client.TranscriptionRequest(schemas.NewDeepIntShieldContext(ctx, schemas.NoDeadline), &schemas.DeepIntShieldTranscriptionRequest{
    Provider: schemas.Gemini,
    Model:    "gemini-2.0-flash",
    Input: &schemas.TranscriptionInput{
        File: audioBytes,
    },
    Params: &schemas.TranscriptionParameters{
        Prompt: schemas.Ptr("Transcribe this audio."),
        ExtraParams: map[string]interface{}{
            "safety_settings": safetySettings, // placeholder: your safety settings value, built elsewhere
        },
    },
})

Response Conversion

Transcribed text extracted from candidates[0].content.parts[].text
task set to "transcribe"
Usage metadata mapped:
- promptTokenCount → input_tokens
- candidatesTokenCount → output_tokens
- totalTokenCount → total_tokens

5. Embeddings

Request Parameters:

input → requests[0].content.parts[0].text (array inputs are joined with spaces into a single text)
dimensions → outputDimensionality
Task type and title are supplied via extra_params

Response Mapping:

embeddings[].values → DeepIntShield embedding array
metadata.billableCharacterCount → Usage prompt tokens (fallback)
Token counts extracted from usage metadata

6. Batch API

Request formats: Inline requests array or file-based input

Pagination: Token-based with pageToken

Endpoints:

POST /v1beta/batchJobs - Create
GET /v1beta/batchJobs?pageSize={limit}&pageToken={token} - List
GET /v1beta/batchJobs/{batch_id} - Retrieve
POST /v1beta/batchJobs/{batch_id}:cancel - Cancel

Response Structure:

Status mapping:
- BATCH_STATE_PENDING, BATCH_STATE_RUNNING → in_progress
- BATCH_STATE_SUCCEEDED → completed
- BATCH_STATE_FAILED → failed
- BATCH_STATE_CANCELLING → cancelling
- BATCH_STATE_CANCELLED → cancelled
- BATCH_STATE_EXPIRED → expired
Inline responses: Array in dest.inlinedResponses
File-based responses: JSONL file in dest.fileName

Note: RFC3339 timestamps converted to Unix timestamps
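
The status mapping and the timestamp conversion can be sketched as follows. The helper names are hypothetical, and returning unknown states unchanged is an assumption of the example.

```go
package main

import (
	"fmt"
	"time"
)

// batchStatus mirrors the status mapping listed above.
var batchStatus = map[string]string{
	"BATCH_STATE_PENDING":    "in_progress",
	"BATCH_STATE_RUNNING":    "in_progress",
	"BATCH_STATE_SUCCEEDED":  "completed",
	"BATCH_STATE_FAILED":     "failed",
	"BATCH_STATE_CANCELLING": "cancelling",
	"BATCH_STATE_CANCELLED":  "cancelled",
	"BATCH_STATE_EXPIRED":    "expired",
}

// mapBatchStatus returns the mapped status, or the input unchanged when the
// state is not in the table (assumption).
func mapBatchStatus(state string) string {
	if s, ok := batchStatus[state]; ok {
		return s
	}
	return state
}

// toUnix parses an RFC3339 timestamp and returns seconds since the Unix epoch.
func toUnix(ts string) (int64, error) {
	t, err := time.Parse(time.RFC3339, ts)
	if err != nil {
		return 0, err
	}
	return t.Unix(), nil
}

func main() {
	fmt.Println(mapBatchStatus("BATCH_STATE_RUNNING"))
	sec, err := toUnix("2024-01-01T00:00:00Z")
	if err != nil {
		panic(err)
	}
	fmt.Println(sec)
}
```

The same `toUnix` conversion applies to the `createTime` field in the Files API.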

7. Files API

Upload: Multipart/form-data with file (binary) and filename (optional)

Field mapping:

name → id
displayName → filename
sizeBytes → size_bytes
mimeType → content_type
createTime (RFC3339) → Converted to Unix timestamp

Endpoints:

POST /upload/storage/v1beta/files - Upload
GET /v1beta/files?limit={limit}&pageToken={token} - List (cursor pagination)
GET /v1beta/files/{file_id} - Retrieve
DELETE /v1beta/files/{file_id} - Delete
GET /v1beta/files/{file_id}/content - Download

8. Image Generation

Gemini supports two image generation formats depending on the model:

Standard Gemini Format: Uses the /v1beta/models/{model}:generateContent endpoint
Imagen Format: Uses the /v1beta/models/{model}:predict endpoint for Imagen models (detected automatically)

Parameter Mapping

Parameter	Transformation
`prompt`	Text description of the image to generate
`n`	Number of images (mapped to `sampleCount` for Imagen, `candidateCount` for Gemini)
`size`	Image size in WxH format (e.g., `"1024x1024"`). Converted to Imagen’s `imageSize` + `aspectRatio` format
`output_format`	Output format: `"png"`, `"jpeg"`, `"webp"`. Converted to MIME type for Imagen
`seed`	Seed for reproducible generation (passed directly)
`negative_prompt`	Negative prompt (passed directly)

Extra Parameters

Use extra_params (SDK) or pass directly in request body (Gateway) for Gemini-specific fields:

Parameter	Type	Notes
`personGeneration`	string	Person generation setting (Imagen only)
`language`	string	Language code (Imagen only)
`enhancePrompt`	bool	Prompt enhancement flag (Imagen only)
`safetySettings` / `safety_settings`	string/array	Safety settings configuration
`cachedContent` / `cached_content`	string	Cached content ID
`labels`	object	Custom labels map

Gateway:

curl -X POST http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/imagen-4.0-generate-001",
    "prompt": "A sunset over the mountains",
    "size": "1024x1024",
    "n": 2,
    "output_format": "png"
  }'

Go SDK:

resp, err := client.ImageGenerationRequest(schemas.NewDeepIntShieldContext(ctx, schemas.NoDeadline), &schemas.DeepIntShieldImageGenerationRequest{
    Provider: schemas.Gemini,
    Model:    "imagen-4.0-generate-001",
    Input: &schemas.ImageGenerationInput{
        Prompt: "A sunset over the mountains",
    },
    Params: &schemas.ImageGenerationParameters{
        Size:         schemas.Ptr("1024x1024"),
        N:            schemas.Ptr(2),
        OutputFormat: schemas.Ptr("png"),
    },
})

Request Conversion

Standard Gemini Format

Model mapping: bifrostReq.Model → req.Model, with bifrostReq.Input.Prompt → req.Contents[0].Parts[0].Text
Response modality: Set internally by DeepIntShield to generationConfig.responseModalities = ["IMAGE"] to indicate image generation
Image count: Specify number of images via n → generationConfig.candidateCount
Extra parameters: Include safetySettings, cachedContent, and labels mapped directly

Imagen Format

Prompt: bifrostReq.Input.Prompt → req.Instances[0].Prompt
Number of Images: n → req.Parameters.SampleCount
Size Conversion: size (WxH format) converted to:
- imageSize: "1k" (if dimensions ≤ 1024), "2k" (if dimensions ≤ 2048). Sizes larger than "2k" are not supported by Imagen models.
- aspectRatio: "1:1", "3:4", "4:3", "9:16", or "16:9" (based on width/height ratio)
Output Format: output_format ("png", "jpeg") → parameters.outputOptions.mimeType ("image/png", "image/jpeg")
Seed & Negative Prompt: Passed directly to seed and parameters.negativePrompt
Extra Parameters: personGeneration, language, enhancePrompt, safetySettings mapped to parameters

Response Conversion

Standard Gemini Format

Image Data: Extracts InlineData from candidates[0].content.parts[] with MIME type image/*
Output Format: Converts MIME type (image/png, image/jpeg, image/webp) → file extension (png, jpeg, webp)
Usage: Extracts token usage from usageMetadata
Multiple Images: Each image part becomes an ImageData entry in the response array

Imagen Format

Image Data: Each prediction in response.predictions[] → ImageData with b64_json from bytesBase64Encoded
Output Format: Converts prediction.mimeType → file extension for outputFormat field (Imagen doesn't support webp)
Index: Each prediction gets an index (0, 1, 2, …) in the response array

Size Conversion

For Imagen format, size is converted between formats:

Supported Image Sizes: "1k" (≤1024), "2k" (≤2048)

Supported Aspect Ratios: "1:1", "3:4", "4:3", "9:16", "16:9"
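
A conversion along these lines could look like the sketch below. `convertSize` is a hypothetical helper; comparing the longer side against the 1024 and 2048 thresholds and choosing the nearest supported ratio are both assumptions, since the exact rules are not spelled out here.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// convertSize turns a "WxH" string into Imagen-style imageSize and aspectRatio.
// Assumptions: the longer side selects "1k" or "2k", and the aspect ratio is
// the supported value closest to width/height.
func convertSize(size string) (string, string, error) {
	parts := strings.SplitN(strings.ToLower(size), "x", 2)
	if len(parts) != 2 {
		return "", "", fmt.Errorf("size %q is not in WxH format", size)
	}
	w, errW := strconv.Atoi(parts[0])
	h, errH := strconv.Atoi(parts[1])
	if errW != nil || errH != nil || w <= 0 || h <= 0 {
		return "", "", fmt.Errorf("size %q has invalid dimensions", size)
	}

	longest := w
	if h > longest {
		longest = h
	}
	var imageSize string
	switch {
	case longest <= 1024:
		imageSize = "1k"
	case longest <= 2048:
		imageSize = "2k"
	default:
		return "", "", fmt.Errorf("size %q exceeds the 2k limit", size)
	}

	ratios := []struct {
		name  string
		value float64
	}{
		{"1:1", 1}, {"3:4", 3.0 / 4}, {"4:3", 4.0 / 3}, {"9:16", 9.0 / 16}, {"16:9", 16.0 / 9},
	}
	target := float64(w) / float64(h)
	best := ratios[0]
	for _, r := range ratios[1:] {
		if math.Abs(r.value-target) < math.Abs(best.value-target) {
			best = r
		}
	}
	return imageSize, best.name, nil
}

func main() {
	for _, s := range []string{"1024x1024", "1920x1080", "768x1024"} {
		imageSize, ratio, err := convertSize(s)
		fmt.Println(s, imageSize, ratio, err)
	}
}
```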

Endpoint Selection

The provider automatically selects the endpoint based on model name:

Imagen models (detected via schemas.IsImagenModel()): Uses /v1beta/models/{model}:predict endpoint
Other models: Uses /v1beta/models/{model}:generateContent endpoint with image response modality

Streaming

Image generation streaming is not supported by Gemini.

9. Image Edit

Gemini supports image editing through two different APIs depending on the model:

Standard Gemini Format: Uses the /v1beta/models/{model}:generateContent endpoint (for Gemini models)
Imagen Format: Uses the /v1beta/models/{model}:predict endpoint (for Imagen models, detected automatically)

Request Parameters

Parameter	Type	Required	Notes
`model`	string	✅	Model identifier (Gemini or Imagen model)
`prompt`	string	✅	Text description of the edit
`image[]`	binary	✅	Image file(s) to edit (supports multiple images)
`mask`	binary	❌	Mask image file
`type`	string	❌	Edit type: `"inpainting"`, `"outpainting"`, `"inpaint_removal"`, `"bgswap"` (Imagen only)
`n`	int	❌	Number of images to generate (1-10)
`output_format`	string	❌	Output format: `"png"`, `"webp"`, `"jpeg"`
`output_compression`	int	❌	Compression level (0-100%)
`seed`	int	❌	Seed for reproducibility (via `ExtraParams["seed"]`)
`negative_prompt`	string	❌	Negative prompt (via `ExtraParams["negativePrompt"]`)
`guidanceScale`	int	❌	Guidance scale (via `ExtraParams["guidanceScale"]`, Imagen only)
`baseSteps`	int	❌	Base steps (via `ExtraParams["baseSteps"]`, Imagen only)
`maskMode`	string	❌	Mask mode (via `ExtraParams["maskMode"]`, Imagen only): `"MASK_MODE_USER_PROVIDED"`, `"MASK_MODE_BACKGROUND"`, `"MASK_MODE_FOREGROUND"`, `"MASK_MODE_SEMANTIC"`
`dilation`	float	❌	Mask dilation (via `ExtraParams["dilation"]`, Imagen only): Range [0, 1]
`maskClasses`	int[]	❌	Mask classes (via `ExtraParams["maskClasses"]`, Imagen only): For `MASK_MODE_SEMANTIC`

Request Conversion

Standard Gemini Format (Non-Imagen Models)

Model & Prompt: bifrostReq.Model → req.Model, bifrostReq.Input.Prompt → req.Contents[0].Parts[0].Text
Images: Each image in bifrostReq.Input.Images is converted to a Part with:
- MIME type detection (image/jpeg, image/webp, image/png) with fallback to image/png
- Base64 encoding: image.Image → Part.InlineData.Data (base64 string)
- MIME type: Part.InlineData.MIMEType
Response Modality: GenerationConfig.ResponseModalities is set to [ModalityImage] to indicate image generation
Extra Parameters: Extracted from ExtraParams:
- safetySettings / safety_settings → SafetySettings
- cachedContent / cached_content → CachedContent
- labels → Labels (map[string]string)

Imagen Format (Imagen Models)

Reference Images: Each image in bifrostReq.Input.Images is converted to ReferenceImage with:
- ReferenceType: "REFERENCE_TYPE_RAW"
- ReferenceID: Sequential IDs starting from 1
- ReferenceImage.BytesBase64Encoded: Base64-encoded image data
Mask Configuration: If Params.Mask is provided or maskMode is specified:
- Default maskMode: "MASK_MODE_USER_PROVIDED" when mask data is present
- maskMode can be overridden via ExtraParams["maskMode"]
- dilation extracted from ExtraParams["dilation"] (validated to range [0, 1])
- maskClasses extracted from ExtraParams["maskClasses"] (for MASK_MODE_SEMANTIC)
- Mask image (if provided) is base64-encoded and added as ReferenceType: "REFERENCE_TYPE_MASK"
Edit Mode Mapping: Params.Type is mapped to EditMode:
- "inpainting" → "EDIT_MODE_INPAINT_INSERTION"
- "outpainting" → "EDIT_MODE_OUTPAINT"
- "inpaint_removal" → "EDIT_MODE_INPAINT_REMOVAL"
- "bgswap" → "EDIT_MODE_BGSWAP"
- If Type is not set, editMode can be specified directly via ExtraParams["editMode"]
Parameters:
- n → Parameters.SampleCount
- output_format → Parameters.OutputOptions.MimeType (converted: "png" → "image/png", etc.)
- output_compression → Parameters.OutputOptions.CompressionQuality
- seed (via ExtraParams["seed"]) → Parameters.Seed
- negativePrompt (via ExtraParams["negativePrompt"]) → Parameters.NegativePrompt
- guidanceScale (via ExtraParams["guidanceScale"]) → Parameters.GuidanceScale
- baseSteps (via ExtraParams["baseSteps"]) → Parameters.BaseSteps
- Additional Imagen-specific parameters: addWatermark, includeRaiReason, includeSafetyAttributes, personGeneration, safetySetting, language, storageUri

Response Conversion

Standard Gemini Format: Uses the same response conversion as image generation (see Image Generation section)
Imagen Format: Uses the same response conversion as Imagen image generation (see Image Generation section)

Endpoint Selection

The provider automatically selects the endpoint based on model name:

Imagen models (detected via schemas.IsImagenModel()): Uses /v1beta/models/{model}:predict endpoint
Other models: Uses /v1beta/models/{model}:generateContent endpoint with image response modality

Streaming

Image edit streaming is not supported by Gemini.

Image Variation

Image variation is not supported by Gemini.

10. List Models

Request: GET /v1beta/models?pageSize={limit}&pageToken={token} (no body)

Field mapping:

name (remove “models/” prefix) → id (add “gemini/” prefix)
displayName → name
description → description
inputTokenLimit → max_input_tokens
outputTokenLimit → max_output_tokens
Context length = inputTokenLimit + outputTokenLimit
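
As a quick illustration of the field mapping, the sketch below converts one Gemini model entry. Both struct types are simplified stand-ins written for this example, not the SDK's definitions.

```go
package main

import (
	"fmt"
	"strings"
)

// GeminiModel holds the fields read from a /v1beta/models entry.
type GeminiModel struct {
	Name             string
	DisplayName      string
	Description      string
	InputTokenLimit  int
	OutputTokenLimit int
}

// Model holds the corresponding fields after conversion.
type Model struct {
	ID              string
	Name            string
	Description     string
	MaxInputTokens  int
	MaxOutputTokens int
	ContextLength   int
}

// toModel strips the "models/" prefix, adds the "gemini/" provider prefix,
// and derives the context length as the sum of both token limits.
func toModel(g GeminiModel) Model {
	return Model{
		ID:              "gemini/" + strings.TrimPrefix(g.Name, "models/"),
		Name:            g.DisplayName,
		Description:     g.Description,
		MaxInputTokens:  g.InputTokenLimit,
		MaxOutputTokens: g.OutputTokenLimit,
		ContextLength:   g.InputTokenLimit + g.OutputTokenLimit,
	}
}

func main() {
	// The token limits below are placeholder numbers, not real model limits.
	m := toModel(GeminiModel{
		Name:             "models/gemini-2.0-flash",
		DisplayName:      "Gemini 2.0 Flash",
		InputTokenLimit:  1000,
		OutputTokenLimit: 200,
	})
	fmt.Println(m.ID, m.ContextLength)
}
```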

Pagination: Token-based with nextPageToken

11. Video Generation

Generate (`POST /v1/videos`)

Requests use JSON body (application/json).

Request Parameters

Parameter	Type	Required	Notes
`model`	string	✅	Veo model (e.g., `veo-3.1-generate-preview`)
`prompt`	string	✅	Text description of the video
`input_reference`	string	❌	Input image for image-to-video
`seconds`	string	❌	Duration → `durationSeconds`
`size`	string	❌	Resolution → aspect ratio (`1280x720` → `16:9`, `720x1280` → `9:16`)
`negative_prompt`	string	❌	What to avoid in the video
`seed`	int	❌	Seed for reproducibility
`audio`	bool	❌	Enable audio generation → `generateAudio`
`video_uri`	string	❌	GCS video URI for video extension

Extra Params (any unrecognized JSON field is forwarded as extra_params)

Key	Notes
`aspectRatio`	Override the aspect ratio directly (e.g., `"16:9"`, `"9:16"`). Takes precedence over `size`
`resolution`	Native Gemini resolution string
`sampleCount`	Number of samples to generate
`personGeneration`	Person generation policy
`numberOfVideos`	Number of videos to generate
`storageURI`	GCS bucket for output storage
`compressionQuality`	Output compression quality
`enhancePrompt`	Auto-enhance the prompt
`resizeMode`	How to handle size mismatches
`reference_images`	Style/asset reference image objects
`lastFrame`	Last frame image object for interpolation

Response: DeepIntShieldVideoGenerationResponse — id, status, videos[]

If Gemini filters content for safety, status is failed and content_filter describes the reason.

Job Statuses: in_progress → completed / failed

Retrieve / Download

Operation	Endpoint	Notes
Get status	`GET /v1/videos/{id}`	Polls the long-running operation
Download	`GET /v1/videos/{id}/content`	Downloads from GCS URI or decodes base64 video

Video Delete, List, and Remix are not supported.

Content Type Support

DeepIntShield supports the following content modalities through Gemini:

Content Type	Support	Notes
Text	✅	Full support
Images (URL/Base64)	✅	Converted to `{type: "image", source: {...}}`
Video	✅	With fps, start/end offset metadata
Audio	⚠️	Via file references only
PDF	✅	Via file references
Code Execution	✅	Auto-executed with results returned
Thinking/Reasoning	✅	Thought parts marked with `thought: true`
Function Calls	✅	With optional thought signatures

Caveats

Tool Response Grouping

Severity: High
Behavior: Consecutive tool response messages merged into single user message
Impact: Message count and structure changes
Code: chat.go:627-678

Thinking Content Handling

Severity: Medium
Behavior: Thought content appears as text parts with thought: true flag
Impact: Requires checking thought flag to distinguish from regular text
Code: chat.go:242-244, 302-304

Function Call Arguments Serialization

Severity: Low
Behavior: Tool call args (object) converted to arguments (JSON string)
Impact: Requires JSON parsing to access arguments
Code: chat.go:101-106

Thought Signature Base64 Encoding

Severity: Low
Behavior: thoughtSignature base64 URL-safe encoded, auto-converted during unmarshal
Impact: Transparent to user; handled automatically
Code: types.go:1048-1063

Streaming Finish Reason Timing

Severity: Medium
Behavior: finish_reason only present in final stream chunk with usage metadata
Impact: Cannot determine completion until end of stream
Code: chat.go:206-208, 325-328

Cached Content Token Reporting

Severity: Low
Behavior: Cached tokens reported in prompt_tokens_details.cached_tokens, cannot distinguish cache creation vs read
Impact: Billing estimates may be approximate
Code: utils.go:270-274

System Instruction Integration

Severity: Medium
Behavior: System instructions become systemInstruction field (separate from messages), not included in message array
Impact: Structure differs from OpenAI’s system message approach
Code: responses.go:34-46
