Replicate

Overview

Replicate is architecturally different from other providers in DeepIntShield. It uses a prediction-based API where every request creates a “prediction” that runs asynchronously. Each model on Replicate defines its own input schema, making it highly flexible but requiring model-specific parameter knowledge.

Key Architectural Differences

Prediction-Based System: All operations create predictions via /v1/predictions or deployment endpoints
Model-Specific Inputs: Each model has its own parameter schema (use extra_params for model-specific fields)
Async/Sync Modes: Predictions can run synchronously (with Prefer: wait header) or asynchronously (with polling)
Flexible Output: Output can be strings, arrays, URLs, or data URIs depending on the model

Supported Operations

Operation	Non-Streaming	Streaming	Endpoint
Chat Completions	✅	✅	`/v1/predictions`
Responses API	✅	✅	`/v1/predictions`
Text Completions	✅	✅	`/v1/predictions`
Image Generation	✅	✅	`/v1/predictions`
Image Edit	✅	✅	`/v1/predictions`
Video Generation	✅	-	`/v1/predictions`
Image Variation	❌	❌	-
Files	✅	-	`/v1/files`
List Models	✅	-	`/v1/deployments`
Embeddings	❌	❌	-
Speech (TTS)	❌	❌	-
Transcriptions (STT)	❌	❌	-
Batch	❌	❌	-

Model Identification

Replicate models can be specified in three ways:

1. Version ID

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

2. Model Name

Format: owner/model-name

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/meta/llama-2-7b-chat",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

3. Deployment

Configure deployed models in the Replicate key configuration. Deployments map custom model identifiers to actual deployment paths.

Configuration Example:

{
  "provider": "replicate",
  "value": "your-api-key",
  "replicate_key_config": {
    "deployments": {
      "my-model": "owner/my-deployment-name"
    }
  }
}

Usage:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/my-model",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Prediction Modes

Sync Mode

DeepIntShield uses sync mode only when the incoming request carries a Prefer: wait header. In that case the request blocks until the prediction completes or the wait times out (60 seconds by default).

How it works:

1. DeepIntShield creates the prediction with a Prefer: wait=60 header.
2. Replicate holds the connection open for up to 60 seconds.
3. If the prediction completes within the timeout, the result is returned immediately.
4. If the timeout expires, DeepIntShield falls back to polling mode.

Async Mode (Polling)

Async mode is the default for Replicate predictions. DeepIntShield automatically polls the prediction URL every 2 seconds until the prediction reaches a terminal status.

Status Flow: starting → processing → succeeded/failed/canceled
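
The polling behavior described above can be sketched roughly as follows. This is a minimal illustration, not DeepIntShield's actual code: `isTerminal`, `pollPrediction`, and the `fetch` callback (a stand-in for a GET on the prediction URL) are hypothetical names, and the attempt budget is an assumption added so the loop always terminates.

```go
package main

import (
	"errors"
	"time"
)

// isTerminal reports whether a prediction status ends the status flow
// (starting → processing → succeeded/failed/canceled).
func isTerminal(status string) bool {
	switch status {
	case "succeeded", "failed", "canceled":
		return true
	}
	return false
}

// pollPrediction calls fetch until it returns a terminal status, waiting
// interval between attempts. fetch is a hypothetical callback standing in
// for a GET on the prediction URL; maxAttempts is an assumed safety limit.
func pollPrediction(fetch func() (string, error), interval time.Duration, maxAttempts int) (string, error) {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if attempt > 0 {
			time.Sleep(interval) // the document describes a 2 second interval
		}
		status, err := fetch()
		if err != nil {
			return "", err
		}
		if isTerminal(status) {
			return status, nil
		}
	}
	return "", errors.New("prediction did not reach a terminal status in time")
}
```

A caller would pass `2*time.Second` as the interval to match the documented cadence and treat any status other than `succeeded` as a failure.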

1. Chat Completions

Message Conversion

System Messages: Extracted from messages array and concatenated into system_prompt field.

User/Assistant Messages: Preserved as conversation context. Text content from content blocks is concatenated with newlines.

Image Content: Non-base64 image URLs from message content blocks are extracted and passed as image_input array.

// Input
{
  "messages": [
    {"role": "system", "content": "You are helpful"},
    {"role": "user", "content": "Hello"}
  ]
}

// Converted to Replicate format
{
  "input": {
    "system_prompt": "You are helpful",
    "prompt": "Hello",
    "messages": [...] // Original messages array also included
  }
}

System Prompt Filtering

Important: Not all Replicate models support the system_prompt field. For unsupported models, the system prompt is automatically prepended to the conversation prompt.

Models without system_prompt support:

meta/meta-llama-3-8b
meta/llama-2-70b
openai/gpt-oss-20b
openai/o1-mini
xai/grok-4
All deepseek-ai/deepseek* models (e.g., deepseek-r1, deepseek-v3)
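
As an illustration of the filtering rule, the sketch below checks a model against the list above and builds the prompt fields accordingly. The function names are hypothetical, and joining the system prompt and the user prompt with a blank line is an assumption; the exact separator DeepIntShield uses is not stated here.

```go
package main

import "strings"

// modelsWithoutSystemPrompt mirrors the list in this section.
var modelsWithoutSystemPrompt = map[string]bool{
	"meta/meta-llama-3-8b": true,
	"meta/llama-2-70b":     true,
	"openai/gpt-oss-20b":   true,
	"openai/o1-mini":       true,
	"xai/grok-4":           true,
}

// supportsSystemPrompt reports whether the model accepts a system_prompt field.
func supportsSystemPrompt(model string) bool {
	if strings.HasPrefix(model, "deepseek-ai/deepseek") {
		return false
	}
	return !modelsWithoutSystemPrompt[model]
}

// buildPromptInput returns the prompt-related input fields for a prediction.
// For models without system_prompt support, the system prompt is prepended
// to the prompt (the "\n\n" separator is an assumption of this sketch).
func buildPromptInput(model, systemPrompt, prompt string) map[string]interface{} {
	input := map[string]interface{}{"prompt": prompt}
	if systemPrompt == "" {
		return input
	}
	if supportsSystemPrompt(model) {
		input["system_prompt"] = systemPrompt
	} else {
		input["prompt"] = systemPrompt + "\n\n" + prompt
	}
	return input
}
```

With this sketch, `meta/llama-2-7b-chat` receives separate `system_prompt` and `prompt` fields, while `deepseek-ai/deepseek-r1` receives a single combined `prompt`.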

Model-Specific Parameters

Use extra_params to pass model-specific parameters. Through the gateway, request fields that are not part of DeepIntShield's standard schema are treated as extra_params; in the Go SDK, set them in ExtraParams. In both cases they are flattened into the input object:

Gateway:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/meta/llama-2-7b-chat",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.7,
    "top_k": 50,
    "repetition_penalty": 1.1,
    "min_new_tokens": 10
  }'

Go SDK:

resp, err := client.ChatCompletionRequest(schemas.NewDeepIntShieldContext(ctx, schemas.NoDeadline), &schemas.DeepIntShieldChatRequest{
    Provider: schemas.Replicate,
    Model:    "meta/llama-2-7b-chat",
    Input:    messages,
    Params: &schemas.ChatParameters{
        Temperature: schemas.Ptr(0.7),
        ExtraParams: map[string]interface{}{
            "top_k": 50,
            "repetition_penalty": 1.1,
            "min_new_tokens": 10,
        },
    },
})

Response Conversion

Field Mapping

Output:
- String → choices[0].message.content
- Array of strings → joined and mapped to choices[0].message.content
- Object with text field → text value mapped to choices[0].message.content
Status: succeeded → finish_reason: "stop", failed → finish_reason: "error"
Metrics: input_token_count → prompt_tokens, output_token_count → completion_tokens

Example Response

{
  "id": "abc123",
  "model": "meta/llama-2-7b-chat",
  "object": "chat.completion",
  "created": 1234567890,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18
  }
}
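
The output and status mapping above can be sketched as a small normalization step. This is an illustrative sketch only: the function names are hypothetical, and joining array elements with no separator is an assumption (the mapping says only that arrays are "joined").

```go
package main

import (
	"encoding/json"
	"strings"
)

// outputToContent flattens a decoded prediction output into message text.
// It handles the three shapes listed above: a string, an array of strings,
// and an object with a text field.
func outputToContent(output interface{}) (string, bool) {
	switch v := output.(type) {
	case string:
		return v, true
	case []interface{}:
		var b strings.Builder
		for _, item := range v {
			s, ok := item.(string)
			if !ok {
				return "", false
			}
			b.WriteString(s) // assumption: chunks are joined without a separator
		}
		return b.String(), true
	case map[string]interface{}:
		s, ok := v["text"].(string)
		return s, ok
	}
	return "", false
}

// contentFromJSON decodes the raw output field and flattens it.
func contentFromJSON(raw []byte) (string, bool) {
	var output interface{}
	if err := json.Unmarshal(raw, &output); err != nil {
		return "", false
	}
	return outputToContent(output)
}

// finishReason maps the two statuses documented above; other statuses
// return an empty string in this sketch.
func finishReason(status string) string {
	switch status {
	case "succeeded":
		return "stop"
	case "failed":
		return "error"
	}
	return ""
}
```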

Streaming

Replicate streaming uses Server-Sent Events (SSE) with the following event types:

Event Type	Description	Data Format
`output`	Content chunk	Plain text string
`done`	Completion	JSON: `{"reason": ""}` (empty = success)
`error`	Error occurred	JSON: `{"detail": "error message"}`

Streaming Flow:

1. DeepIntShield sets stream: true in the prediction input
2. Replicate returns urls.stream in the initial response
3. DeepIntShield connects to the stream URL and processes SSE events:
- output events → content deltas
- done event → final chunk with finish_reason

Done Event Reasons:

Empty or no reason = success (finish_reason: "stop")
"canceled" = prediction was canceled
"error" = prediction failed
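
To make the event handling concrete, here is a minimal sketch of a consumer for the three event types. It assumes standard SSE framing (`event:` and `data:` lines terminated by a blank line). The `streamResult` type and both function names are hypothetical, and passing a non-empty done reason through unchanged as the finish reason is an assumption of this sketch.

```go
package main

import (
	"bufio"
	"encoding/json"
	"io"
	"strings"
)

// streamResult is a hypothetical summary of one streamed prediction.
type streamResult struct {
	Content      string // concatenated text from output events
	FinishReason string // "stop" on success, otherwise the raw reason
	ErrorDetail  string // populated from an error event
}

// consumeStream reads SSE frames until a done or error event arrives.
func consumeStream(r io.Reader) (streamResult, error) {
	var (
		res     streamResult
		content strings.Builder
		event   string
		data    []string
	)
	sc := bufio.NewScanner(r) // default 64 KiB line limit; fine for text chunks
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.HasPrefix(line, "event:"):
			event = strings.TrimSpace(strings.TrimPrefix(line, "event:"))
			continue
		case strings.HasPrefix(line, "data:"):
			data = append(data, strings.TrimPrefix(strings.TrimPrefix(line, "data:"), " "))
			continue
		case line != "":
			continue // ids, comments, and unknown fields are ignored here
		}
		// A blank line terminates the frame, so dispatch it.
		payload := strings.Join(data, "\n")
		kind := event
		event, data = "", nil
		switch kind {
		case "output":
			content.WriteString(payload)
		case "done":
			var done struct {
				Reason string `json:"reason"`
			}
			_ = json.Unmarshal([]byte(payload), &done) // a missing reason counts as success
			res.FinishReason = done.Reason
			if done.Reason == "" {
				res.FinishReason = "stop"
			}
			res.Content = content.String()
			return res, nil
		case "error":
			var failure struct {
				Detail string `json:"detail"`
			}
			_ = json.Unmarshal([]byte(payload), &failure)
			res.ErrorDetail = failure.Detail
			res.FinishReason = "error"
			res.Content = content.String()
			return res, nil
		}
	}
	res.Content = content.String()
	return res, sc.Err()
}

// consumeString is a convenience wrapper for examples and tests.
func consumeString(s string) (streamResult, error) {
	return consumeStream(strings.NewReader(s))
}
```

In a real client the reader would be the response body of a GET on urls.stream, and each output payload would be forwarded as a content delta rather than buffered.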

2. Responses API

The Responses API is converted internally to Chat Completions or native Replicate format depending on the model:

// Responses request → Replicate prediction conversion
ResponsesRequest → ReplicatePredictionRequest → ReplicatePredictionResponse → DeepIntShieldResponsesResponse

Conversion Logic:

For OpenAI models with gpt-5-structured: Uses native Responses format with input_item_list, tools, and json_schema support
For all other models: Converted to Chat Completions format using message conversion logic

Same parameter mapping and system prompt handling as Chat Completions.

Response Format

Responses follow standard Responses API format with status mapping:

Replicate Status	Responses Status
`succeeded`	`completed`
`failed`	`failed`
`canceled`	`cancelled`
`processing`	`in_progress`
`starting`	`queued`

3. Text Completions (Legacy)

Conversion

Prompt array: Joined with newlines into single prompt field
top_k: Pass via extra_params (model-specific)

Example

curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/meta/llama-2-7b",
    "prompt": "Once upon a time",
    "max_tokens": 100,
    "temperature": 0.8,
    "top_k": 40
  }'

Response

Same conversion as chat completions: output string/array → choices[0].text, with usage metrics from prediction metrics.

4. Image Generation

Parameter Mapping

{
  "prompt": "prompt",
  "n": "number_of_images",
  "aspect_ratio": "aspect_ratio",
  "resolution": "resolution",
  "output_format": "output_format",
  "quality": "quality",
  "background": "background",
  "seed": "seed",
  "negative_prompt": "negative_prompt",
  "num_inference_steps": "num_inference_steps",
  "input_images": "input_images"
}

Input Image Field Mapping

Important: Different Replicate models expect input images in different fields. DeepIntShield automatically maps input_images to the correct field based on the model.

Field Mapping by Model:

Field	Models
`image_prompt`	`black-forest-labs/flux-1.1-pro`, `black-forest-labs/flux-1.1-pro-ultra`, `black-forest-labs/flux-pro`, `black-forest-labs/flux-1.1-pro-ultra-finetuned`
`input_image`	`black-forest-labs/flux-kontext-pro`, `black-forest-labs/flux-kontext-max`, `black-forest-labs/flux-kontext-dev`
`image`	`black-forest-labs/flux-dev`, `black-forest-labs/flux-fill-pro`, `black-forest-labs/flux-dev-lora`, `black-forest-labs/flux-krea-dev`
`input_images`	All other models (default)
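
The table above amounts to a lookup with a default. A minimal sketch, with hypothetical names (the actual implementation may be structured differently):

```go
package main

// imageFieldByModel mirrors the field mapping table; any model that is
// not listed falls back to "input_images".
var imageFieldByModel = map[string]string{
	"black-forest-labs/flux-1.1-pro":                 "image_prompt",
	"black-forest-labs/flux-1.1-pro-ultra":           "image_prompt",
	"black-forest-labs/flux-pro":                     "image_prompt",
	"black-forest-labs/flux-1.1-pro-ultra-finetuned": "image_prompt",
	"black-forest-labs/flux-kontext-pro":             "input_image",
	"black-forest-labs/flux-kontext-max":             "input_image",
	"black-forest-labs/flux-kontext-dev":             "input_image",
	"black-forest-labs/flux-dev":                     "image",
	"black-forest-labs/flux-fill-pro":                "image",
	"black-forest-labs/flux-dev-lora":                "image",
	"black-forest-labs/flux-krea-dev":                "image",
}

// inputImageField returns the Replicate input field that should carry
// the input images for the given model.
func inputImageField(model string) string {
	if field, ok := imageFieldByModel[model]; ok {
		return field
	}
	return "input_images"
}
```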

Example

Gateway:

curl -X POST http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/black-forest-labs/flux-schnell",
    "prompt": "A serene mountain landscape at sunset",
    "aspect_ratio": "16:9",
    "output_format": "webp",
    "num_inference_steps": 4,
    "seed": 42
  }'

Go SDK:

resp, err := client.ImageGenerationRequest(schemas.NewDeepIntShieldContext(ctx, schemas.NoDeadline), &schemas.DeepIntShieldImageGenerationRequest{
    Provider: schemas.Replicate,
    Model:    "black-forest-labs/flux-schnell",
    Input: &schemas.ImageGenerationInput{
        Prompt: "A serene mountain landscape at sunset",
    },
    Params: &schemas.ImageGenerationParameters{
        AspectRatio: schemas.Ptr("16:9"),
        OutputFormat: schemas.Ptr("webp"),
        NumInferenceSteps: schemas.Ptr(4),
        Seed: schemas.Ptr(42),
    },
})

Response Conversion

Replicate output can be:

Single URL: String → data[0].url
Multiple URLs: Array → data[i].url for each image
Data URIs: Base64-encoded images in data URI format

{
  "id": "xyz789",
  "created": 1234567890,
  "model": "black-forest-labs/flux-schnell",
  "data": [
    {
      "url": "https://replicate.delivery/pbxt/...",
      "index": 0
    }
  ],
  "usage": {
    "input_tokens": 15,
    "output_tokens": 0,
    "total_tokens": 15
  }
}
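
The three output shapes above can be normalized along these lines. This is a sketch under stated assumptions: `imageData` and `toImageData` are hypothetical names, and placing the base64 payload of a data URI into a b64_json-style field is an assumption about how data URIs are surfaced.

```go
package main

import "strings"

// imageData is a hypothetical stand-in for one entry of the data array.
type imageData struct {
	Index   int
	URL     string
	B64JSON string
}

// toImageData converts a decoded prediction output (a single string or an
// array of strings) into a list of image entries.
func toImageData(output interface{}) []imageData {
	var items []string
	switch v := output.(type) {
	case string:
		items = []string{v}
	case []string:
		items = v
	case []interface{}: // the shape produced by encoding/json
		for _, item := range v {
			if s, ok := item.(string); ok {
				items = append(items, s)
			}
		}
	}
	images := make([]imageData, 0, len(items))
	for i, s := range items {
		entry := imageData{Index: i}
		if strings.HasPrefix(s, "data:") {
			// Keep only the payload after the comma of "data:<mime>;base64,<payload>".
			if comma := strings.Index(s, ","); comma >= 0 {
				entry.B64JSON = s[comma+1:]
			}
		} else {
			entry.URL = s
		}
		images = append(images, entry)
	}
	return images
}
```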

Streaming

Image generation streaming provides progressive image updates as data URIs:

SSE Events:

output: Data URI of an intermediate image (a complete data URI per event, not a fragment)
done: Final completion with reason
error: Error details

Flow:

Each output event contains a complete data URI (e.g., data:image/webp;base64,...)
Progressive refinement shows generation progress
done event signals completion with final image
Each chunk includes Index, ChunkIndex, and B64JSON fields

5. Image Edit

Image edit runs as a prediction like image generation. You send one or more input images plus a prompt; the model returns edited image(s). The same input image field mapping as Image Generation applies (see Field Mapping by Model below).

Endpoint: /v1/images/edits (DeepIntShield) → Replicate /v1/predictions or deployment predictions.

Parameter Mapping

DeepIntShield / Request	Replicate input
`input.images`	Mapped to `image_prompt`, `input_image`, `image`, or `input_images` by model
`input.prompt`	`prompt`
`params.n`	`number_of_images`
`params.output_format`	`output_format`
`params.quality`	`quality`
`params.background`	`background`
`params.seed`	`seed`
`params.negative_prompt`	`negative_prompt`
`params.num_inference_steps`	`num_inference_steps`
`params.extra_params`	Merged into prediction input

Field Mapping by Model

Input images are mapped to the same fields as in Image Generation:

Field	Models
`image_prompt`	`black-forest-labs/flux-1.1-pro`, `black-forest-labs/flux-1.1-pro-ultra`, `black-forest-labs/flux-pro`, `black-forest-labs/flux-1.1-pro-ultra-finetuned`
`input_image`	`black-forest-labs/flux-kontext-pro`, `black-forest-labs/flux-kontext-max`, `black-forest-labs/flux-kontext-dev`
`image`	`black-forest-labs/flux-dev`, `black-forest-labs/flux-fill-pro`, `black-forest-labs/flux-dev-lora`, `black-forest-labs/flux-krea-dev`
`input_images`	All other models (default)

Example

Gateway:

curl -X POST 'http://localhost:8080/v1/images/edits' \
--form 'model="replicate/black-forest-labs/flux-fill-pro"' \
--form 'image[]=@"image.png"' \
--form 'prompt="Replace the sky with a starry night"' \
--form 'mask=@"mask.png"'

Go SDK:

resp, err := client.ImageEditRequest(schemas.NewDeepIntShieldContext(ctx, schemas.NoDeadline), &schemas.DeepIntShieldImageEditRequest{
    Provider: schemas.Replicate,
    Model:    "black-forest-labs/flux-fill-pro",
    Input: &schemas.ImageEditInput{
        Prompt: "Replace the sky with a starry night",
        Images: []schemas.ImageInput{
            { Image: imageBytes },
        },
    },
})

Response

Same as Image Generation: single URL → data[0].url, array of URLs → data[i].url, or data URIs. Response shape is DeepIntShieldImageGenerationResponse with data[].url or data[].b64_json.

Streaming

Image edit streaming is supported. Events use the same prediction log stream as image generation:

Partial chunks: type: "image_edit.partial_image" with b64_json (or data URI) until completion.
Completed: type: "image_edit.completed" with final image and usage.

Use Prefer: wait for sync behavior or rely on polling (async) like other Replicate predictions.

6. Files API

Replicate’s Files API supports uploading, listing, and managing files for use in predictions.

Upload

Request: Multipart form-data

Field	Type	Required	Notes
`file`	binary	✅	File content
`filename`	string	❌	Custom filename
`content_type`	string	❌	MIME type (auto-detected from extension)

Example:

curl -X POST http://localhost:8080/v1/files \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@document.pdf" \
  -F "filename=my-document.pdf"

Response:

{
  "id": "file_abc123",
  "object": "file",
  "bytes": 12345,
  "created_at": 1234567890,
  "filename": "my-document.pdf",
  "purpose": "batch",
  "status": "processed"
}

List Files

Query Parameters:

Parameter	Type	Notes
`limit`	int	Results per page
`after`	string	Pagination cursor

Example:

curl -X GET "http://localhost:8080/v1/files?limit=20" \
  -H "Authorization: Bearer $API_KEY"

Pagination: Replicate uses cursor-based pagination and returns a next URL in the response. DeepIntShield serializes that URL into the after cursor.

Retrieve / Delete

Operations:

GET /v1/files/{file_id} - Retrieve file metadata
DELETE /v1/files/{file_id} - Delete file

File Content Download

Required Parameters in ExtraParams:

Parameter	Type	Description
`owner`	string	File owner username
`expiry`	int64	Unix timestamp for expiration
`signature`	string	Base64-encoded HMAC-SHA256 signature

Signature Format: HMAC-SHA256 of "{owner} {file_id} {expiry}" using Files API signing secret

Example:

curl -X POST http://localhost:8080/v1/files/file_abc123/content \
  -H "Content-Type: application/json" \
  -d '{
    "owner": "my-username",
    "expiry": 1735689600,
    "signature": "base64-encoded-signature"
  }'
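
The signature format described above can be computed with Go's standard library. This is a sketch only: the function names are hypothetical, and standard (padded) base64 is an assumption, since the text says only "Base64-encoded".

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// signFileDownload computes HMAC-SHA256 over "{owner} {file_id} {expiry}"
// with the Files API signing secret and returns it base64-encoded.
func signFileDownload(secret []byte, owner, fileID string, expiry int64) string {
	mac := hmac.New(sha256.New, secret)
	fmt.Fprintf(mac, "%s %s %d", owner, fileID, expiry)
	// Assumption: standard base64 with padding.
	return base64.StdEncoding.EncodeToString(mac.Sum(nil))
}

// verifyFileDownload recomputes the signature and compares it in
// constant time.
func verifyFileDownload(secret []byte, owner, fileID string, expiry int64, signature string) bool {
	want := signFileDownload(secret, owner, fileID, expiry)
	return hmac.Equal([]byte(want), []byte(signature))
}
```

The resulting string would be sent as the signature parameter alongside owner and expiry.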

7. List Models

Endpoint: /v1/models (DeepIntShield) → Replicate /v1/deployments

List Models returns your Replicate deployments, which are private or organization models with dedicated infrastructure. The response includes:

{
  "data": [
    {
      "id": "replicate/my-org/my-deployment",
      "name": "my-deployment",
      "owner": "my-org"
    }
  ],
  "has_more": false
}

Usage:

List your deployments via this endpoint
Use deployment name as model identifier: replicate/my-org/my-deployment
Predictions route to deployment-specific endpoint: /v1/deployments/my-org/my-deployment/predictions
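
The routing between the shared predictions endpoint and a deployment endpoint might look like the sketch below. It assumes the deployments map from the Replicate key configuration is what decides whether an identifier is a deployment; `predictionPath` is a hypothetical helper, not part of the SDK.

```go
package main

import "strings"

// predictionPath returns the Replicate endpoint path for a model identifier.
// deployments maps custom model identifiers to "owner/deployment-name" paths,
// as in the key configuration example. Identifiers that are not configured
// as deployments use the shared predictions endpoint.
func predictionPath(model string, deployments map[string]string) string {
	model = strings.TrimPrefix(model, "replicate/")
	if path, ok := deployments[model]; ok {
		return "/v1/deployments/" + path + "/predictions"
	}
	return "/v1/predictions"
}
```

For example, with `"my-model": "owner/my-deployment-name"` configured, `replicate/my-model` resolves to `/v1/deployments/owner/my-deployment-name/predictions`.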

Extra Parameters

Model-Specific Parameters

The most important feature for Replicate integration is extra_params. Parameters not in DeepIntShield’s standard schema are flattened directly into the prediction input object.

How It Works

// Request with extra params
{
  "model": "replicate/stability-ai/sdxl",
  "prompt": "A photo of an astronaut",
  "temperature": 0.7,          // Standard param
  "guidance_scale": 7.5,       // Model-specific (extra param)
  "num_inference_steps": 50,   // Model-specific (extra param)
  "scheduler": "DPMSolverMultistep"  // Model-specific (extra param)
}

// Converted to Replicate prediction input
{
  "version": "...",
  "input": {
    "prompt": "A photo of an astronaut",
    "temperature": 0.7,
    "guidance_scale": 7.5,       // Flattened from extra_params
    "num_inference_steps": 50,   // Flattened from extra_params
    "scheduler": "DPMSolverMultistep"  // Flattened from extra_params
  }
}
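
The flattening step shown above can be sketched as a simple map merge. `buildPredictionInput` is a hypothetical name, and letting standard parameters win when a key appears in both maps is an assumption; the precedence is not specified here.

```go
package main

// buildPredictionInput merges standard parameters and extra_params into a
// single flat input object for the prediction request.
func buildPredictionInput(standard, extra map[string]interface{}) map[string]interface{} {
	input := make(map[string]interface{}, len(standard)+len(extra))
	for key, value := range extra {
		input[key] = value
	}
	// Assumption: standard parameters take precedence on key conflicts.
	for key, value := range standard {
		input[key] = value
	}
	return input
}
```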

Discovering Model Parameters

Each Replicate model has unique parameters. To find available parameters:

Model Page: Visit the model on replicate.com
OpenAPI Schema: Available at /v1/models/{owner}/{name}/versions/{version_id} (includes openapi_schema)
Cog Definition: Check the model’s source code (if public)

Caveats

System Prompt Field Support

Severity: Medium
Behavior: Not all models support the system_prompt field. For unsupported models, the system prompt is prepended to the conversation prompt.
Impact: Prompt structure differs between models.
Models Affected: meta/meta-llama-3-8b, meta/llama-2-70b, openai/gpt-oss-20b, openai/o1-mini, xai/grok-4, and all deepseek-ai/deepseek* models
Code: chat.go:300-318

Input Image Field Mapping

Severity: Medium
Behavior: Different models expect input images in different fields (image_prompt, input_image, image, input_images).
Impact: DeepIntShield automatically maps input images to the correct field based on the model.
Models Affected: Flux family models (see the Input Image Field Mapping table)
Code: images.go:192-209

Image Content in Chat

Severity: Low
Behavior: Only non-base64 image URLs from message content blocks are extracted to image_input.
Impact: Base64-encoded images in messages are ignored.
Code: chat.go:58-63

Model-Specific Parameters

Severity: Medium
Behavior: Each model has a unique input schema; standard parameters may not work for all models.
Impact: You need to check the model documentation for available parameters.
Mitigation: Use extra_params for model-specific fields.

Video Generation

Generate (`POST /v1/videos`)

Request Parameters

Parameter	Type	Required	Notes
`model`	string	✅	Replicate model (owner/model or version ID)
`prompt`	string	✅	Text description of the video
`input_reference`	string	❌	Reference image (URL or base64 data URL). Mapped to the `image` field, except for OpenAI-hosted models, which use `input_reference`
`seconds`	string	❌	Duration → `duration`
`seed`	int	❌	Seed for reproducibility
`negative_prompt`	string	❌	What to avoid

Extra Params: Pass model-specific fields directly in the JSON body (unrecognized fields become extra_params and are flattened into the prediction input). webhook and webhook_events_filter are extracted automatically.

Response: DeepIntShieldVideoGenerationResponse — id, status, model, videos[]

Job Statuses: queued (starting) → in_progress (processing) → completed / failed

Retrieve / Download

Operation	Endpoint	Notes
Get status	`GET /v1/videos/{id}`	Maps to `/v1/predictions/{id}`
Download	`GET /v1/videos/{id}/content`	Downloads from the prediction output URL
