
Replicate

Replicate is architecturally different from other providers in DeepIntShield. It uses a prediction-based API where every request creates a “prediction” that runs asynchronously. Each model on Replicate defines its own input schema, making it highly flexible but requiring model-specific parameter knowledge.

  1. Prediction-Based System: All operations create predictions via /v1/predictions or deployment endpoints
  2. Model-Specific Inputs: Each model has its own parameter schema (use extra_params for model-specific fields)
  3. Async/Sync Modes: Predictions can run synchronously (with Prefer: wait header) or asynchronously (with polling)
  4. Flexible Output: Output can be strings, arrays, URLs, or data URIs depending on the model
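| Operation | Non-Streaming | Streaming | Endpoint |
| --- | --- | --- | --- |
| Chat Completions |  |  | /v1/predictions |
| Responses API |  |  | /v1/predictions |
| Text Completions |  |  | /v1/predictions |
| Image Generation |  |  | /v1/predictions |
| Image Edit |  |  | /v1/predictions |
| Video Generation |  | - | /v1/predictions |
| Image Variation |  |  | - |
| Files |  | - | /v1/files |
| List Models |  | - | /v1/deployments |
| Embeddings |  |  | - |
| Speech (TTS) |  |  | - |
| Transcriptions (STT) |  |  | - |
| Batch |  |  | - |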

Replicate models can be specified in three ways: by version ID, by model name (owner/model-name), or by a configured deployment.

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Format: owner/model-name

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/meta/llama-2-7b-chat",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Configure deployed models in the Replicate key configuration. Deployments map custom model identifiers to actual deployment paths.

Configuration Example:

{
  "provider": "replicate",
  "value": "your-api-key",
  "replicate_key_config": {
    "deployments": {
      "my-model": "owner/my-deployment-name"
    }
  }
}

Usage:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/my-model",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
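
The three identifier forms can be told apart with a small resolver. The following is a minimal sketch, not DeepIntShield's actual code: `resolve_model` is a hypothetical helper, the 64-character hex check for version IDs is an assumption based on the example above, and checking deployment aliases first is also an assumption. The deployment path follows the /v1/deployments/{owner}/{name}/predictions pattern used for deployment predictions.

```python
import re

# Assumption: a version ID is a 64-character lowercase hex string,
# as in the version ID example; the real check may differ.
VERSION_ID = re.compile(r"^[0-9a-f]{64}$")


def resolve_model(model: str, deployments: dict[str, str]) -> dict:
    """Classify a model identifier and pick a Replicate endpoint (hypothetical helper)."""
    name = model.removeprefix("replicate/")
    if name in deployments:
        # Alias configured under replicate_key_config.deployments
        path = deployments[name]
        return {"kind": "deployment",
                "endpoint": f"/v1/deployments/{path}/predictions"}
    if VERSION_ID.fullmatch(name):
        return {"kind": "version", "endpoint": "/v1/predictions", "version": name}
    # Anything else is treated as owner/model-name
    return {"kind": "model", "endpoint": "/v1/predictions", "model": name}
```

For example, `resolve_model("replicate/my-model", {"my-model": "owner/my-deployment-name"})` resolves to the deployment endpoint, while `replicate/meta/llama-2-7b-chat` is treated as a model name.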

DeepIntShield uses sync mode when the request headers include Prefer: wait. The request blocks until the prediction completes or the wait times out (default 60 seconds).

How it works:

  1. Creates prediction with Prefer: wait=60 header
  2. Replicate holds connection open for up to 60 seconds
  3. If prediction completes within timeout, returns result immediately
  4. If timeout expires, falls back to polling mode

Async mode is the default for Replicate predictions. DeepIntShield automatically polls the prediction URL every 2 seconds until the prediction reaches a final status.

Status Flow: starting → processing → succeeded / failed / canceled
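
The polling behavior can be sketched as a simple loop. This is an illustration under assumptions rather than the gateway's implementation: `poll_prediction` and its `fetch` callback are hypothetical names, and `fetch` stands in for a GET on the prediction URL. A real client would likely also bound the total wait time.

```python
import time
from typing import Callable

# Final statuses from the status flow: once reached, polling stops.
TERMINAL = {"succeeded", "failed", "canceled"}


def poll_prediction(fetch: Callable[[], dict],
                    interval: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch() until the prediction reaches a terminal status.

    fetch is a hypothetical callback that returns the current
    prediction object (for example, by GET-ing the prediction URL).
    """
    while True:
        prediction = fetch()
        if prediction["status"] in TERMINAL:
            return prediction
        sleep(interval)  # 2 seconds between polls by default
```

Passing `sleep` as a parameter keeps the loop testable without real delays.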


System Messages: Extracted from messages array and concatenated into system_prompt field.

User/Assistant Messages: Preserved as conversation context. Text content from content blocks is concatenated with newlines.

Image Content: Non-base64 image URLs from message content blocks are extracted and passed as image_input array.

// Input
{
  "messages": [
    {"role": "system", "content": "You are helpful"},
    {"role": "user", "content": "Hello"}
  ]
}
// Converted to Replicate format
{
  "input": {
    "system_prompt": "You are helpful",
    "prompt": "Hello",
    "messages": [...] // Original messages array also included
  }
}

Important: Not all Replicate models support the system_prompt field. For unsupported models, the system prompt is automatically prepended to the conversation prompt.

Models without system_prompt support:

  • meta/meta-llama-3-8b
  • meta/llama-2-70b
  • openai/gpt-oss-20b
  • openai/o1-mini
  • xai/grok-4
  • All deepseek-ai/deepseek* models (e.g., deepseek-r1, deepseek-v3)
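
The message conversion rules above can be sketched as follows. This is a minimal sketch under several assumptions, not the code in chat.go: the function names are hypothetical, content blocks are assumed to use the OpenAI-style `text` and `image_url` shapes, system messages are assumed to be joined with newlines, and the blank-line separator used when prepending the system prompt is a guess. The model name is passed without the replicate/ prefix.

```python
# Models listed above as lacking system_prompt support.
NO_SYSTEM_PROMPT = {
    "meta/meta-llama-3-8b", "meta/llama-2-70b",
    "openai/gpt-oss-20b", "openai/o1-mini", "xai/grok-4",
}


def supports_system_prompt(model: str) -> bool:
    return (model not in NO_SYSTEM_PROMPT
            and not model.startswith("deepseek-ai/deepseek"))


def block_text(content) -> str:
    """Text of a message whose content is a string or a list of blocks."""
    if isinstance(content, str):
        return content
    return "\n".join(b["text"] for b in content if b.get("type") == "text")


def image_urls(content) -> list[str]:
    """Non-base64 image URLs from content blocks (data URIs are skipped)."""
    if isinstance(content, str):
        return []
    urls = [b["image_url"]["url"] for b in content
            if b.get("type") == "image_url"]
    return [u for u in urls if not u.startswith("data:")]


def to_replicate_input(model: str, messages: list[dict]) -> dict:
    system = "\n".join(block_text(m["content"])
                       for m in messages if m["role"] == "system")
    turns = [m for m in messages if m["role"] != "system"]
    prompt = "\n".join(block_text(m["content"]) for m in turns)
    images = [u for m in turns for u in image_urls(m["content"])]

    result = {"prompt": prompt, "messages": messages}
    if system and supports_system_prompt(model):
        result["system_prompt"] = system
    elif system:
        # Unsupported model: prepend the system prompt (separator is assumed)
        result["prompt"] = f"{system}\n\n{prompt}"
    if images:
        result["image_input"] = images
    return {"input": result}
```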

Use extra_params to pass model-specific parameters. These are flattened into the input object:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/meta/llama-2-7b-chat",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.7,
    "top_k": 50,
    "repetition_penalty": 1.1,
    "min_new_tokens": 10
  }'
  • Output:
    • String → choices[0].message.content
    • Array of strings → joined and mapped to choices[0].message.content
    • Object with text field → text value mapped to choices[0].message.content
  • Status: succeeded → finish_reason: "stop", failed → finish_reason: "error"
  • Metrics: input_token_count → prompt_tokens, output_token_count → completion_tokens
{
  "id": "abc123",
  "model": "meta/llama-2-7b-chat",
  "object": "chat.completion",
  "created": 1234567890,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18
  }
}
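
The output, status, and metrics mappings listed above could be combined roughly like this. It is a sketch under assumptions: `to_chat_completion` is a hypothetical name, token counts are assumed to live under a `metrics` object on the prediction, and array output is assumed to be joined without a separator (the source only says "joined").

```python
def output_text(output) -> str:
    """Normalize Replicate output (string, list, or {"text": ...}) to text."""
    if isinstance(output, str):
        return output
    if isinstance(output, list):
        return "".join(output)  # assumption: chunks joined with no separator
    if isinstance(output, dict) and "text" in output:
        return output["text"]
    return ""


FINISH_REASON = {"succeeded": "stop", "failed": "error"}


def to_chat_completion(prediction: dict) -> dict:
    metrics = prediction.get("metrics") or {}
    prompt_tokens = metrics.get("input_token_count", 0)
    completion_tokens = metrics.get("output_token_count", 0)
    return {
        "id": prediction["id"],
        "object": "chat.completion",
        "model": prediction.get("model", ""),
        "choices": [{
            "index": 0,
            "message": {"role": "assistant",
                        "content": output_text(prediction.get("output"))},
            "finish_reason": FINISH_REASON.get(prediction["status"]),
        }],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }
```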

Replicate streaming uses Server-Sent Events (SSE) with the following event types:

| Event Type | Description | Data Format |
| --- | --- | --- |
| output | Content chunk | Plain text string |
| done | Completion | JSON: {"reason": ""} (empty = success) |
| error | Error occurred | JSON: {"detail": "error message"} |

Streaming Flow:

  1. DeepIntShield sets stream: true in prediction input
  2. Replicate returns urls.stream in initial response
  3. DeepIntShield connects to stream URL and processes SSE events
  4. output events → content deltas
  5. done event → final chunk with finish_reason

Done Event Reasons:

  • Empty or no reason = success (finish_reason: "stop")
  • "canceled" = prediction was canceled
  • "error" = prediction failed
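
A minimal sketch of handling these three event types follows. It assumes the stream arrives as standard SSE lines (`event:` and `data:` fields, with a blank line ending each event); `parse_sse` and `to_chunks` are hypothetical names, and passing a non-empty done reason through as the finish_reason is an assumption, since the source only specifies the empty case.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs; a blank line terminates each event."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data or event != "message":
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips exactly one leading space, preserving " world"
            data.append(value[1:] if value.startswith(" ") else value)


def to_chunks(events: Iterable[tuple[str, str]]) -> Iterator[dict]:
    """Map Replicate stream events to chat-style delta chunks."""
    for event, data in events:
        if event == "output":
            yield {"delta": {"content": data}, "finish_reason": None}
        elif event == "done":
            reason = json.loads(data or "{}").get("reason", "")
            # Empty reason means success; other reasons pass through (assumed)
            yield {"delta": {}, "finish_reason": reason or "stop"}
            return
        elif event == "error":
            raise RuntimeError(json.loads(data).get("detail", "unknown error"))
```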

The Responses API is converted internally to Chat Completions or native Replicate format depending on the model:

// Responses request → Replicate prediction conversion
ResponsesRequest → ReplicatePredictionRequest → ReplicatePredictionResponse → DeepIntShieldResponsesResponse

Conversion Logic:

  1. For OpenAI models with gpt-5-structured: Uses native Responses format with input_item_list, tools, and json_schema support
  2. For all other models: Converted to Chat Completions format using message conversion logic

Same parameter mapping and system prompt handling as Chat Completions.

Responses follow standard Responses API format with status mapping:

| Replicate Status | Responses Status |
| --- | --- |
| succeeded | completed |
| failed | failed |
| canceled | cancelled |
| processing | in_progress |
| starting | queued |
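
As a lookup, the status mapping might look like the sketch below. `to_responses_status` is a hypothetical name. Note the spelling change from Replicate's "canceled" to "cancelled" in the Responses format.

```python
# Replicate prediction status -> Responses API status (from the table above)
RESPONSES_STATUS = {
    "succeeded": "completed",
    "failed": "failed",
    "canceled": "cancelled",
    "processing": "in_progress",
    "starting": "queued",
}


def to_responses_status(replicate_status: str) -> str:
    """Translate a status; an unknown status raises KeyError in this sketch."""
    return RESPONSES_STATUS[replicate_status]
```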

  • Prompt array: Joined with newlines into single prompt field
  • top_k: Pass via extra_params (model-specific)
curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/meta/llama-2-7b",
    "prompt": "Once upon a time",
    "max_tokens": 100,
    "temperature": 0.8,
    "top_k": 40
  }'

Same conversion as chat completions: output string/array → choices[0].text, with usage metrics from prediction metrics.


Request parameters map to Replicate input fields as follows (request parameter on the left, Replicate input field on the right):

{
  "prompt": "prompt",
  "n": "number_of_images",
  "aspect_ratio": "aspect_ratio",
  "resolution": "resolution",
  "output_format": "output_format",
  "quality": "quality",
  "background": "background",
  "seed": "seed",
  "negative_prompt": "negative_prompt",
  "num_inference_steps": "num_inference_steps",
  "input_images": "input_images"
}

Important: Different Replicate models expect input images in different fields. DeepIntShield automatically maps input_images to the correct field based on the model.

Field Mapping by Model:

| Field | Models |
| --- | --- |
| image_prompt | black-forest-labs/flux-1.1-pro, black-forest-labs/flux-1.1-pro-ultra, black-forest-labs/flux-pro, black-forest-labs/flux-1.1-pro-ultra-finetuned |
| input_image | black-forest-labs/flux-kontext-pro, black-forest-labs/flux-kontext-max, black-forest-labs/flux-kontext-dev |
| image | black-forest-labs/flux-dev, black-forest-labs/flux-fill-pro, black-forest-labs/flux-dev-lora, black-forest-labs/flux-krea-dev |
| input_images | All other models (default) |
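
The field mapping can be expressed as a lookup with a default. This is a sketch, not the logic in images.go: `image_field_for` and `build_image_input` are hypothetical names, and the image value is passed through unchanged because the source does not say whether single-image fields receive only the first image.

```python
# Field name -> models that expect input images in that field
IMAGE_FIELD_MODELS = {
    "image_prompt": [
        "black-forest-labs/flux-1.1-pro",
        "black-forest-labs/flux-1.1-pro-ultra",
        "black-forest-labs/flux-pro",
        "black-forest-labs/flux-1.1-pro-ultra-finetuned",
    ],
    "input_image": [
        "black-forest-labs/flux-kontext-pro",
        "black-forest-labs/flux-kontext-max",
        "black-forest-labs/flux-kontext-dev",
    ],
    "image": [
        "black-forest-labs/flux-dev",
        "black-forest-labs/flux-fill-pro",
        "black-forest-labs/flux-dev-lora",
        "black-forest-labs/flux-krea-dev",
    ],
}
FIELD_BY_MODEL = {m: f for f, models in IMAGE_FIELD_MODELS.items() for m in models}


def image_field_for(model: str) -> str:
    """Input-image field for a model; input_images is the default."""
    return FIELD_BY_MODEL.get(model, "input_images")


def build_image_input(model: str, params: dict) -> dict:
    """Rename n and input_images; all other parameters keep their names."""
    renames = {"n": "number_of_images", "input_images": image_field_for(model)}
    return {renames.get(key, key): value for key, value in params.items()}
```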
curl -X POST http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/black-forest-labs/flux-schnell",
    "prompt": "A serene mountain landscape at sunset",
    "aspect_ratio": "16:9",
    "output_format": "webp",
    "num_inference_steps": 4,
    "seed": 42
  }'

Replicate output can be:

  • Single URL: String → data[0].url
  • Multiple URLs: Array → data[i].url for each image
  • Data URIs: Base64-encoded images in data URI format
{
  "id": "xyz789",
  "created": 1234567890,
  "model": "black-forest-labs/flux-schnell",
  "data": [
    {
      "url": "https://replicate.delivery/pbxt/...",
      "index": 0
    }
  ],
  "usage": {
    "input_tokens": 15,
    "output_tokens": 0,
    "total_tokens": 15
  }
}

Image generation streaming provides progressive image updates as data URIs:

SSE Events:

  • output: Data URI chunk (partial image)
  • done: Final completion with reason
  • error: Error details

Flow:

  1. Each output event contains a complete data URI (e.g., data:image/webp;base64,...)
  2. Progressive refinement shows generation progress
  3. done event signals completion with final image
  4. Each chunk includes Index, ChunkIndex, and B64JSON fields

Image edit runs as a prediction like image generation. You send one or more input images plus a prompt; the model returns edited image(s). The same input image field mapping as Image Generation applies (see Field Mapping by Model below).

Endpoint: /v1/images/edits (DeepIntShield) → Replicate /v1/predictions or deployment predictions.

| DeepIntShield / Request | Replicate input |
| --- | --- |
| input.images | Mapped to image_prompt, input_image, image, or input_images by model |
| input.prompt | prompt |
| params.n | number_of_images |
| params.output_format | output_format |
| params.quality | quality |
| params.background | background |
| params.seed | seed |
| params.negative_prompt | negative_prompt |
| params.num_inference_steps | num_inference_steps |
| params.extra_params | Merged into prediction input |

Input images are mapped to the same fields as in Image Generation:

| Field | Models |
| --- | --- |
| image_prompt | black-forest-labs/flux-1.1-pro, black-forest-labs/flux-1.1-pro-ultra, black-forest-labs/flux-pro, black-forest-labs/flux-1.1-pro-ultra-finetuned |
| input_image | black-forest-labs/flux-kontext-pro, black-forest-labs/flux-kontext-max, black-forest-labs/flux-kontext-dev |
| image | black-forest-labs/flux-dev, black-forest-labs/flux-fill-pro, black-forest-labs/flux-dev-lora, black-forest-labs/flux-krea-dev |
| input_images | All other models (default) |
curl -X POST 'http://localhost:8080/v1/images/edits' \
  --form 'model="replicate/black-forest-labs/flux-fill-pro"' \
  --form 'image[]=@"image.png"' \
  --form 'prompt="Replace the sky with a starry night"' \
  --form 'mask=@"mask.png"'

Same as Image Generation: single URL → data[0].url, array of URLs → data[i].url, or data URIs. Response shape is DeepIntShieldImageGenerationResponse with data[].url or data[].b64_json.

Image edit streaming is supported. Events use the same prediction log stream as image generation:

  • Partial chunks: type: "image_edit.partial_image" with b64_json (or data URI) until completion.
  • Completed: type: "image_edit.completed" with final image and usage.

Use Prefer: wait for sync behavior or rely on polling (async) like other Replicate predictions.


Replicate’s Files API supports uploading, listing, and managing files for use in predictions.

Request: Multipart form-data

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| file | binary |  | File content |
| filename | string |  | Custom filename |
| content_type | string |  | MIME type (auto-detected from extension) |

Example:

curl -X POST http://localhost:8080/v1/files \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@document.pdf" \
  -F "filename=my-document.pdf"

Response:

{
  "id": "file_abc123",
  "object": "file",
  "bytes": 12345,
  "created_at": 1234567890,
  "filename": "my-document.pdf",
  "purpose": "batch",
  "status": "processed"
}

Query Parameters:

| Parameter | Type | Notes |
| --- | --- | --- |
| limit | int | Results per page |
| after | string | Pagination cursor |

Example:

curl -X GET "http://localhost:8080/v1/files?limit=20" \
  -H "Authorization: Bearer $API_KEY"

Pagination: Uses cursor-based pagination with next URL in response. DeepIntShield serializes this into the after cursor.

Operations:

  • GET /v1/files/{file_id} - Retrieve file metadata
  • DELETE /v1/files/{file_id} - Delete file

Required Parameters in ExtraParams:

| Parameter | Type | Description |
| --- | --- | --- |
| owner | string | File owner username |
| expiry | int64 | Unix timestamp for expiration |
| signature | string | Base64-encoded HMAC-SHA256 signature |

Signature Format: HMAC-SHA256 of "{owner} {file_id} {expiry}" using Files API signing secret

Example:

curl -X POST http://localhost:8080/v1/files/file_abc123/content \
  -H "Content-Type: application/json" \
  -d '{
    "owner": "my-username",
    "expiry": 1735689600,
    "signature": "base64-encoded-signature"
  }'
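
The signature described above can be computed with Python's standard library. This is a minimal sketch: `sign_file_request` is a hypothetical helper, and standard (not URL-safe) Base64 is an assumption, since the source only says "Base64-encoded".

```python
import base64
import hashlib
import hmac


def sign_file_request(secret: bytes, owner: str, file_id: str, expiry: int) -> str:
    """HMAC-SHA256 over "{owner} {file_id} {expiry}", Base64-encoded."""
    message = f"{owner} {file_id} {expiry}".encode("utf-8")
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    # Assumption: standard Base64 alphabet with padding
    return base64.b64encode(digest).decode("ascii")
```

The result goes into the signature field of the request body, alongside the same owner and expiry values that were signed.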

Endpoint: /v1/models (DeepIntShield) → Replicate /v1/deployments

Deployments are private or organization models with dedicated infrastructure. The response includes:

{
  "data": [
    {
      "id": "replicate/my-org/my-deployment",
      "name": "my-deployment",
      "owner": "my-org"
    }
  ],
  "has_more": false
}

Usage:

  1. List your deployments via this endpoint
  2. Use deployment name as model identifier: replicate/my-org/my-deployment
  3. Predictions route to deployment-specific endpoint: /v1/deployments/my-org/my-deployment/predictions

The most important feature for Replicate integration is extra_params. Parameters not in DeepIntShield’s standard schema are flattened directly into the prediction input object.

// Request with extra params
{
  "model": "replicate/stability-ai/sdxl",
  "prompt": "A photo of an astronaut",
  "temperature": 0.7, // Standard param
  "guidance_scale": 7.5, // Model-specific (extra param)
  "num_inference_steps": 50, // Model-specific (extra param)
  "scheduler": "DPMSolverMultistep" // Model-specific (extra param)
}
// Converted to Replicate prediction input
{
  "version": "...",
  "input": {
    "prompt": "A photo of an astronaut",
    "temperature": 0.7,
    "guidance_scale": 7.5, // Flattened from extra_params
    "num_inference_steps": 50, // Flattened from extra_params
    "scheduler": "DPMSolverMultistep" // Flattened from extra_params
  }
}
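
The flattening step shown above can be sketched as a split followed by a merge. The set of standard fields here is a hypothetical subset chosen to match the example, not DeepIntShield's real schema, and the version field is omitted because it depends on model resolution.

```python
# Hypothetical subset of the standard schema, for illustration only
STANDARD_FIELDS = {"model", "prompt", "messages", "temperature"}


def split_params(body: dict) -> tuple[dict, dict]:
    """Separate recognized fields from model-specific extra params."""
    standard = {k: v for k, v in body.items() if k in STANDARD_FIELDS}
    extra = {k: v for k, v in body.items() if k not in STANDARD_FIELDS}
    return standard, extra


def to_prediction_input(body: dict) -> dict:
    """Flatten standard and extra params into one input object."""
    standard, extra = split_params(body)
    standard.pop("model", None)  # the model selects the version, not an input
    return {"input": {**standard, **extra}}
```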

Each Replicate model has unique parameters. To find available parameters:

  1. Model Page: Visit the model on replicate.com
  2. OpenAPI Schema: Available at /v1/models/{owner}/{name}/versions/{version_id} (includes openapi_schema)
  3. Cog Definition: Check the model’s source code (if public)

System Prompt Field Support

Severity: Medium
Behavior: Not all models support the system_prompt field. For unsupported models, the system prompt is prepended to the conversation prompt.
Impact: Prompt structure differs between models
Models Affected: meta/meta-llama-3-8b, meta/llama-2-70b, openai/gpt-oss-20b, openai/o1-mini, xai/grok-4, and all deepseek-ai/deepseek* models
Code: chat.go:300-318

Input Image Field Mapping

Severity: Medium
Behavior: Different models expect input images in different fields (image_prompt, input_image, image, input_images)
Impact: DeepIntShield automatically maps to the correct field based on the model
Models Affected: Flux family models (see Input Image Field Mapping table)
Code: images.go:192-209

Image Content in Chat

Severity: Low
Behavior: Only non-base64 image URLs from message content blocks are extracted to image_input
Impact: Base64-encoded images in messages are ignored
Code: chat.go:58-63

Model-Specific Parameters

Severity: Medium
Behavior: Each model has a unique input schema; standard parameters may not work for all models
Impact: Requires checking model documentation for available parameters
Mitigation: Use extra_params for model-specific fields


Request Parameters

| Parameter | Type | Required | Notes |
| --- | --- | --- | --- |
| model | string |  | Replicate model (owner/model or version ID) |
| prompt | string |  | Text description of the video |
| input_reference | string |  | Reference image (base64 data URL or URL) → mapped to image field; OpenAI-hosted models use input_reference |
| seconds | string |  | Duration → duration |
| seed | int |  | Seed for reproducibility |
| negative_prompt | string |  | What to avoid |

Extra Params: Pass model-specific fields directly in the JSON body (unrecognized fields become extra_params and are flattened into the prediction input). webhook and webhook_events_filter are extracted automatically.
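
The video parameter handling can be sketched as below. Several points are assumptions and labeled as such in the code: `build_video_request` is a hypothetical name, webhook fields are assumed to move to the top level of the prediction request, and "OpenAI-hosted" is detected by an openai/ owner prefix.

```python
def build_video_request(body: dict) -> dict:
    """Map a video generation request body to a prediction request (sketch)."""
    body = dict(body)  # copy so the caller's dict is not modified
    model = body.pop("model").removeprefix("replicate/")
    request: dict = {}

    # Assumption: webhook settings sit beside "input", not inside it
    for key in ("webhook", "webhook_events_filter"):
        if key in body:
            request[key] = body.pop(key)

    inputs: dict = {}
    if "seconds" in body:
        inputs["duration"] = body.pop("seconds")
    if "input_reference" in body:
        # Assumption: OpenAI-hosted models are those under the openai/ owner
        field = "input_reference" if model.startswith("openai/") else "image"
        inputs[field] = body.pop("input_reference")

    # prompt, seed, negative_prompt, and unrecognized fields are flattened in
    inputs.update(body)
    request["input"] = inputs
    return request
```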

Response: DeepIntShieldVideoGenerationResponse with id, status, model, and videos[]

Job Statuses: queued (starting) → in_progress (processing) → completed / failed

| Operation | Endpoint | Notes |
| --- | --- | --- |
| Get status | GET /v1/videos/{id} | Maps to /v1/predictions/{id} |
| Download | GET /v1/videos/{id}/content | Downloads from the prediction output URL |