Replicate
Overview
Replicate is architecturally different from other providers in DeepIntShield. It uses a prediction-based API where every request creates a “prediction” that runs asynchronously. Each model on Replicate defines its own input schema, which makes the platform highly flexible but requires model-specific parameter knowledge.
Key Architectural Differences
- Prediction-Based System: All operations create predictions via /v1/predictions or deployment endpoints
- Model-Specific Inputs: Each model has its own parameter schema (use extra_params for model-specific fields)
- Async/Sync Modes: Predictions can run synchronously (with the Prefer: wait header) or asynchronously (with polling)
- Flexible Output: Output can be strings, arrays, URLs, or data URIs depending on the model
Supported Operations
| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | ✅ | ✅ | /v1/predictions |
| Responses API | ✅ | ✅ | /v1/predictions |
| Text Completions | ✅ | ✅ | /v1/predictions |
| Image Generation | ✅ | ✅ | /v1/predictions |
| Image Edit | ✅ | ✅ | /v1/predictions |
| Video Generation | ✅ | - | /v1/predictions |
| Image Variation | ❌ | ❌ | - |
| Files | ✅ | - | /v1/files |
| List Models | ✅ | - | /v1/deployments |
| Embeddings | ❌ | ❌ | - |
| Speech (TTS) | ❌ | ❌ | - |
| Transcriptions (STT) | ❌ | ❌ | - |
| Batch | ❌ | ❌ | - |
Model Identification
Replicate models can be specified in three ways:
1. Version ID

    curl -X POST http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "replicate/5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
        "messages": [{"role": "user", "content": "Hello"}]
      }'

2. Model Name
Format: owner/model-name

    curl -X POST http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "replicate/meta/llama-2-7b-chat",
        "messages": [{"role": "user", "content": "Hello"}]
      }'

3. Deployment
Configure deployed models in the Replicate key configuration. Deployments map custom model identifiers to actual deployment paths.
Configuration Example:

    {
      "provider": "replicate",
      "value": "your-api-key",
      "replicate_key_config": {
        "deployments": {
          "my-model": "owner/my-deployment-name"
        }
      }
    }

Usage:

    curl -X POST http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "replicate/my-model",
        "messages": [{"role": "user", "content": "Hello"}]
      }'

Prediction Modes
Sync Mode
DeepIntShield uses sync mode when the Prefer: wait header is present in the request headers. The request blocks until the prediction completes or the wait times out (default 60 seconds).
How it works:
- Creates the prediction with a Prefer: wait=60 header
- Replicate holds the connection open for up to 60 seconds
- If the prediction completes within the timeout, the result is returned immediately
- If the timeout expires, DeepIntShield falls back to polling mode
Async Mode (Polling)
Async mode is the default for Replicate predictions. DeepIntShield automatically polls the prediction URL every 2 seconds until completion.
Status Flow: starting → processing → succeeded/failed/canceled
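The polling behavior can be pictured with a minimal Go sketch. This is not DeepIntShield's implementation: `pollUntilDone`, `isTerminal`, and the `fetch` callback (a stand-in for "GET the prediction URL and read its status") are hypothetical names, and the attempt cap is an assumption added so the loop always terminates.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// isTerminal reports whether a prediction status is final,
// following the status flow starting → processing → succeeded/failed/canceled.
func isTerminal(status string) bool {
	switch status {
	case "succeeded", "failed", "canceled":
		return true
	}
	return false
}

// pollUntilDone calls fetch once per interval until a terminal status is seen
// or maxAttempts is exhausted. fetch is a hypothetical stand-in for reading
// the status field from the prediction URL.
func pollUntilDone(fetch func() (string, error), interval time.Duration, maxAttempts int) (string, error) {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		status, err := fetch()
		if err != nil {
			return "", err
		}
		if isTerminal(status) {
			return status, nil
		}
		time.Sleep(interval)
	}
	return "", errors.New("prediction did not reach a terminal status")
}

func main() {
	// Simulated sequence of statuses returned by successive polls.
	statuses := []string{"starting", "processing", "processing", "succeeded"}
	i := 0
	fetch := func() (string, error) {
		s := statuses[i]
		if i < len(statuses)-1 {
			i++
		}
		return s, nil
	}
	// The documented interval is 2 seconds; a short one keeps the demo fast.
	final, err := pollUntilDone(fetch, 10*time.Millisecond, 10)
	fmt.Println(final, err)
}
```

Passing the fetch step as a callback keeps the loop testable without network access; a real client would also honor context cancellation.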
1. Chat Completions
Message Conversion
System Messages: Extracted from the messages array and concatenated into the system_prompt field.
User/Assistant Messages: Preserved as conversation context. Text content from content blocks is concatenated with newlines.
Image Content: Non-base64 image URLs from message content blocks are extracted and passed as the image_input array.

    // Input
    {
      "messages": [
        {"role": "system", "content": "You are helpful"},
        {"role": "user", "content": "Hello"}
      ]
    }

    // Converted to Replicate format
    {
      "input": {
        "system_prompt": "You are helpful",
        "prompt": "Hello",
        "messages": [...] // Original messages array also included
      }
    }

System Prompt Filtering
Important: Not all Replicate models support the system_prompt field. For unsupported models, the system prompt is automatically prepended to the conversation prompt.
Models without system_prompt support:
- meta/meta-llama-3-8b
- meta/llama-2-70b
- openai/gpt-oss-20b
- openai/o1-mini
- xai/grok-4
- All deepseek-ai/deepseek* models (e.g., deepseek-r1, deepseek-v3)
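The filtering rule above can be sketched in Go as follows. The function names are hypothetical, the listed model names are matched exactly (an assumption; the real check may match differently), and the blank-line separator used when prepending is also an assumption, since the document only says the system prompt is prepended.

```go
package main

import (
	"fmt"
	"strings"
)

// Models listed in the documentation as lacking system_prompt support.
var noSystemPrompt = map[string]bool{
	"meta/meta-llama-3-8b": true,
	"meta/llama-2-70b":     true,
	"openai/gpt-oss-20b":   true,
	"openai/o1-mini":       true,
	"xai/grok-4":           true,
}

// supportsSystemPrompt applies the exact-name list plus the
// deepseek-ai/deepseek* prefix rule.
func supportsSystemPrompt(model string) bool {
	if noSystemPrompt[model] {
		return false
	}
	return !strings.HasPrefix(model, "deepseek-ai/deepseek")
}

// buildInput places the system prompt in system_prompt when supported,
// and otherwise prepends it to the prompt.
// Assumption: "\n\n" separates the prepended system prompt from the prompt.
func buildInput(model, systemPrompt, prompt string) map[string]interface{} {
	input := map[string]interface{}{"prompt": prompt}
	if systemPrompt == "" {
		return input
	}
	if supportsSystemPrompt(model) {
		input["system_prompt"] = systemPrompt
	} else {
		input["prompt"] = systemPrompt + "\n\n" + prompt
	}
	return input
}

func main() {
	fmt.Println(buildInput("meta/llama-2-7b-chat", "You are helpful", "Hello"))
	fmt.Println(buildInput("deepseek-ai/deepseek-r1", "You are helpful", "Hello"))
}
```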
Model-Specific Parameters
Use extra_params to pass model-specific parameters. These are flattened into the input object:

    curl -X POST http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "replicate/meta/llama-2-7b-chat",
        "messages": [{"role": "user", "content": "Hello"}],
        "temperature": 0.7,
        "top_k": 50,
        "repetition_penalty": 1.1,
        "min_new_tokens": 10
      }'

    resp, err := client.ChatCompletionRequest(schemas.NewDeepIntShieldContext(ctx, schemas.NoDeadline), &schemas.DeepIntShieldChatRequest{
        Provider: schemas.Replicate,
        Model:    "meta/llama-2-7b-chat",
        Input:    messages,
        Params: &schemas.ChatParameters{
            Temperature: schemas.Ptr(0.7),
            ExtraParams: map[string]interface{}{
                "top_k":              50,
                "repetition_penalty": 1.1,
                "min_new_tokens":     10,
            },
        },
    })

Response Conversion
Field Mapping
- Output:
  - String → choices[0].message.content
  - Array of strings → joined and mapped to choices[0].message.content
  - Object with text field → text value mapped to choices[0].message.content
- Status: succeeded → finish_reason: "stop", failed → finish_reason: "error"
- Metrics: input_token_count → prompt_tokens, output_token_count → completion_tokens
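As an illustration of the output mapping, here is a small Go sketch that handles the three output shapes after JSON decoding. `outputToContent` and `finishReason` are hypothetical helpers, and joining array elements with no separator is an assumption (the document only says the strings are joined).

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// outputToContent converts a decoded Replicate "output" value into the text
// used for choices[0].message.content. The bool reports whether the shape
// was one of the three documented forms.
func outputToContent(output interface{}) (string, bool) {
	switch v := output.(type) {
	case string:
		return v, true
	case []interface{}:
		parts := make([]string, 0, len(v))
		for _, item := range v {
			s, ok := item.(string)
			if !ok {
				return "", false
			}
			parts = append(parts, s)
		}
		// Assumption: chunks are concatenated without a separator.
		return strings.Join(parts, ""), true
	case map[string]interface{}:
		text, ok := v["text"].(string)
		return text, ok
	}
	return "", false
}

// finishReason maps the two documented statuses; other statuses return ""
// because the document does not specify a mapping for them.
func finishReason(status string) string {
	switch status {
	case "succeeded":
		return "stop"
	case "failed":
		return "error"
	}
	return ""
}

func main() {
	var output interface{}
	raw := `["Hello", "!", " How can I help you?"]`
	if err := json.Unmarshal([]byte(raw), &output); err != nil {
		panic(err)
	}
	content, ok := outputToContent(output)
	fmt.Printf("%q %v %s\n", content, ok, finishReason("succeeded"))
}
```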
Example Response

    {
      "id": "abc123",
      "model": "meta/llama-2-7b-chat",
      "object": "chat.completion",
      "created": 1234567890,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! How can I help you?"
          },
          "finish_reason": "stop"
        }
      ],
      "usage": {
        "prompt_tokens": 10,
        "completion_tokens": 8,
        "total_tokens": 18
      }
    }

Streaming
Replicate streaming uses Server-Sent Events (SSE) with the following event types:
| Event Type | Description | Data Format |
|---|---|---|
| output | Content chunk | Plain text string |
| done | Completion | JSON: {"reason": ""} (empty = success) |
| error | Error occurred | JSON: {"detail": "error message"} |
Streaming Flow:
- DeepIntShield sets stream: true in the prediction input
- Replicate returns urls.stream in the initial response
- DeepIntShield connects to the stream URL and processes SSE events
- output events → content deltas
- done event → final chunk with finish_reason
Done Event Reasons:
- Empty or no reason = success (finish_reason: "stop")
- "canceled" = prediction was canceled
- "error" = prediction failed
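The event handling above can be sketched as a small SSE reader in Go. This is a simplified illustration, not the gateway's parser: `chunk`, `parseStream`, and `parseString` are hypothetical names, and passing a non-empty done reason through as the finish reason is an assumption, since the document only defines the empty case.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// chunk is a simplified stand-in for a streamed chat completion chunk.
type chunk struct {
	Delta        string
	FinishReason string
}

// parseStream reads SSE events and converts output events to deltas and the
// done event to a final chunk. An error event aborts with its detail message.
func parseStream(r io.Reader) ([]chunk, error) {
	var chunks []chunk
	var event string
	var data []string

	// flush handles one complete event (terminated by a blank line).
	flush := func() error {
		defer func() { event, data = "", nil }()
		payload := strings.Join(data, "\n")
		switch event {
		case "output":
			chunks = append(chunks, chunk{Delta: payload})
		case "done":
			var d struct {
				Reason string `json:"reason"`
			}
			if payload != "" {
				if err := json.Unmarshal([]byte(payload), &d); err != nil {
					return err
				}
			}
			reason := d.Reason
			if reason == "" {
				reason = "stop" // empty or missing reason means success
			}
			// Assumption: non-empty reasons are passed through unchanged.
			chunks = append(chunks, chunk{FinishReason: reason})
		case "error":
			var e struct {
				Detail string `json:"detail"`
			}
			if err := json.Unmarshal([]byte(payload), &e); err != nil {
				return err
			}
			return fmt.Errorf("replicate stream error: %s", e.Detail)
		}
		return nil
	}

	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		line := scanner.Text()
		switch {
		case line == "":
			if err := flush(); err != nil {
				return chunks, err
			}
		case strings.HasPrefix(line, "event:"):
			event = strings.TrimSpace(strings.TrimPrefix(line, "event:"))
		case strings.HasPrefix(line, "data:"):
			// Per SSE, a single leading space after the colon is dropped.
			data = append(data, strings.TrimPrefix(strings.TrimPrefix(line, "data:"), " "))
		}
	}
	if err := scanner.Err(); err != nil {
		return chunks, err
	}
	// Handle a final event that was not followed by a blank line.
	err := flush()
	return chunks, err
}

// parseString is a convenience wrapper for in-memory streams.
func parseString(s string) ([]chunk, error) {
	return parseStream(strings.NewReader(s))
}

func main() {
	stream := "event: output\ndata: Hello\n\nevent: output\ndata:  world\n\nevent: done\ndata: {\"reason\": \"\"}\n\n"
	chunks, err := parseString(stream)
	fmt.Printf("%+v %v\n", chunks, err)
}
```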
2. Responses API
The Responses API is converted internally to Chat Completions or native Replicate format depending on the model:

    // Responses request → Replicate prediction conversion
    ResponsesRequest → ReplicatePredictionRequest → ReplicatePredictionResponse → DeepIntShieldResponsesResponse

Conversion Logic:
- For OpenAI models with gpt-5-structured: Uses native Responses format with input_item_list, tools, and json_schema support
- For all other models: Converted to Chat Completions format using message conversion logic
Same parameter mapping and system prompt handling as Chat Completions.
Response Format
Responses follow the standard Responses API format with status mapping:
| Replicate Status | Responses Status |
|---|---|
| succeeded | completed |
| failed | failed |
| canceled | cancelled |
| processing | in_progress |
| starting | queued |
3. Text Completions (Legacy)
Conversion
- Prompt array: Joined with newlines into a single prompt field
- top_k: Pass via extra_params (model-specific)
Example

    curl -X POST http://localhost:8080/v1/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "replicate/meta/llama-2-7b",
        "prompt": "Once upon a time",
        "max_tokens": 100,
        "temperature": 0.8,
        "top_k": 40
      }'

Response
Same conversion as chat completions: output string/array → choices[0].text, with usage metrics from prediction metrics.
4. Image Generation
Parameter Mapping

    {
      "prompt": "prompt",
      "n": "number_of_images",
      "aspect_ratio": "aspect_ratio",
      "resolution": "resolution",
      "output_format": "output_format",
      "quality": "quality",
      "background": "background",
      "seed": "seed",
      "negative_prompt": "negative_prompt",
      "num_inference_steps": "num_inference_steps",
      "input_images": "input_images"
    }

Input Image Field Mapping
Important: Different Replicate models expect input images in different fields. DeepIntShield automatically maps input_images to the correct field based on the model.
Field Mapping by Model:
| Field | Models |
|---|---|
| image_prompt | black-forest-labs/flux-1.1-pro, black-forest-labs/flux-1.1-pro-ultra, black-forest-labs/flux-pro, black-forest-labs/flux-1.1-pro-ultra-finetuned |
| input_image | black-forest-labs/flux-kontext-pro, black-forest-labs/flux-kontext-max, black-forest-labs/flux-kontext-dev |
| image | black-forest-labs/flux-dev, black-forest-labs/flux-fill-pro, black-forest-labs/flux-dev-lora, black-forest-labs/flux-krea-dev |
| input_images | All other models (default) |
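The table amounts to a lookup with a default. A minimal Go sketch, assuming exact model-name matching (the actual matching logic is not described in this document) and using the hypothetical name `imageFieldFor`:

```go
package main

import "fmt"

// imageFieldByModel mirrors the field mapping table.
var imageFieldByModel = map[string]string{
	"black-forest-labs/flux-1.1-pro":                 "image_prompt",
	"black-forest-labs/flux-1.1-pro-ultra":           "image_prompt",
	"black-forest-labs/flux-pro":                     "image_prompt",
	"black-forest-labs/flux-1.1-pro-ultra-finetuned": "image_prompt",
	"black-forest-labs/flux-kontext-pro":             "input_image",
	"black-forest-labs/flux-kontext-max":             "input_image",
	"black-forest-labs/flux-kontext-dev":             "input_image",
	"black-forest-labs/flux-dev":                     "image",
	"black-forest-labs/flux-fill-pro":                "image",
	"black-forest-labs/flux-dev-lora":                "image",
	"black-forest-labs/flux-krea-dev":                "image",
}

// imageFieldFor returns the Replicate input field that receives input images
// for the given model, falling back to input_images.
// Assumption: model names are matched exactly.
func imageFieldFor(model string) string {
	if field, ok := imageFieldByModel[model]; ok {
		return field
	}
	return "input_images"
}

func main() {
	fmt.Println(imageFieldFor("black-forest-labs/flux-kontext-pro"))
	fmt.Println(imageFieldFor("black-forest-labs/flux-schnell"))
}
```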
Example

    curl -X POST http://localhost:8080/v1/images/generations \
      -H "Content-Type: application/json" \
      -d '{
        "model": "replicate/black-forest-labs/flux-schnell",
        "prompt": "A serene mountain landscape at sunset",
        "aspect_ratio": "16:9",
        "output_format": "webp",
        "num_inference_steps": 4,
        "seed": 42
      }'

    resp, err := client.ImageGenerationRequest(schemas.NewDeepIntShieldContext(ctx, schemas.NoDeadline), &schemas.DeepIntShieldImageGenerationRequest{
        Provider: schemas.Replicate,
        Model:    "black-forest-labs/flux-schnell",
        Input: &schemas.ImageGenerationInput{
            Prompt: "A serene mountain landscape at sunset",
        },
        Params: &schemas.ImageGenerationParameters{
            AspectRatio:       schemas.Ptr("16:9"),
            OutputFormat:      schemas.Ptr("webp"),
            NumInferenceSteps: schemas.Ptr(4),
            Seed:              schemas.Ptr(42),
        },
    })

Response Conversion
Replicate output can be:
- Single URL: String → data[0].url
- Multiple URLs: Array → data[i].url for each image
- Data URIs: Base64-encoded images in data URI format

    {
      "id": "xyz789",
      "created": 1234567890,
      "model": "black-forest-labs/flux-schnell",
      "data": [
        {
          "url": "https://replicate.delivery/pbxt/...",
          "index": 0
        }
      ],
      "usage": {
        "input_tokens": 15,
        "output_tokens": 0,
        "total_tokens": 15
      }
    }

Streaming
Image generation streaming provides progressive image updates as data URIs:
SSE Events:
- output: Data URI chunk (partial image)
- done: Final completion with reason
- error: Error details
Flow:
- Each output event contains a complete data URI (e.g., data:image/webp;base64,...)
- Progressive refinement shows generation progress
- done event signals completion with final image
- Each chunk includes Index, ChunkIndex, and B64JSON fields
5. Image Edit
Image edit runs as a prediction like image generation. You send one or more input images plus a prompt; the model returns edited image(s). The same input image field mapping as Image Generation applies (see Field Mapping by Model below).
Endpoint: /v1/images/edits (DeepIntShield) → Replicate /v1/predictions or deployment predictions.
Parameter Mapping
| DeepIntShield / Request | Replicate input |
|---|---|
| input.images | Mapped to image_prompt, input_image, image, or input_images by model |
| input.prompt | prompt |
| params.n | number_of_images |
| params.output_format | output_format |
| params.quality | quality |
| params.background | background |
| params.seed | seed |
| params.negative_prompt | negative_prompt |
| params.num_inference_steps | num_inference_steps |
| params.extra_params | Merged into prediction input |
Field Mapping by Model
Input images are mapped to the same fields as in Image Generation:
| Field | Models |
|---|---|
| image_prompt | black-forest-labs/flux-1.1-pro, black-forest-labs/flux-1.1-pro-ultra, black-forest-labs/flux-pro, black-forest-labs/flux-1.1-pro-ultra-finetuned |
| input_image | black-forest-labs/flux-kontext-pro, black-forest-labs/flux-kontext-max, black-forest-labs/flux-kontext-dev |
| image | black-forest-labs/flux-dev, black-forest-labs/flux-fill-pro, black-forest-labs/flux-dev-lora, black-forest-labs/flux-krea-dev |
| input_images | All other models (default) |
Example

    curl -X POST 'http://localhost:8080/v1/images/edits' \
      --form 'model="replicate/black-forest-labs/flux-fill-pro"' \
      --form 'image[]=@"image.png"' \
      --form 'prompt="Replace the sky with a starry night"' \
      --form 'mask=@"mask.png"'

    resp, err := client.ImageEditRequest(schemas.NewDeepIntShieldContext(ctx, schemas.NoDeadline), &schemas.DeepIntShieldImageEditRequest{
        Provider: schemas.Replicate,
        Model:    "black-forest-labs/flux-fill-pro",
        Input: &schemas.ImageEditInput{
            Prompt: "Replace the sky with a starry night",
            Images: []schemas.ImageInput{
                {Image: imageBytes},
            },
        },
    })

Response
Same as Image Generation: single URL → data[0].url, array of URLs → data[i].url, or data URIs. Response shape is DeepIntShieldImageGenerationResponse with data[].url or data[].b64_json.
Streaming
Image edit streaming is supported. Events use the same prediction log stream as image generation:
- Partial chunks: type: "image_edit.partial_image" with b64_json (or data URI) until completion.
- Completed: type: "image_edit.completed" with final image and usage.
Use Prefer: wait for sync behavior or rely on polling (async) like other Replicate predictions.
6. Files API
Replicate’s Files API supports uploading, listing, and managing files for use in predictions.
Upload
Request: Multipart form-data
| Field | Type | Required | Notes |
|---|---|---|---|
| file | binary | ✅ | File content |
| filename | string | ❌ | Custom filename |
| content_type | string | ❌ | MIME type (auto-detected from extension) |
Example:

    curl -X POST http://localhost:8080/v1/files \
      -H "Authorization: Bearer $API_KEY" \
      -F "file=@document.pdf" \
      -F "filename=my-document.pdf"

Response:

    {
      "id": "file_abc123",
      "object": "file",
      "bytes": 12345,
      "created_at": 1234567890,
      "filename": "my-document.pdf",
      "purpose": "batch",
      "status": "processed"
    }

List Files
Query Parameters:
| Parameter | Type | Notes |
|---|---|---|
| limit | int | Results per page |
| after | string | Pagination cursor |
Example:

    curl -X GET "http://localhost:8080/v1/files?limit=20" \
      -H "Authorization: Bearer $API_KEY"

Pagination: Replicate uses cursor-based pagination with a next URL in the response. DeepIntShield serializes this into the after cursor.
Retrieve / Delete
Operations:
- GET /v1/files/{file_id} - Retrieve file metadata
- DELETE /v1/files/{file_id} - Delete file
File Content Download
Required Parameters in ExtraParams:
| Parameter | Type | Description |
|---|---|---|
| owner | string | File owner username |
| expiry | int64 | Unix timestamp for expiration |
| signature | string | Base64-encoded HMAC-SHA256 signature |
Signature Format: HMAC-SHA256 of "{owner} {file_id} {expiry}" using Files API signing secret
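A minimal Go sketch of computing such a signature with the standard library follows. The helper names are hypothetical, and two details are assumptions not confirmed by this document: the signing secret is used as raw bytes, and the digest is encoded with standard (not URL-safe) Base64.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// signFileDownload computes HMAC-SHA256 over "{owner} {file_id} {expiry}"
// and returns it Base64-encoded.
// Assumptions: secret is used as raw bytes; standard Base64 alphabet.
func signFileDownload(secret, owner, fileID string, expiry int64) string {
	mac := hmac.New(sha256.New, []byte(secret))
	fmt.Fprintf(mac, "%s %s %d", owner, fileID, expiry)
	return base64.StdEncoding.EncodeToString(mac.Sum(nil))
}

// verifyFileDownload recomputes the signature and compares in constant time.
func verifyFileDownload(secret, owner, fileID string, expiry int64, signature string) bool {
	expected := signFileDownload(secret, owner, fileID, expiry)
	return hmac.Equal([]byte(expected), []byte(signature))
}

func main() {
	// Placeholder values; the secret here is not a real signing secret.
	sig := signFileDownload("example-signing-secret", "my-username", "file_abc123", 1735689600)
	fmt.Println(sig)
	fmt.Println(verifyFileDownload("example-signing-secret", "my-username", "file_abc123", 1735689600, sig))
}
```

The resulting string is what would be passed as the signature parameter, alongside the same owner and expiry values used to compute it.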
Example:

    curl -X POST http://localhost:8080/v1/files/file_abc123/content \
      -H "Content-Type: application/json" \
      -d '{
        "owner": "my-username",
        "expiry": 1735689600,
        "signature": "base64-encoded-signature"
      }'

7. List Models
Endpoint: /v1/models
Deployments are private or organization models with dedicated infrastructure. The response includes:

    {
      "data": [
        {
          "id": "replicate/my-org/my-deployment",
          "name": "my-deployment",
          "owner": "my-org"
        }
      ],
      "has_more": false
    }

Usage:
- List your deployments via this endpoint
- Use deployment name as model identifier: replicate/my-org/my-deployment
- Predictions route to deployment-specific endpoint: /v1/deployments/my-org/my-deployment/predictions
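The routing described above, combined with the deployments map from the key configuration, can be sketched as a small resolver. `resolveEndpoint` is a hypothetical name, and treating the configured map as the only signal that a model is a deployment is an assumption; the gateway's actual routing logic is not shown in this document.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveEndpoint returns the Replicate path used to create a prediction.
// deployments maps custom model identifiers to "owner/deployment-name" paths,
// as in the replicate_key_config example.
// Assumption: any model not present in the map uses /v1/predictions.
func resolveEndpoint(model string, deployments map[string]string) string {
	model = strings.TrimPrefix(model, "replicate/")
	if path, ok := deployments[model]; ok {
		return "/v1/deployments/" + path + "/predictions"
	}
	return "/v1/predictions"
}

func main() {
	deployments := map[string]string{"my-model": "owner/my-deployment-name"}
	fmt.Println(resolveEndpoint("replicate/my-model", deployments))
	fmt.Println(resolveEndpoint("replicate/meta/llama-2-7b-chat", deployments))
}
```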
Extra Parameters
Model-Specific Parameters
The most important feature for Replicate integration is extra_params. Parameters not in DeepIntShield’s standard schema are flattened directly into the prediction input object.
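The flattening step can be illustrated with a short Go sketch. `buildPredictionInput` is a hypothetical helper, and letting standard parameters win when a key appears in both maps is an assumption, since the document does not describe collision handling.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildPredictionInput merges standard parameters and extra parameters into
// one flat map, which becomes the "input" object of the prediction.
// Assumption: on key collisions, the standard parameter takes precedence.
func buildPredictionInput(standard, extra map[string]interface{}) map[string]interface{} {
	input := make(map[string]interface{}, len(standard)+len(extra))
	for k, v := range extra {
		input[k] = v
	}
	for k, v := range standard {
		input[k] = v
	}
	return input
}

func main() {
	input := buildPredictionInput(
		map[string]interface{}{"prompt": "A photo of an astronaut", "temperature": 0.7},
		map[string]interface{}{"guidance_scale": 7.5, "scheduler": "DPMSolverMultistep"},
	)
	body, err := json.MarshalIndent(map[string]interface{}{"input": input}, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```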
How It Works

    // Request with extra params
    {
      "model": "replicate/stability-ai/sdxl",
      "prompt": "A photo of an astronaut",
      "temperature": 0.7,               // Standard param
      "guidance_scale": 7.5,            // Model-specific (extra param)
      "num_inference_steps": 50,        // Model-specific (extra param)
      "scheduler": "DPMSolverMultistep" // Model-specific (extra param)
    }

    // Converted to Replicate prediction input
    {
      "version": "...",
      "input": {
        "prompt": "A photo of an astronaut",
        "temperature": 0.7,
        "guidance_scale": 7.5,            // Flattened from extra_params
        "num_inference_steps": 50,        // Flattened from extra_params
        "scheduler": "DPMSolverMultistep" // Flattened from extra_params
      }
    }

Discovering Model Parameters
Each Replicate model has unique parameters. To find available parameters:
- Model Page: Visit the model on replicate.com
- OpenAPI Schema: Available at /v1/models/{owner}/{name}/versions/{version_id} (includes openapi_schema)
- Cog Definition: Check the model’s source code (if public)
Caveats
System Prompt Field Support
Severity: Medium
Behavior: Not all models support system_prompt field. For unsupported models, system prompt is prepended to conversation prompt.
Impact: Prompt structure differs between models
Models Affected: meta/meta-llama-3-8b, meta/llama-2-70b, openai/gpt-oss-20b, openai/o1-mini, xai/grok-4, and all deepseek-ai/deepseek* models
Code: chat.go:300-318
Input Image Field Mapping
Severity: Medium
Behavior: Different models expect input images in different fields (image_prompt, input_image, image, input_images)
Impact: DeepIntShield automatically maps to correct field based on model
Models Affected: Flux family models (see Input Image Field Mapping table)
Code: images.go:192-209
Image Content in Chat
Severity: Low
Behavior: Only non-base64 image URLs from message content blocks are extracted to image_input
Impact: Base64-encoded images in messages are ignored
Code: chat.go:58-63
Model-Specific Parameters
Severity: Medium
Behavior: Each model has unique input schema; standard parameters may not work for all models
Impact: Requires checking model documentation for available parameters
Mitigation: Use extra_params for model-specific fields
Video Generation
Generate (POST /v1/videos)
Request Parameters
| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | ✅ | Replicate model (owner/model or version ID) |
| prompt | string | ✅ | Text description of the video |
| input_reference | string | ❌ | Reference image (base64 data URL or URL) → mapped to image field; OpenAI-hosted models use input_reference |
| seconds | string | ❌ | Duration → duration |
| seed | int | ❌ | Seed for reproducibility |
| negative_prompt | string | ❌ | What to avoid |
Extra Params: Pass model-specific fields directly in the JSON body (unrecognized fields become extra_params and are flattened into the prediction input). webhook and webhook_events_filter are extracted automatically.
Response: DeepIntShieldVideoGenerationResponse — id, status, model, videos[]
Job Statuses: queued (starting) → in_progress (processing) → completed / failed
Retrieve / Download
| Operation | Endpoint | Notes |
|---|---|---|
| Get status | GET /v1/videos/{id} | Maps to /v1/predictions/{id} |
| Download | GET /v1/videos/{id}/content | Downloads from the prediction output URL |