Anthropic

Anthropic has significant structural differences from OpenAI’s format. DeepIntShield performs extensive conversion including:

  • System message extraction - Removed from messages array, placed in separate system field
  • Tool message grouping - Consecutive tool messages merged into single user message
  • Thinking block transformation - reasoning parameters mapped to Anthropic’s thinking structure
  • Parameter renaming - e.g., max_completion_tokens → max_tokens, stop → stop_sequences
  • Content format conversion - Images, files, and other content types adapted to Anthropic’s schema
| Operation | Non-Streaming | Streaming | Endpoint |
| Chat Completions | | | /v1/messages |
| Responses API | | | /v1/messages |
| Text Completions | | | /v1/complete |
| Embeddings | | | - |
| Speech (TTS) | | | - |
| Transcriptions (STT) | | | - |
| Image Generation | | | - |
| Files | | - | /v1/files |
| Batch | | - | /v1/messages/batches |
| List Models | | - | /v1/models |

| Parameter | Transformation |
| max_completion_tokens | Renamed to max_tokens |
| temperature, top_p | Direct pass-through |
| stop | Renamed to stop_sequences |
| response_format | Converted to output_format |
| tools | Schema restructured (see Tool Conversion) |
| tool_choice | Type mapped (see Tool Conversion) |
| reasoning | Mapped to thinking (see Reasoning / Thinking) |
| user | Wrapped in metadata.user_id |
| top_k | Via extra_params (Anthropic-specific) |

The following parameters are silently ignored: frequency_penalty, presence_penalty, logit_bias, logprobs, top_logprobs, seed, parallel_tool_calls, service_tier
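
The renames, pass-throughs, and ignored parameters listed above can be sketched as a small mapping function. This is a minimal illustration, not DeepIntShield's actual code: the name `convert_params` and the constants are hypothetical, wrapping a single `stop` string in a list is an assumption, and `tools`, `tool_choice`, `reasoning`, and `response_format` are deliberately left out.

```python
# Hypothetical sketch of the parameter mapping; not DeepIntShield's real code.
IGNORED = {
    "frequency_penalty", "presence_penalty", "logit_bias", "logprobs",
    "top_logprobs", "seed", "parallel_tool_calls", "service_tier",
}
RENAMED = {"max_completion_tokens": "max_tokens", "stop": "stop_sequences"}
PASS_THROUGH = {"temperature", "top_p", "top_k"}


def convert_params(openai_params: dict) -> dict:
    """Map OpenAI-style chat parameters onto Anthropic-style names."""
    out = {}
    for key, value in openai_params.items():
        if key in IGNORED:
            continue  # silently dropped
        if key == "stop" and isinstance(value, str):
            value = [value]  # assumption: a lone stop string becomes a one-item list
        if key in RENAMED:
            out[RENAMED[key]] = value
        elif key in PASS_THROUGH:
            out[key] = value
        elif key == "user":
            out["metadata"] = {"user_id": value}
        # Any other key (tools, reasoning, ...) is outside this sketch.
    return out


if __name__ == "__main__":
    print(convert_params({"max_completion_tokens": 256, "stop": "END", "seed": 7}))
```

Keeping the rename and ignore lists as plain data makes it easy to compare the sketch against the table line by line.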

Use extra_params (SDK) or pass directly in request body (Gateway) for Anthropic-specific fields:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "messages": [{"role": "user", "content": "Hello"}],
    "top_k": 40
  }'

Anthropic also accepts a top-level "cache_control": {"type": "ephemeral"} object on /anthropic/v1/messages requests to enable automatic prompt caching, and DeepIntShield now forwards that directive through unchanged.

Cache directives can be added to system messages, user messages, and tool definitions to enable prompt caching:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "This is cached context",
            "cache_control": {"type": "ephemeral"}
          }
        ]
      }
    ],
    "system": [
      {
        "type": "text",
        "text": "You are a helpful assistant",
        "cache_control": {"type": "ephemeral"}
      }
    ]
  }'

Documentation: See DeepIntShield Reasoning Reference

  • reasoning.effort → thinking.type (always mapped to "enabled")
  • reasoning.max_tokens → thinking.budget_tokens (token budget for thinking)
  • Minimum budget: at least 1024 tokens are required; requests below this fail with an error
  • Dynamic budget: -1 is converted to 1024 automatically
// Request
{"reasoning": {"effort": "high", "max_tokens": 2048}}
// Anthropic conversion
{"thinking": {"type": "enabled", "budget_tokens": 2048}}
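
The budget rules in this section (minimum of 1024, `-1` converted to 1024) can be sketched as follows. The function name `convert_reasoning` and the constant are hypothetical, and the sketch assumes an explicit `max_tokens` is always present; how an effort-only request is budgeted is not covered here.

```python
MIN_THINKING_BUDGET = 1024  # minimum stated in this section


def convert_reasoning(reasoning: dict) -> dict:
    """Map an OpenAI-style reasoning object to an Anthropic-style thinking object."""
    if "max_tokens" not in reasoning:
        # Assumption: this sketch only covers requests with an explicit budget.
        raise ValueError("sketch requires reasoning.max_tokens")
    budget = reasoning["max_tokens"]
    if budget == -1:
        budget = MIN_THINKING_BUDGET  # dynamic budget is replaced by the minimum
    if budget < MIN_THINKING_BUDGET:
        raise ValueError(f"reasoning.max_tokens must be >= {MIN_THINKING_BUDGET}")
    # reasoning.effort is not inspected: thinking.type is always "enabled".
    return {"thinking": {"type": "enabled", "budget_tokens": budget}}


if __name__ == "__main__":
    print(convert_reasoning({"effort": "high", "max_tokens": 2048}))
```
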
  • System message extraction: System messages are removed from the messages array and placed in the separate system field. Multiple system messages become separate text blocks in the system array.
  • Tool message grouping: Consecutive tool messages are merged into a single user message with tool_result content blocks.
  • URL images: {"type": "image_url", "image_url": {}} → {"type": "image", "source": {"type": "url", ...}}
  • Base64 images: Data URL → {"type": "image", "source": {"type": "base64", "media_type": "image/png", ...}}
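
System message extraction and tool message grouping can be sketched together in one pass over the message list. The function name `convert_messages` is hypothetical, and the sketch assumes every message carries plain string content and that tool messages use the OpenAI `tool_call_id` field; image and file conversion are not shown.

```python
def convert_messages(messages: list[dict]) -> dict:
    """Split out system messages and merge consecutive tool messages."""
    system_blocks = []    # becomes the separate "system" field
    converted = []        # becomes the Anthropic-style "messages" array
    pending_results = []  # tool_result blocks waiting to be merged

    def flush() -> None:
        # Emit all buffered tool results as one user message.
        if pending_results:
            converted.append({"role": "user", "content": list(pending_results)})
            pending_results.clear()

    for msg in messages:
        role = msg["role"]
        if role == "system":
            system_blocks.append({"type": "text", "text": msg["content"]})
        elif role == "tool":
            pending_results.append({
                "type": "tool_result",
                "tool_use_id": msg["tool_call_id"],
                "content": msg["content"],
            })
        else:
            flush()  # a user or assistant message ends the current tool run
            converted.append({"role": role, "content": msg["content"]})
    flush()
    return {"system": system_blocks, "messages": converted}
```

Because of this grouping, the output array can be shorter than the input array, which matters for any code that correlates messages by position.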

Cache directives supported on: system content blocks, user message content blocks, tool definitions (see Cache Control examples above)

Tool definitions are restructured: function.name → name, function.parameters → input_schema; function.strict is dropped.

Tool choice mapping: "auto" → auto | "none" → none | "required" → any | Specific tool → {"type": "tool", "name": "X"}
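
A minimal sketch of the tool and tool choice mappings follows. The function names are hypothetical. Two details are assumptions not stated in this section: string choices are wrapped as `{"type": ...}` objects (the shape Anthropic's Messages API uses for `tool_choice`), and `function.description` is copied across when present.

```python
TOOL_CHOICE_TYPES = {"auto": "auto", "none": "none", "required": "any"}


def convert_tool(tool: dict) -> dict:
    """Restructure an OpenAI-style function tool into Anthropic-style fields."""
    fn = tool["function"]
    out = {"name": fn["name"], "input_schema": fn.get("parameters", {})}
    if "description" in fn:
        out["description"] = fn["description"]  # assumption: passed through as-is
    # fn["strict"], if present, is intentionally not copied.
    return out


def convert_tool_choice(choice) -> dict:
    """Map an OpenAI-style tool_choice (string or object) to an Anthropic-style object."""
    if isinstance(choice, str):
        return {"type": TOOL_CHOICE_TYPES[choice]}
    # Specific tool: {"type": "function", "function": {"name": "X"}}
    return {"type": "tool", "name": choice["function"]["name"]}
```
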

  • stop_reason → finish_reason: end_turn/stop_sequence → stop, max_tokens → length, tool_use → tool_calls
  • input_tokens + cache_read_input_tokens + cache_creation_input_tokens → prompt_tokens (all cache counts rolled into the total)
  • Cache token breakdown surfaced in prompt_tokens_details:
    • cache_read_input_tokens → prompt_tokens_details.cached_read_tokens
    • cache_creation_input_tokens → prompt_tokens_details.cached_write_tokens
  • output_tokens → completion_tokens
  • thinking blocks → reasoning_details with index, type, text, and signature fields
  • Tool call arguments converted from JSON object → JSON string
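
The response mappings above can be sketched as three small helpers. All function names are hypothetical, missing usage fields are assumed to count as zero, and the shape of the returned tool call follows the OpenAI chat format; `reasoning_details` is not covered.

```python
import json

FINISH_REASONS = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
}


def convert_finish_reason(stop_reason: str) -> str:
    return FINISH_REASONS[stop_reason]


def convert_usage(usage: dict) -> dict:
    """Roll cache counts into prompt_tokens and expose the breakdown."""
    read = usage.get("cache_read_input_tokens", 0)
    write = usage.get("cache_creation_input_tokens", 0)
    return {
        "prompt_tokens": usage.get("input_tokens", 0) + read + write,
        "completion_tokens": usage.get("output_tokens", 0),
        "prompt_tokens_details": {
            "cached_read_tokens": read,
            "cached_write_tokens": write,
        },
    }


def convert_tool_use(block: dict) -> dict:
    """Serialize the tool_use input object into a JSON string of arguments."""
    return {
        "id": block["id"],
        "type": "function",
        "function": {"name": block["name"], "arguments": json.dumps(block["input"])},
    }
```

Callers that previously read `input` as an object need to run `json.loads` on `arguments` after this conversion.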

Event sequence: message_start → content_block_start → content_block_delta → content_block_stop → message_delta → message_stop

Delta types: text_delta → content | input_json_delta → tool arguments | thinking_delta → reasoning text | signature_delta → reasoning signature
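
The delta routing can be sketched as a lookup from delta type to an output field. The output key names (`content`, `tool_arguments`, and so on) and the function name `accumulate` are hypothetical. The payload field names (`text`, `partial_json`, `thinking`, `signature`) are the ones Anthropic's streaming format uses for each delta type. For brevity the sketch ignores the content block index and concatenates everything per kind.

```python
# delta type -> (hypothetical output key, payload field inside the delta)
DELTA_FIELDS = {
    "text_delta": ("content", "text"),
    "input_json_delta": ("tool_arguments", "partial_json"),
    "thinking_delta": ("reasoning_text", "thinking"),
    "signature_delta": ("reasoning_signature", "signature"),
}


def accumulate(events: list[dict]) -> dict:
    """Concatenate streamed deltas into one string per output kind."""
    out = {key: "" for key, _ in DELTA_FIELDS.values()}
    for event in events:
        if event["type"] != "content_block_delta":
            continue  # start/stop/message events carry no delta payload here
        delta = event["delta"]
        target, field = DELTA_FIELDS[delta["type"]]
        out[target] += delta[field]
    return out
```
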


System Message Extraction

Severity: High
Behavior: System messages removed from array, placed in separate system field
Impact: Message array structure differs from input
Code: chat.go:145-167

Tool Message Grouping

Severity: High
Behavior: Consecutive tool messages merged into single user message
Impact: Message count and structure changes
Code: chat.go:169-216

Minimum Reasoning Budget

Severity: High
Behavior: reasoning.max_tokens must be >= 1024
Impact: Requests with lower values fail with error
Code: chat.go:113-115

Dynamic Budget Conversion

Severity: Medium
Behavior: reasoning.max_tokens = -1 converted to 1024
Impact: Dynamic budgeting not supported
Code: chat.go:107-111

Strict Tool Mode Dropped

Severity: Medium
Behavior: strict: true in tool definitions silently dropped
Impact: No schema validation enforcement
Code: chat.go:43-72

Arguments Serialization

Severity: Low
Behavior: Tool call input (object) serialized to arguments (JSON string)
Code: chat.go:341-350


The Responses API uses the same underlying /v1/messages endpoint but converts between OpenAI’s Responses format and Anthropic’s Messages format.

| Parameter | Transformation |
| max_output_tokens | Renamed to max_tokens |
| temperature, top_p | Direct pass-through |
| instructions | Becomes system message |
| tools | Schema restructured (see Chat Completions) |
| tool_choice | Type mapped (see Chat Completions) |
| reasoning | Mapped to thinking (see Reasoning / Thinking) |
| user | Wrapped in metadata.user_id |
| text | Converted to output_format |
| include | Via extra_params (Anthropic-specific) |
| stop | Via extra_params, renamed to stop_sequences |
| top_k | Via extra_params (Anthropic-specific) |
| truncation | Auto-set to "auto" for computer tools |

Use extra_params (SDK) or pass directly in request body (Gateway):

curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "input": "Hello, how are you?",
    "top_k": 40
  }'

Cache directives can be added to instructions (system) and input messages to enable prompt caching:

curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "instructions": "You are a helpful assistant. This instruction is cached.",
    "instructions_cache_control": {"type": "ephemeral"},
    "input": [
      {
        "type": "text",
        "text": "Answer this question",
        "cache_control": {"type": "ephemeral"}
      }
    ]
  }'
  • Input: A string is wrapped as a user message; an array is converted to messages
  • Instructions: Becomes the system message (same extraction as Chat Completions)

Supported types: function, computer_use_preview, web_search, mcp

Tool conversions are the same as for Chat Completions, with two additions: MCP tools are mapped to mcp_servers (server_label → name, server_url → url), and requests that include computer tools have truncation set to "auto" automatically.
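
The MCP mapping can be sketched as splitting the tool list in two. The function names are hypothetical, and the sketch copies only the two fields named in this section; any additional fields an MCP server entry may need are not shown.

```python
def convert_mcp_tool(tool: dict) -> dict:
    """Map a Responses-style MCP tool to an mcp_servers entry."""
    return {"name": tool["server_label"], "url": tool["server_url"]}


def split_tools(tools: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (non-MCP tools, mcp_servers entries) from a mixed tool list."""
    mcp_servers = [convert_mcp_tool(t) for t in tools if t["type"] == "mcp"]
    other_tools = [t for t in tools if t["type"] != "mcp"]
    return other_tools, mcp_servers
```
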

Cache control supported on instructions and input blocks (see Cache Control examples)

  • stop_reason → status: end_turn/stop_sequence → completed, max_tokens → incomplete
  • Top-level input_tokens and output_tokens are rollups that include cache-related usage; they map as input_tokens → input_tokens | output_tokens → output_tokens.
  • Cache-specific counts are exposed in details: cache_read_input_tokens → input_tokens_details.cached_read_tokens | cache_creation_input_tokens → input_tokens_details.cached_write_tokens
  • Output items: text → message | tool_use → function_call | thinking → reasoning
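
The status and output item mappings can be sketched as two lookup tables. The function name `convert_response` is hypothetical, and the sketch only labels item types without building their payloads. Stop reasons not listed above are not handled and raise `KeyError` here.

```python
STATUS_BY_STOP_REASON = {
    "end_turn": "completed",
    "stop_sequence": "completed",
    "max_tokens": "incomplete",
}
ITEM_TYPE_BY_BLOCK = {
    "text": "message",
    "tool_use": "function_call",
    "thinking": "reasoning",
}


def convert_response(message: dict) -> dict:
    """Map an Anthropic-style message to a Responses-style status and item types."""
    return {
        "status": STATUS_BY_STOP_REASON[message["stop_reason"]],
        "output": [{"type": ITEM_TYPE_BY_BLOCK[b["type"]]} for b in message["content"]],
    }
```
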

Event sequence: message_start → content_block_start → content_block_delta → content_block_stop → message_delta → message_stop

Special handling: Computer tool arguments accumulated across chunks (emitted on content_block_stop), synthetic content_part.added events emitted for text/reasoning, MCP calls use mcp_call_arguments_delta, item IDs generated as msg_{messageID}_item_{outputIndex}


Request: prompt auto-wrapped with \n\nHuman: {prompt}\n\nAssistant: | max_tokens → max_tokens_to_sample | temperature, top_p direct pass-through | top_k, stop via extra_params (→ stop_sequences)

Response: completion → choices[0].text | stop_reason → finish_reason
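
The text completion conversion can be sketched in both directions. The function names are hypothetical. The sketch reads `top_k` and `stop` from the same flat request dictionary rather than from a separate `extra_params` object, and it copies `stop_reason` into `finish_reason` without remapping its value; both are assumptions.

```python
def convert_text_request(req: dict) -> dict:
    """Wrap the prompt in Human/Assistant turns and rename parameters."""
    out = {
        "prompt": f"\n\nHuman: {req['prompt']}\n\nAssistant:",
        "max_tokens_to_sample": req["max_tokens"],
    }
    for key in ("temperature", "top_p", "top_k"):
        if key in req:
            out[key] = req[key]
    if "stop" in req:
        out["stop_sequences"] = req["stop"]
    return out


def convert_text_response(resp: dict) -> dict:
    """Map an Anthropic-style completion to an OpenAI-style choices list."""
    return {
        "choices": [
            # Assumption: the stop_reason value is copied unchanged.
            {"text": resp["completion"], "finish_reason": resp["stop_reason"]}
        ]
    }
```
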


Request formats: requests array (CustomID + Params) or input_file_id

Pagination: Cursor-based with after_id, before_id, limit

Endpoints:

  • POST /v1/messages/batches - Create
  • GET /v1/messages/batches - List
  • GET /v1/messages/batches/{batch_id} - Retrieve
  • POST /v1/messages/batches/{batch_id}/cancel - Cancel

Response: JSONL format with {custom_id, result: {type, message}}

Status mapping: in_progress → InProgress, canceling → Cancelling, ended → Ended

Note: RFC3339Nano timestamps converted to Unix, multi-key retry supported
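
The status mapping and the timestamp conversion can be sketched with the standard library alone. The function names are hypothetical. Dropping the fractional seconds before parsing is a simplification chosen here so that nanosecond precision does not trip up `datetime.fromisoformat`; whether the real conversion truncates or rounds is not stated in this section.

```python
import re
from datetime import datetime

BATCH_STATUS = {
    "in_progress": "InProgress",
    "canceling": "Cancelling",
    "ended": "Ended",
}


def rfc3339_to_unix(ts: str) -> int:
    """Convert an RFC 3339 timestamp (any fractional precision) to Unix seconds."""
    ts = re.sub(r"\.\d+", "", ts)  # drop fractional seconds (assumed truncation)
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"    # older fromisoformat versions reject "Z"
    return int(datetime.fromisoformat(ts).timestamp())


def convert_batch_status(status: str) -> str:
    return BATCH_STATUS[status]


if __name__ == "__main__":
    print(rfc3339_to_unix("2024-01-01T00:00:00.123456789Z"))
```
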


Upload: Multipart/form-data with file (required) and filename (optional)

Field mapping: id | filename | size_bytes → bytes | created_at (Unix) | mime_type → content_type

Endpoints: POST /v1/files, GET /v1/files (cursor pagination), GET /v1/files/{file_id}, DELETE /v1/files/{file_id}, GET /v1/files/{file_id}/content

Note: File purpose always "batch", status always "processed"


Request: GET /v1/models?limit={defaultPageSize} (no body)

Field mapping: id (prefixed anthropic/) | display_name → name | created_at (Unix timestamp)

Pagination: Token-based with NextPageToken, FirstID, LastID

Multi-key support: Results aggregated from all keys, filtered by allowed_models if configured