Cohere

Cohere has a different API structure from OpenAI’s format. DeepIntShield performs conversions including:

  • Parameter renaming - e.g., max_completion_tokens → max_tokens, top_p → p, stop → stop_sequences
  • Message content conversion - String and content block formats handled
  • Tool conversion - Tool definitions and tool choice mapped to Cohere format
  • Thinking/Reasoning transformation - reasoning parameters mapped to Cohere’s thinking structure
  • Response format conversion - JSON schema handling adapted to Cohere’s format
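
| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | Yes | Yes | /v2/chat |
| Responses API | Yes | Yes | /v2/chat |
| Embeddings | Yes | - | /v2/embed |
| List Models | Yes | - | /v1/models |
| Text Completions | No | No | - |
| Image Generation | No | No | - |
| Speech (TTS) | No | No | - |
| Transcriptions (STT) | No | No | - |
| Files | No | No | - |
| Batch | No | No | - |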

| Parameter | Transformation |
|---|---|
| max_completion_tokens | Renamed to max_tokens |
| temperature, top_p | temperature passed through unchanged; top_p renamed to p |
| stop | Renamed to stop_sequences |
| frequency_penalty, presence_penalty | Direct pass-through |
| response_format | Converted to structured format (see Response Format) |
| tools | Schema structure adapted (see Tool Conversion) |
| tool_choice | Type mapped (see Tool Conversion) |
| reasoning | Mapped to thinking (see Reasoning / Thinking) |
| user | Via extra_params (not directly supported in Cohere v2 API) |
| top_k | Via extra_params (Cohere-specific) |

The following parameters are silently ignored: logit_bias, logprobs, top_logprobs, seed, parallel_tool_calls, service_tier
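
The renames, pass-throughs, and ignored parameters listed above can be sketched as a small mapping function. This is a minimal Python illustration, not DeepIntShield's actual implementation: the function name `to_cohere_chat_params` and the constant names are hypothetical, and the structural conversions (tools, tool_choice, response_format, reasoning) are deliberately left out.

```python
# Hypothetical sketch of the simple parameter handling described above.
RENAMED = {
    "max_completion_tokens": "max_tokens",
    "top_p": "p",
    "stop": "stop_sequences",
}
PASS_THROUGH = {"temperature", "frequency_penalty", "presence_penalty"}
IGNORED = {
    "logit_bias", "logprobs", "top_logprobs",
    "seed", "parallel_tool_calls", "service_tier",
}


def to_cohere_chat_params(params: dict) -> dict:
    """Rename, pass through, or drop OpenAI-style sampling parameters."""
    out = {}
    for key, value in params.items():
        if key in IGNORED:
            continue  # silently dropped, no error raised
        if key in RENAMED:
            out[RENAMED[key]] = value
        elif key in PASS_THROUGH:
            out[key] = value
        # Anything else (tools, reasoning, Cohere-specific extras, ...)
        # needs its own conversion and is skipped in this sketch.
    return out
```

For example, a request carrying `top_p` and `seed` would come out with `p` set and `seed` absent.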

For Cohere-specific fields, use extra_params (SDK) or pass them directly in the request body (Gateway):

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/command-r-plus",
    "messages": [{"role": "user", "content": "Hello"}],
    "top_k": 40,
    "safety_mode": "STRICT",
    "log_probs": true,
    "strict_tool_choice": false
  }'

Reasoning / Thinking

Documentation: See DeepIntShield Reasoning Reference

  • reasoning.effort → thinking.type (mapped to "enabled" or "disabled")
  • reasoning.max_tokens → thinking.token_budget (token budget for thinking)
  • Minimum budget: 1 token required; requests with 0 tokens will be converted to disabled
  • Dynamic budget: -1 is converted to 1 automatically
// Request
{"reasoning": {"effort": "high", "max_tokens": 2048}}
// Cohere conversion
{"thinking": {"type": "enabled", "token_budget": 2048}}
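
The budget rules above (0 disables thinking, -1 becomes 1, anything else is passed as the token budget) can be sketched in Python as follows. The function name `to_cohere_thinking` is hypothetical, and treating an effort of "none" as disabled is an assumption; the source only states that effort maps to "enabled" or "disabled".

```python
def to_cohere_thinking(reasoning: dict) -> dict:
    """Map an OpenAI-style reasoning block to a Cohere-style thinking block."""
    budget = reasoning.get("max_tokens")
    # Assumption: effort "none" switches reasoning off.
    # A budget of 0 is converted to disabled, as described above.
    if reasoning.get("effort") == "none" or budget == 0:
        return {"type": "disabled"}
    thinking = {"type": "enabled"}
    if budget is not None:
        # -1 (dynamic budget) is raised to the minimum budget of 1.
        thinking["token_budget"] = 1 if budget == -1 else budget
    return thinking
```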

Message conversion:

  • String content: Messages can have simple string content
  • Content blocks: Messages can have arrays of content blocks (text, images, thinking)
  • Image conversion: image_url blocks with a URL are supported
  • Tool calls: Assistant tool calls are converted to Cohere's tool call format
  • Tool messages: Tool call results are passed with tool_call_id

Tool Conversion

Tool definitions are adapted to Cohere format with the following mappings:

  • Function name → name (unchanged)
  • Function parameters → parameters (flexible JSON format)
  • Strict mode (strict: true) is silently dropped (not supported)

Tool choice mapping:

  • "none" → "NONE"
  • "auto" → "AUTO", "required" → "REQUIRED"
  • Specific tool selection → "REQUIRED" (Cohere uses function-level selection)
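
The tool mappings above can be sketched in Python as two small helpers. Both function names are hypothetical, the exact shape of the Cohere-side tool object is an assumption, and the "auto" / "required" line is read here as a one-to-one mapping.

```python
def to_cohere_tool(tool: dict) -> dict:
    """Copy a function tool definition, dropping the unsupported strict flag."""
    function = {k: v for k, v in tool["function"].items() if k != "strict"}
    # Assumed target shape: the same type/function wrapper as the input.
    return {"type": "function", "function": function}


def to_cohere_tool_choice(tool_choice) -> str:
    """Map an OpenAI-style tool_choice value to a Cohere-style constant."""
    if isinstance(tool_choice, dict):
        # Selecting one specific tool is collapsed to REQUIRED.
        return "REQUIRED"
    return {"none": "NONE", "auto": "AUTO", "required": "REQUIRED"}[tool_choice]
```

Because a specific tool selection collapses to "REQUIRED", the model is forced to call some tool but not necessarily the one named in the request.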

Response Format

Supported formats:

  • text - Plain text response
  • json_object - Structured JSON response
  • json_schema - JSON with schema validation (converted to json_object)

The schema is passed through the response_format.json_schema field.
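
A minimal Python sketch of this conversion, under stated assumptions: the function name `to_cohere_response_format` is hypothetical, and the value of json_schema is forwarded unchanged because this document does not say whether the OpenAI wrapper fields (name, schema, strict) are unwrapped first.

```python
def to_cohere_response_format(response_format: dict) -> dict:
    """Convert an OpenAI-style response_format to a Cohere-style one."""
    format_type = response_format["type"]
    if format_type == "json_schema":
        # json_schema requests become json_object, carrying the schema along.
        return {
            "type": "json_object",
            "json_schema": response_format["json_schema"],
        }
    # "text" and "json_object" keep their type.
    return {"type": format_type}
```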

Response conversion:

  • finish_reason: COMPLETE / STOP_SEQUENCE → stop, MAX_TOKENS → length, TOOL_CALL → tool_calls
  • input_tokens → prompt_tokens | output_tokens → completion_tokens
  • cached_tokens → prompt_tokens_details.cached_tokens (if present)
  • Tool call arguments are passed through as strings (no conversion needed; Cohere uses string format)
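
The finish reason and usage mappings above can be sketched in Python like this. The names `FINISH_REASONS` and `to_openai_usage` are hypothetical, and the flat layout of the Cohere usage dictionary is an assumption made to keep the example short.

```python
# Cohere finish reasons mapped to OpenAI-style finish_reason values.
FINISH_REASONS = {
    "COMPLETE": "stop",
    "STOP_SEQUENCE": "stop",
    "MAX_TOKENS": "length",
    "TOOL_CALL": "tool_calls",
}


def to_openai_usage(cohere_usage: dict) -> dict:
    """Rename token counters; attach cached token details only if present."""
    usage = {
        "prompt_tokens": cohere_usage["input_tokens"],
        "completion_tokens": cohere_usage["output_tokens"],
    }
    if "cached_tokens" in cohere_usage:
        usage["prompt_tokens_details"] = {
            "cached_tokens": cohere_usage["cached_tokens"],
        }
    return usage
```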

Event sequence: message-start → content-start → content-delta → content-end → message-end

Delta types:

  • content-delta with text → message content
  • content-delta with thinking → reasoning text
  • tool-call-start/delta/end → tool call events
  • tool-plan-delta → tool planning output
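
To illustrate how the delta types above are routed, here is a small Python sketch that folds a stream of events into message content, reasoning text, and tool plan output. The flat event dictionaries are a simplification assumed for this example (real Cohere events nest their payload more deeply), the function name `fold_stream` is hypothetical, and tool-call events are omitted.

```python
def fold_stream(events) -> dict:
    """Accumulate streamed deltas into their destination fields."""
    content, reasoning, tool_plan = [], [], []
    for event in events:
        kind = event["type"]
        if kind == "content-delta":
            if "thinking" in event:
                reasoning.append(event["thinking"])  # reasoning text
            elif "text" in event:
                content.append(event["text"])  # message content
        elif kind == "tool-plan-delta":
            tool_plan.append(event["text"])  # tool planning output
        # message-start, content-start, content-end, and message-end
        # carry no text in this simplified shape and are skipped.
    return {
        "content": "".join(content),
        "reasoning": "".join(reasoning),
        "tool_plan": "".join(tool_plan),
    }
```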

Minimum Thinking Budget

  • Severity: Low
  • Behavior: reasoning.max_tokens must be >= 1
  • Impact: Very low impact, conversion happens automatically
  • Code: chat.go:104-130

Top P Renamed

  • Severity: Low
  • Behavior: top_p parameter renamed to p
  • Impact: Parameter name changes internally
  • Code: chat.go:99

Strict Tool Mode Dropped

  • Severity: Low
  • Behavior: strict: true in tool definitions silently dropped
  • Impact: No schema validation enforcement
  • Code: chat.go:168-185

Tool Arguments Format

  • Severity: Low
  • Behavior: Tool arguments are already strings, no JSON serialization needed
  • Impact: Minimal - Cohere v2 API expects string format
  • Code: chat.go:70-78


The Responses API uses the same underlying /v2/chat endpoint but converts between OpenAI’s Responses format and Cohere’s format.

| Parameter | Transformation |
|---|---|
| max_output_tokens | Renamed to max_tokens |
| temperature, top_p | temperature passed through unchanged; top_p renamed to p |
| instructions | Becomes system message |
| text.format | Converted to response_format |
| tools | Schema restructured (see Chat Completions) |
| tool_choice | Type mapped (see Chat Completions) |
| reasoning | Mapped to thinking (see Reasoning / Thinking) |
| stop | Via extra_params, renamed to stop_sequences |
| top_k | Via extra_params (Cohere-specific) |
| frequency_penalty, presence_penalty | Via extra_params |

Use extra_params (SDK) or pass the fields directly in the request body (Gateway):

curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/command-r-plus",
    "input": "Hello, how are you?",
    "top_k": 40,
    "stop": [".", "!"]
  }'

  • Input: String converted to user message or array converted to messages
  • Instructions: Becomes system message (prepended to messages)
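
The input and instructions handling above can be sketched in Python as follows. The function name `to_messages` is hypothetical, and array input is assumed to already consist of role/content items; any per-item conversion is not shown.

```python
def to_messages(input_value, instructions=None) -> list:
    """Build a chat message list from Responses-style input and instructions."""
    messages = []
    if instructions:
        # Instructions become a system message placed before everything else.
        messages.append({"role": "system", "content": instructions})
    if isinstance(input_value, str):
        # A plain string becomes a single user message.
        messages.append({"role": "user", "content": input_value})
    else:
        # Assumption: array input already holds role/content items.
        messages.extend(input_value)
    return messages
```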

Supported types: function

Tool conversion is the same as for Chat Completions.

  • text → message | tool_use → function_call
  • input_tokens / output_tokens preserved
  • Token details with cached tokens support

Event sequence: message-start → content-start → content-delta → content-end → message-end

Special handling:

  • Tool call arguments accumulated across chunks
  • Synthetic output_item.added events emitted for text/reasoning
  • Stable item IDs generated as msg_{messageID}_item_{outputIndex}
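
Two of the points above lend themselves to a short Python sketch: building the stable item ID from the documented pattern, and joining tool call argument fragments that arrive across chunks. Both function names are hypothetical, and representing the fragments as (output index, fragment) pairs is an assumption made for illustration.

```python
from collections import defaultdict


def item_id(message_id: str, output_index: int) -> str:
    """Build an ID following the msg_{messageID}_item_{outputIndex} pattern."""
    return f"msg_{message_id}_item_{output_index}"


def accumulate_tool_arguments(deltas) -> dict:
    """Join argument fragments per output index, in arrival order."""
    buffers = defaultdict(list)
    for output_index, fragment in deltas:
        buffers[output_index].append(fragment)
    return {index: "".join(parts) for index, parts in buffers.items()}
```

Deriving the ID only from the message ID and output index means every chunk belonging to the same output item resolves to the same ID without extra state.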

| Parameter | Transformation |
|---|---|
| input (text or array) | Converted to texts array |
| dimensions | Renamed to output_dimension |
| input_type | Via extra_params (required, defaults to "search_document") |
| embedding_types | Via extra_params (array of embedding types) |
| truncate | Via extra_params (how to handle long inputs) |
| max_tokens | Via extra_params (max tokens to embed per input) |

Use extra_params for Cohere-specific embedding options:

curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/embed-english-v3.0",
    "input": ["text to embed"],
    "input_type": "search_query",
    "embedding_types": ["float"],
    "truncate": "START"
  }'

  • Input Type Required: Cohere v3+ models require the input_type parameter (defaults to "search_document")
  • Embedding Types: Specify which embedding types to return (e.g., "float", "int8")
  • embeddings.float → data[].embedding
  • meta.tokens → usage information
  • Multiple embedding types are handled
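
The embedding request and response mappings above can be sketched in Python as follows. Both function names are hypothetical, the model name is forwarded as given (any provider prefix handling is not shown), and the shape of the converted data items is an assumption based on the usual OpenAI-style embeddings response.

```python
def to_cohere_embed_request(model, input_value, dimensions=None, extra_params=None) -> dict:
    """Build a Cohere-style embed request body from OpenAI-style arguments."""
    texts = [input_value] if isinstance(input_value, str) else list(input_value)
    body = {
        "model": model,
        "texts": texts,
        "input_type": "search_document",  # default when none is supplied
    }
    if dimensions is not None:
        body["output_dimension"] = dimensions
    # Cohere-specific options (input_type, embedding_types, truncate, ...)
    # override the defaults above.
    body.update(extra_params or {})
    return body


def to_openai_embeddings(cohere_response: dict) -> list:
    """Convert embeddings.float into a list of data[].embedding items."""
    vectors = cohere_response["embeddings"]["float"]
    return [
        {"object": "embedding", "index": i, "embedding": vector}
        for i, vector in enumerate(vectors)
    ]
```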

Request: GET /v1/models?page_size={defaultPageSize}

Field mapping: Model data converted to standard format

Pagination: Cursor-based with next_page_token

Note: endpoint and default_only filters available via extra_params
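
Cursor-based pagination with next_page_token can be sketched in Python as a simple loop. The function name `list_all_models` and the `fetch_page` callable are hypothetical, and the assumption that each page holds its entries under a `models` key is not confirmed by this document; the callable stands in for whatever performs the GET /v1/models request.

```python
def list_all_models(fetch_page) -> list:
    """Collect models from every page by following next_page_token."""
    models = []
    token = None  # no token on the first request
    while True:
        page = fetch_page(token)
        models.extend(page.get("models", []))
        token = page.get("next_page_token")
        if not token:
            # A missing or empty token means this was the last page.
            return models
```

Passing the fetch step in as a callable keeps the loop testable without a network connection.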