Cohere
Overview
Cohere’s API structure differs from OpenAI’s format. DeepIntShield performs conversions including:
- Parameter renaming - e.g., `max_completion_tokens` → `max_tokens`, `top_p` → `p`, `stop` → `stop_sequences`
- Message content conversion - String and content block formats handled
- Tool conversion - Tool definitions and tool choice mapped to Cohere format
- Thinking/Reasoning transformation - `reasoning` parameters mapped to Cohere’s `thinking` structure
- Response format conversion - JSON schema handling adapted to Cohere’s format
Supported Operations
| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | ✅ | ✅ | /v2/chat |
| Responses API | ✅ | ✅ | /v2/chat |
| Embeddings | ✅ | - | /v2/embed |
| List Models | ✅ | - | /v1/models |
| Text Completions | ❌ | ❌ | - |
| Image Generation | ❌ | ❌ | - |
| Speech (TTS) | ❌ | ❌ | - |
| Transcriptions (STT) | ❌ | ❌ | - |
| Files | ❌ | ❌ | - |
| Batch | ❌ | ❌ | - |
1. Chat Completions
Request Parameters
Parameter Mapping
| Parameter | Transformation |
|---|---|
| `max_completion_tokens` | Renamed to `max_tokens` |
| `temperature`, `top_p` | `temperature` is passed through directly; `top_p` is renamed to `p` |
| `stop` | Renamed to `stop_sequences` |
| `frequency_penalty`, `presence_penalty` | Direct pass-through |
| `response_format` | Converted to structured format (see Response Format) |
| `tools` | Schema structure adapted (see Tool Conversion) |
| `tool_choice` | Type mapped (see Tool Conversion) |
| `reasoning` | Mapped to `thinking` (see Reasoning / Thinking) |
| `user` | Via `extra_params` (not directly supported in Cohere v2 API) |
| `top_k` | Via `extra_params` (Cohere-specific) |
Dropped Parameters
The following parameters are silently ignored: `logit_bias`, `logprobs`, `top_logprobs`, `seed`, `parallel_tool_calls`, `service_tier`
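The renames and drops listed above can be sketched as a small map transformation. This is a minimal illustration, not DeepIntShield's actual conversion code: `toCohereParams`, `renames`, and `dropped` are hypothetical names, and the request is modeled as a plain `map[string]interface{}`.

```go
package main

import "fmt"

// renames maps OpenAI-style parameter names to their Cohere equivalents,
// as listed in the parameter mapping table.
var renames = map[string]string{
	"max_completion_tokens": "max_tokens",
	"top_p":                 "p",
	"stop":                  "stop_sequences",
}

// dropped lists the parameters that are silently ignored for Cohere.
var dropped = map[string]bool{
	"logit_bias": true, "logprobs": true, "top_logprobs": true,
	"seed": true, "parallel_tool_calls": true, "service_tier": true,
}

// toCohereParams is a hypothetical helper: it returns a new map with
// dropped keys removed, renamed keys rewritten, and all other keys kept.
func toCohereParams(in map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(in))
	for k, v := range in {
		if dropped[k] {
			continue
		}
		if newName, ok := renames[k]; ok {
			k = newName
		}
		out[k] = v
	}
	return out
}

func main() {
	req := map[string]interface{}{
		"max_completion_tokens": 256,
		"top_p":                 0.9,
		"seed":                  7,
		"temperature":           0.2,
	}
	fmt.Println(toCohereParams(req))
}
```

In this sketch, keys that are neither renamed nor dropped (such as `temperature`) pass through unchanged.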
Extra Parameters
Use `extra_params` (SDK) or pass the fields directly in the request body (Gateway) for Cohere-specific fields:

Gateway:

    curl -X POST http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "cohere/command-r-plus",
        "messages": [{"role": "user", "content": "Hello"}],
        "top_k": 40,
        "safety_mode": "STRICT",
        "log_probs": true,
        "strict_tool_choice": false
      }'

Go SDK:

    resp, err := client.ChatCompletionRequest(schemas.NewDeepIntShieldContext(ctx, schemas.NoDeadline), &schemas.DeepIntShieldChatRequest{
        Provider: schemas.Cohere,
        Model:    "cohere/command-r-plus",
        Input:    messages,
        Params: &schemas.ChatParameters{
            // Cohere-specific fields are forwarded via ExtraParams.
            ExtraParams: map[string]interface{}{
                "top_k":              40,
                "safety_mode":        "STRICT",
                "log_probs":          true,
                "strict_tool_choice": false,
            },
        },
    })

Reasoning / Thinking
Documentation: See DeepIntShield Reasoning Reference
Parameter Mapping
- `reasoning.effort` → `thinking.type` (mapped to `"enabled"` or `"disabled"`)
- `reasoning.max_tokens` → `thinking.token_budget` (token budget for thinking)
Critical Constraints
- Minimum budget: at least 1 token is required; a request with 0 tokens is converted to `"disabled"`
- Dynamic budget: `-1` is converted to `1` automatically
Example

    // Request
    {"reasoning": {"effort": "high", "max_tokens": 2048}}

    // Cohere conversion
    {"thinking": {"type": "enabled", "token_budget": 2048}}

Message Conversion
Content Handling
- String content: Messages can have simple string content
- Content blocks: Messages can have arrays of content blocks (text, images, thinking)
- Image conversion: `image_url` blocks with URL are supported
- Tool calls: Assistant message tool calls are converted to Cohere format
- Tool messages: Tool call results are passed with `tool_call_id`
Tool Conversion
Tool definitions are adapted to Cohere format with the following mappings:
- Function `name` → `name` (unchanged)
- Function `parameters` → `parameters` (flexible JSON format)
- Strict mode (`strict: true`) is silently dropped (not supported)
Tool choice mapping:
- `"none"` → `"NONE"`
- `"auto"` or `"required"` → `"AUTO"` or `"REQUIRED"`, respectively
- Specific tool selection → `"REQUIRED"` (Cohere uses function-level selection)
Response Format
Supported formats:
- `text` - Plain text response
- `json_object` - Structured JSON response
- `json_schema` - JSON with schema validation (converted to `json_object`)
The schema is passed through the `response_format.json_schema` field.
Response Conversion
Field Mapping
- `finish_reason`: `COMPLETE`/`STOP_SEQUENCE` → `stop`, `MAX_TOKENS` → `length`, `TOOL_CALL` → `tool_calls`
- `input_tokens` → `prompt_tokens` | `output_tokens` → `completion_tokens`
- `cached_tokens` → `prompt_tokens_details.cached_tokens` (if present)
- Tool call arguments are passed through as strings (no conversion needed; Cohere uses string format)
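The field mapping above amounts to a lookup for `finish_reason` plus a copy of the token counts. The sketch below uses hypothetical, simplified types (`CohereUsage`, `Usage`, `PromptTokensDetails`); returning an unrecognized finish reason unchanged is an assumption of the sketch, not documented behavior.

```go
package main

import "fmt"

// CohereUsage is a simplified stand-in for Cohere's usage block.
type CohereUsage struct {
	InputTokens  int
	OutputTokens int
	CachedTokens *int // nil when Cohere reports no cached tokens
}

// PromptTokensDetails and Usage are simplified OpenAI-style counterparts.
type PromptTokensDetails struct {
	CachedTokens int
}

type Usage struct {
	PromptTokens        int
	CompletionTokens    int
	PromptTokensDetails *PromptTokensDetails
}

// mapFinishReason converts Cohere finish reasons to OpenAI-style values.
func mapFinishReason(reason string) string {
	switch reason {
	case "COMPLETE", "STOP_SEQUENCE":
		return "stop"
	case "MAX_TOKENS":
		return "length"
	case "TOOL_CALL":
		return "tool_calls"
	}
	return reason // assumption: unknown values are returned as-is
}

// mapUsage copies token counts and adds cached-token details only if present.
func mapUsage(u CohereUsage) Usage {
	out := Usage{PromptTokens: u.InputTokens, CompletionTokens: u.OutputTokens}
	if u.CachedTokens != nil {
		out.PromptTokensDetails = &PromptTokensDetails{CachedTokens: *u.CachedTokens}
	}
	return out
}

func main() {
	cached := 12
	u := mapUsage(CohereUsage{InputTokens: 40, OutputTokens: 8, CachedTokens: &cached})
	fmt.Println(mapFinishReason("MAX_TOKENS"), u.PromptTokens, u.CompletionTokens, u.PromptTokensDetails.CachedTokens)
}
```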
Streaming
Event sequence: `message-start` → `content-start` → `content-delta` → `content-end` → `message-end`
Delta types:
- `content-delta` with text → message content
- `content-delta` with thinking → reasoning text
- `tool-call-start`/`delta`/`end` → tool call events
- `tool-plan-delta` → tool planning output
Caveats
Minimum Thinking Budget
Severity: Low
Behavior: reasoning.max_tokens must be >= 1
Impact: Very low impact, conversion happens automatically
Code: chat.go:104-130
Top P Renamed
Severity: Low
Behavior: top_p parameter renamed to p
Impact: Parameter name changes internally
Code: chat.go:99
Strict Tool Mode Dropped
Severity: Low
Behavior: strict: true in tool definitions silently dropped
Impact: No schema validation enforcement
Code: chat.go:168-185
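To illustrate the strict-mode caveat alongside the tool choice mapping from the Tool Conversion section, here is a minimal sketch. `FunctionDef`, `CohereFunction`, `toCohereFunction`, and `toCohereToolChoice` are hypothetical names; the sketch assumes each of `"none"`, `"auto"`, and `"required"` maps to its upper-case counterpart and that unknown values leave the tool choice unset.

```go
package main

import (
	"fmt"
	"strings"
)

// FunctionDef is a simplified OpenAI-style function definition.
type FunctionDef struct {
	Name       string
	Parameters map[string]interface{}
	Strict     bool
}

// CohereFunction is a simplified Cohere-style definition with no strict field.
type CohereFunction struct {
	Name       string
	Parameters map[string]interface{}
}

// toCohereFunction copies name and parameters; Strict is silently dropped.
func toCohereFunction(f FunctionDef) CohereFunction {
	return CohereFunction{Name: f.Name, Parameters: f.Parameters}
}

// toCohereToolChoice maps a string tool choice. forcedTool is non-empty
// when the caller selected one specific function.
func toCohereToolChoice(choice, forcedTool string) string {
	if forcedTool != "" {
		return "REQUIRED"
	}
	switch choice {
	case "none", "auto", "required":
		return strings.ToUpper(choice)
	}
	return "" // assumption: unknown values leave tool_choice unset
}

func main() {
	f := toCohereFunction(FunctionDef{Name: "get_weather", Strict: true})
	fmt.Println(f.Name, toCohereToolChoice("auto", ""))
}
```

Because the strict flag never reaches Cohere, callers that rely on schema enforcement would need to validate tool arguments themselves.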
Tool Arguments Format
Severity: Low
Behavior: Tool arguments are already strings, no JSON serialization needed
Impact: Minimal - Cohere v2 API expects string format
Code: chat.go:70-78
2. Responses API
The Responses API uses the same underlying `/v2/chat` endpoint but converts between OpenAI’s Responses format and Cohere’s format.
Request Parameters
Parameter Mapping
| Parameter | Transformation |
|---|---|
| `max_output_tokens` | Renamed to `max_tokens` |
| `temperature`, `top_p` | `temperature` is passed through directly; `top_p` is renamed to `p` |
| `instructions` | Becomes system message |
| `text.format` | Converted to `response_format` |
| `tools` | Schema restructured (see Chat Completions) |
| `tool_choice` | Type mapped (see Chat Completions) |
| `reasoning` | Mapped to `thinking` (see Reasoning / Thinking) |
| `stop` | Via `extra_params`, renamed to `stop_sequences` |
| `top_k` | Via `extra_params` (Cohere-specific) |
| `frequency_penalty`, `presence_penalty` | Via `extra_params` |
Extra Parameters
Use `extra_params` (SDK) or pass the fields directly in the request body (Gateway):

Gateway:

    curl -X POST http://localhost:8080/v1/responses \
      -H "Content-Type: application/json" \
      -d '{
        "model": "cohere/command-r-plus",
        "input": "Hello, how are you?",
        "top_k": 40,
        "stop": [".", "!"]
      }'

Go SDK:

    resp, err := client.ResponsesRequest(schemas.NewDeepIntShieldContext(ctx, schemas.NoDeadline), &schemas.DeepIntShieldResponsesRequest{
        Provider: schemas.Cohere,
        Model:    "cohere/command-r-plus",
        Input:    messages,
        Params: &schemas.ResponsesParameters{
            // Cohere-specific fields are forwarded via ExtraParams.
            ExtraParams: map[string]interface{}{
                "top_k": 40,
                "stop":  []string{".", "!"},
            },
        },
    })

Input & Instructions
- Input: A string is converted to a single user message; an array is converted to a list of messages
- Instructions: Becomes a system message (prepended to the messages)
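The input and instructions handling can be sketched as below. `Message` and `buildMessages` are hypothetical, simplified names used only for illustration; the real request types are richer.

```go
package main

import "fmt"

// Message is a simplified chat message.
type Message struct {
	Role    string
	Content string
}

// buildMessages prepends instructions as a system message, then converts
// the input: a string becomes one user message, a slice is used as-is.
func buildMessages(instructions string, input interface{}) ([]Message, error) {
	var msgs []Message
	if instructions != "" {
		msgs = append(msgs, Message{Role: "system", Content: instructions})
	}
	switch v := input.(type) {
	case string:
		msgs = append(msgs, Message{Role: "user", Content: v})
	case []Message:
		msgs = append(msgs, v...)
	default:
		return nil, fmt.Errorf("unsupported input type %T", input)
	}
	return msgs, nil
}

func main() {
	msgs, err := buildMessages("You are concise.", "Hello, how are you?")
	if err != nil {
		panic(err)
	}
	for _, m := range msgs {
		fmt.Printf("%s: %s\n", m.Role, m.Content)
	}
}
```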
Tool Support
Supported types: `function`
Tool conversions are the same as for Chat Completions.
Response Conversion
- `text` → `message` | `tool_use` → `function_call`
- `input_tokens` / `output_tokens` preserved
- Token details with cached tokens support
Streaming
Event sequence: `message-start` → `content-start` → `content-delta` → `content-end` → `message-end`
Special handling:
- Tool call arguments accumulated across chunks
- Synthetic `output_item.added` events emitted for text/reasoning
- Stable item IDs generated as `msg_{messageID}_item_{outputIndex}`
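Two of the special-handling steps lend themselves to a short sketch: building the stable item ID and accumulating tool call argument fragments across chunks. `itemID` and `argAccumulator` are hypothetical helpers; only the ID pattern itself comes from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// itemID builds an ID following the msg_{messageID}_item_{outputIndex} pattern.
func itemID(messageID string, outputIndex int) string {
	return fmt.Sprintf("msg_%s_item_%d", messageID, outputIndex)
}

// argAccumulator collects tool call argument fragments per output index.
type argAccumulator struct {
	parts map[int]*strings.Builder
}

// add appends one streamed fragment to the buffer for the given index.
func (a *argAccumulator) add(index int, fragment string) {
	if a.parts == nil {
		a.parts = make(map[int]*strings.Builder)
	}
	b, ok := a.parts[index]
	if !ok {
		b = &strings.Builder{}
		a.parts[index] = b
	}
	b.WriteString(fragment)
}

// arguments returns everything accumulated so far for the given index.
func (a *argAccumulator) arguments(index int) string {
	if b, ok := a.parts[index]; ok {
		return b.String()
	}
	return ""
}

func main() {
	var acc argAccumulator
	acc.add(0, `{"city":`)
	acc.add(0, `"Paris"}`)
	fmt.Println(itemID("abc123", 0), acc.arguments(0))
}
```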
3. Embeddings
Request Parameters
Parameter Mapping
| Parameter | Transformation |
|---|---|
| `input` (text or array) | Converted to `texts` array |
| `dimensions` | Renamed to `output_dimension` |
| `input_type` | Via `extra_params` (required, defaults to `"search_document"`) |
| `embedding_types` | Via `extra_params` (array of embedding types) |
| `truncate` | Via `extra_params` (how to handle long inputs) |
| `max_tokens` | Via `extra_params` (max tokens to embed per input) |
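A minimal sketch of the embedding request mapping, including the `input_type` default. `cohereEmbedRequest` and `buildEmbedRequest` are hypothetical names, and only three of the fields from the table are modeled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cohereEmbedRequest models a subset of the Cohere embed request body.
type cohereEmbedRequest struct {
	Texts           []string `json:"texts"`
	InputType       string   `json:"input_type"`
	OutputDimension *int     `json:"output_dimension,omitempty"`
}

// buildEmbedRequest converts input to a texts array, renames dimensions to
// output_dimension, and defaults input_type to "search_document".
func buildEmbedRequest(input interface{}, dimensions *int, extra map[string]interface{}) (cohereEmbedRequest, error) {
	req := cohereEmbedRequest{InputType: "search_document", OutputDimension: dimensions}
	switch v := input.(type) {
	case string:
		req.Texts = []string{v}
	case []string:
		req.Texts = v
	default:
		return req, fmt.Errorf("unsupported input type %T", input)
	}
	// An explicit input_type in the extra parameters overrides the default.
	if t, ok := extra["input_type"].(string); ok && t != "" {
		req.InputType = t
	}
	return req, nil
}

func main() {
	req, err := buildEmbedRequest("text to embed", nil, nil)
	if err != nil {
		panic(err)
	}
	body, _ := json.Marshal(req)
	fmt.Println(string(body))
}
```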
Extra Parameters
Use `extra_params` for Cohere-specific embedding options:

Gateway:

    curl -X POST http://localhost:8080/v1/embeddings \
      -H "Content-Type: application/json" \
      -d '{
        "model": "cohere/embed-english-v3.0",
        "input": ["text to embed"],
        "input_type": "search_query",
        "embedding_types": ["float"],
        "truncate": "START"
      }'

Go SDK:

    resp, err := client.EmbeddingRequest(schemas.NewDeepIntShieldContext(ctx, schemas.NoDeadline), &schemas.DeepIntShieldEmbeddingRequest{
        Provider: schemas.Cohere,
        Model:    "cohere/embed-english-v3.0",
        Input: &schemas.EmbeddingInput{
            Texts: []string{"text to embed"},
        },
        Params: &schemas.EmbeddingParameters{
            Dimensions: schemas.Ptr(1024),
            // Cohere-specific embedding options are forwarded via ExtraParams.
            ExtraParams: map[string]interface{}{
                "input_type":      "search_query",
                "embedding_types": []string{"float"},
                "truncate":        "START",
            },
        },
    })

Critical Notes
- Input Type Required: Cohere v3+ models require the `input_type` parameter (defaults to `"search_document"`)
- Embedding Types: Specify which embedding types to return (e.g., `"float"`, `"int8"`)
Response Conversion
- `embeddings.float` → `data[].embedding`
- `meta.tokens` → usage information
- Multiple embedding types handled
4. List Models
Request: `GET /v1/models?page_size={defaultPageSize}`
Field mapping: Model data converted to standard format
Pagination: Cursor-based with next_page_token
Note: endpoint and default_only filters available via extra_params
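Cursor-based pagination with `next_page_token` generally follows the loop sketched below. This is an illustration only: `modelPage`, `listAllModels`, and the injected `fetch` function are hypothetical, the model names are placeholders, and the sketch assumes an empty token marks the last page.

```go
package main

import "fmt"

// modelPage is a simplified page of results.
type modelPage struct {
	Models        []string
	NextPageToken string
}

// listAllModels calls fetch repeatedly, passing the previous page's token,
// until a page arrives without a next_page_token.
func listAllModels(fetch func(pageToken string) (modelPage, error)) ([]string, error) {
	var all []string
	token := ""
	for {
		p, err := fetch(token)
		if err != nil {
			return nil, err
		}
		all = append(all, p.Models...)
		if p.NextPageToken == "" {
			return all, nil
		}
		token = p.NextPageToken
	}
}

func main() {
	// Fake two-page backend standing in for the real list-models call.
	pages := map[string]modelPage{
		"":   {Models: []string{"model-a", "model-b"}, NextPageToken: "t1"},
		"t1": {Models: []string{"model-c"}},
	}
	all, err := listAllModels(func(tok string) (modelPage, error) {
		return pages[tok], nil
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(all)
}
```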