Reranking

Use reranking to order a set of documents by their relevance to a query, for search, retrieval, and context selection.

Provider Model Examples

Cohere: cohere/rerank-v3.5
vLLM: vllm/BAAI/bge-reranker-v2-m3
Bedrock: bedrock/<rerank-model-or-arn>
Vertex AI: vertex/<ranking-model>

Basic Request

curl --location 'http://localhost:8080/v1/rerank' \
--header 'Content-Type: application/json' \
--data '{
  "model": "cohere/rerank-v3.5",
  "query": "What is DeepIntShield?",
  "documents": [
    {"text": "DeepIntShield is an AI gateway that unifies many LLM providers."},
    {"text": "Paris is the capital of France."},
    {"text": "DeepIntShield exposes an OpenAI-compatible API."}
  ]
}'

Request Parameters

model (required): model in provider/model format
query (required): query used for ranking
documents (required): array of documents; each needs text and may also carry id and meta
top_n (optional): maximum number of results to return
max_tokens_per_doc (optional): provider-dependent document token cap
priority (optional): provider-dependent priority hint
return_documents (optional): include matched document content in each result
fallbacks (optional): fallback models in provider/model format
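
The parameters above can be assembled into a request programmatically. The following is a minimal Python sketch using only the standard library; the helper name build_rerank_request and the choice to omit unset optional parameters are assumptions made for illustration, not part of DeepIntShield. It builds the request without sending it.

```python
import json
import urllib.request

# Optional parameters listed in the table above.
OPTIONAL_PARAMS = ("top_n", "max_tokens_per_doc", "priority",
                   "return_documents", "fallbacks")


def build_rerank_request(base_url, model, query, documents, **options):
    """Build (but do not send) a POST request for /v1/rerank.

    Hypothetical helper: optional parameters left as None are omitted
    from the JSON body rather than sent as null.
    """
    unknown = set(options) - set(OPTIONAL_PARAMS)
    if unknown:
        raise ValueError(f"unknown rerank options: {sorted(unknown)}")

    payload = {"model": model, "query": query, "documents": documents}
    for name in OPTIONAL_PARAMS:
        if options.get(name) is not None:
            payload[name] = options[name]

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_rerank_request(
    "http://localhost:8080",
    model="cohere/rerank-v3.5",
    query="gateway observability",
    documents=[
        {"id": "a", "text": "DeepIntShield supports observability plugins like OTEL and Maxim."},
        {"id": "b", "text": "DeepIntShield can run in Kubernetes and ECS."},
    ],
    top_n=2,
    return_documents=True,
)
print(request.get_method(), request.full_url)
```

To send the request against a running gateway, pass the object to urllib.request.urlopen and decode the JSON body of the response.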

Example with Options

curl --location 'http://localhost:8080/v1/rerank' \
--header 'Content-Type: application/json' \
--data '{
  "model": "cohere/rerank-v3.5",
  "query": "gateway observability",
  "top_n": 2,
  "return_documents": true,
  "documents": [
    {"id": "a", "text": "DeepIntShield supports observability plugins like OTEL and Maxim."},
    {"id": "b", "text": "DeepIntShield can run in Kubernetes and ECS."},
    {"id": "c", "text": "Token counting is available at /v1/responses/input_tokens."}
  ]
}'

vLLM Endpoint Compatibility

When the model is vllm/..., DeepIntShield first sends the rerank request to /v1/rerank. If the upstream endpoint responds with 404, 405, or 501, it automatically retries the request against /rerank.
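
The fallback described above can be pictured with a short Python sketch. It illustrates the documented behavior only and is not the gateway's implementation; the send callable, which takes a path and payload and returns a (status, body) pair, is a hypothetical transport introduced here so the logic can run without a server.

```python
# Status codes that trigger the retry, as listed above.
FALLBACK_STATUSES = {404, 405, 501}
PRIMARY_PATH = "/v1/rerank"
FALLBACK_PATH = "/rerank"


def rerank_with_path_fallback(send, payload):
    """Try /v1/rerank first, then /rerank if the first route is unsupported.

    `send(path, payload)` is a hypothetical callable returning (status, body).
    """
    status, body = send(PRIMARY_PATH, payload)
    if status in FALLBACK_STATUSES:
        status, body = send(FALLBACK_PATH, payload)
    return status, body


# A fake upstream that only serves the unversioned route.
calls = []


def fake_send(path, payload):
    calls.append(path)
    if path == FALLBACK_PATH:
        return 200, {"results": []}
    return 404, {"error": "not found"}


status, body = rerank_with_path_fallback(fake_send, {"query": "q", "documents": []})
print(status, calls)  # 200 ['/v1/rerank', '/rerank']
```

Any other status from the first attempt, including other errors, is returned as-is without a second request.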

Response Shape

{
  "results": [
    {
      "index": 0,
      "relevance_score": 0.98,
      "document": {
        "id": "a",
        "text": "DeepIntShield supports observability plugins like OTEL and Maxim."
      }
    },
    {
      "index": 2,
      "relevance_score": 0.63,
      "document": {
        "id": "c",
        "text": "Token counting is available at /v1/responses/input_tokens."
      }
    }
  ],
  "model": "rerank-v3.5",
  "usage": {
    "prompt_tokens": 52,
    "completion_tokens": 0,
    "total_tokens": 52
  },
  "extra_fields": {
    "request_type": "rerank",
    "provider": "cohere",
    "latency": 245,
    "chunk_index": 0
  }
}
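
A client usually needs to map each result back to the document it submitted. The Python sketch below assumes that index is the zero-based position of the document in the request's documents array, which is what the sample above suggests (ids a and c at positions 0 and 2), and that the document field is present only when return_documents is true. The helper name attach_documents is hypothetical.

```python
def attach_documents(response, documents):
    """Pair each result's relevance score with its document.

    Uses the echoed `document` when the response includes one; otherwise
    looks the document up by `index` in the original request list
    (assumed to be a zero-based position).
    """
    ranked = []
    for result in response["results"]:
        doc = result.get("document") or documents[result["index"]]
        ranked.append((result["relevance_score"], doc))
    return ranked


documents = [
    {"id": "a", "text": "DeepIntShield supports observability plugins like OTEL and Maxim."},
    {"id": "b", "text": "DeepIntShield can run in Kubernetes and ECS."},
    {"id": "c", "text": "Token counting is available at /v1/responses/input_tokens."},
]

# Trimmed response without echoed documents (return_documents not set).
response = {
    "results": [
        {"index": 0, "relevance_score": 0.98},
        {"index": 2, "relevance_score": 0.63},
    ]
}

ranked = attach_documents(response, documents)
for score, doc in ranked:
    print(f"{score:.2f}  {doc['id']}")
```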

Common Validation Errors

Missing query -> query is required for rerank
Empty documents -> documents are required for rerank
Blank document text -> document text is required for rerank at index N
top_n < 1 -> top_n must be at least 1
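
These checks can be mirrored on the client to fail fast before a request is sent. The following Python sketch reuses the messages listed above; the function name, the order of the checks, and the treatment of whitespace-only strings as blank are assumptions, and the gateway's own validation may differ.

```python
def validate_rerank_request(payload):
    """Return the first matching validation message, or None if none apply.

    Hypothetical client-side check; order and blank handling are assumed.
    """
    if not str(payload.get("query") or "").strip():
        return "query is required for rerank"

    documents = payload.get("documents") or []
    if not documents:
        return "documents are required for rerank"

    for i, doc in enumerate(documents):
        if not str(doc.get("text") or "").strip():
            return f"document text is required for rerank at index {i}"

    top_n = payload.get("top_n")
    if top_n is not None and top_n < 1:
        return "top_n must be at least 1"

    return None


problem = validate_rerank_request({
    "model": "cohere/rerank-v3.5",
    "query": "What is DeepIntShield?",
    "documents": [{"text": "Paris is the capital of France."}, {"text": "  "}],
})
print(problem)  # document text is required for rerank at index 1
```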

Next Steps

Now that you understand reranking, explore these related topics:

Essential Topics

Multimodal AI - Process images, audio, and multimedia content
Tool Calling - Enable AI models to use external tools and functions
Provider Configuration - Multiple providers for redundancy
Integrations - Drop-in compatibility with existing SDKs

Advanced Topics

Core Features - Advanced DeepIntShield capabilities
Architecture - How DeepIntShield works internally
Deployment - Production setup and scaling