
Reranking

Use reranking to sort documents by relevance to a query for search, retrieval, and context selection. Models are addressed in provider/model format:

  • Cohere: cohere/rerank-v3.5
  • vLLM: vllm/BAAI/bge-reranker-v2-m3
  • Bedrock: bedrock/<rerank-model-or-arn>
  • Vertex AI: vertex/<ranking-model>
curl --location 'http://localhost:8080/v1/rerank' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "cohere/rerank-v3.5",
    "query": "What is DeepIntShield?",
    "documents": [
      {"text": "DeepIntShield is an AI gateway that unifies many LLM providers."},
      {"text": "Paris is the capital of France."},
      {"text": "DeepIntShield exposes an OpenAI-compatible API."}
    ]
  }'
  • model (required): model in provider/model format
  • query (required): query used for ranking
  • documents (required): array of document objects; each needs text and may also carry id and meta
  • top_n (optional): maximum number of results
  • max_tokens_per_doc (optional): provider-dependent document token cap
  • priority (optional): provider-dependent priority hint
  • return_documents (optional): include matched document content in each result
  • fallbacks (optional): fallback models in provider/model format
curl --location 'http://localhost:8080/v1/rerank' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "cohere/rerank-v3.5",
    "query": "gateway observability",
    "top_n": 2,
    "return_documents": true,
    "documents": [
      {"id": "a", "text": "DeepIntShield supports observability plugins like OTEL and Maxim."},
      {"id": "b", "text": "DeepIntShield can run in Kubernetes and ECS."},
      {"id": "c", "text": "Token counting is available at /v1/responses/input_tokens."}
    ]
  }'
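The curl requests above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names build_rerank_payload and rerank are hypothetical, and only the URL and field names are taken from the examples above.

```python
import json
import urllib.request

# URL taken from the curl examples; adjust for your deployment.
RERANK_URL = "http://localhost:8080/v1/rerank"


def build_rerank_payload(model, query, documents, top_n=None, return_documents=None):
    """Assemble the JSON body, leaving out optional fields that were not set.

    Hypothetical helper: the field names follow the parameter list above.
    """
    payload = {"model": model, "query": query, "documents": documents}
    if top_n is not None:
        payload["top_n"] = top_n
    if return_documents is not None:
        payload["return_documents"] = return_documents
    return payload


def rerank(payload, url=RERANK_URL, timeout=30):
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


payload = build_rerank_payload(
    "cohere/rerank-v3.5",
    "gateway observability",
    [
        {"id": "a", "text": "DeepIntShield supports observability plugins like OTEL and Maxim."},
        {"id": "b", "text": "DeepIntShield can run in Kubernetes and ECS."},
    ],
    top_n=2,
    return_documents=True,
)
# rerank(payload) would send the request to a running gateway.
```

Leaving unset optional fields out of the body lets the gateway apply its own defaults instead of receiving explicit nulls.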

When using a vllm/... model, DeepIntShield sends rerank requests to /v1/rerank first and automatically retries against /rerank if the upstream endpoint responds with 404, 405, or 501.
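The path fallback described above can be pictured with a short sketch. This is not the gateway's actual implementation; the function name and the send callable (which takes a path and a payload and returns a status code and body) are assumptions made for illustration.

```python
# Status codes that trigger the retry, as listed above.
FALLBACK_STATUSES = {404, 405, 501}


def rerank_with_path_fallback(send, payload):
    """Try /v1/rerank first, then /rerank if the upstream rejects the path.

    `send` is a hypothetical transport callable: send(path, payload) -> (status, body).
    """
    status, body = send("/v1/rerank", payload)
    if status in FALLBACK_STATUSES:
        # The upstream does not serve /v1/rerank, so retry once on the legacy path.
        status, body = send("/rerank", payload)
    return status, body


# Usage with a fake upstream that only serves /rerank.
calls = []


def fake_send(path, payload):
    calls.append(path)
    if path == "/v1/rerank":
        return 404, None
    return 200, {"results": []}


status, body = rerank_with_path_fallback(fake_send, {"query": "q"})
```

Any other status, including other errors, is returned from the first attempt without a retry.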

Example response to the cohere/rerank-v3.5 request with top_n set to 2 and return_documents enabled:

{
  "results": [
    {
      "index": 0,
      "relevance_score": 0.98,
      "document": {
        "id": "a",
        "text": "DeepIntShield supports observability plugins like OTEL and Maxim."
      }
    },
    {
      "index": 2,
      "relevance_score": 0.63,
      "document": {
        "id": "c",
        "text": "Token counting is available at /v1/responses/input_tokens."
      }
    }
  ],
  "model": "rerank-v3.5",
  "usage": {
    "prompt_tokens": 52,
    "completion_tokens": 0,
    "total_tokens": 52
  },
  "extra_fields": {
    "request_type": "rerank",
    "provider": "cohere",
    "latency": 245,
    "chunk_index": 0
  }
}
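When return_documents is not set, the results can be joined back to the submitted documents on the client. The sketch below assumes that index is the zero-based position in the request's documents array, which is what the example suggests (index 2 carries id c); attach_documents is a hypothetical helper name.

```python
def attach_documents(documents, response):
    """Pair each result's relevance score with the document it refers to.

    Assumes result["index"] is the zero-based position in the request's
    documents array. Results are kept in the order the response returns them.
    """
    ranked = []
    for result in response["results"]:
        document = documents[result["index"]]
        ranked.append((result["relevance_score"], document))
    return ranked


documents = [
    {"id": "a", "text": "DeepIntShield supports observability plugins like OTEL and Maxim."},
    {"id": "b", "text": "DeepIntShield can run in Kubernetes and ECS."},
    {"id": "c", "text": "Token counting is available at /v1/responses/input_tokens."},
]
response = {
    "results": [
        {"index": 0, "relevance_score": 0.98},
        {"index": 2, "relevance_score": 0.63},
    ]
}
ranked = attach_documents(documents, response)
```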
Invalid requests are rejected with the following messages:

  • Missing query -> query is required for rerank
  • Empty documents -> documents are required for rerank
  • Blank document text -> document text is required for rerank at index N
  • top_n < 1 -> top_n must be at least 1
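These checks can be mirrored on the client before a request is sent. The sketch below is a hedged approximation: the message strings come from the list above, but the function name, the order of the checks, and the treatment of whitespace-only values as blank are assumptions, not documented gateway behavior.

```python
def validate_rerank_request(payload):
    """Return an error message mirroring the list above, or None if the payload passes.

    Check order and whitespace handling are assumptions for illustration.
    """
    if not str(payload.get("query") or "").strip():
        return "query is required for rerank"
    documents = payload.get("documents") or []
    if not documents:
        return "documents are required for rerank"
    for index, document in enumerate(documents):
        if not str(document.get("text") or "").strip():
            return f"document text is required for rerank at index {index}"
    top_n = payload.get("top_n")
    if top_n is not None and top_n < 1:
        return "top_n must be at least 1"
    return None


error = validate_rerank_request(
    {"query": "gateway observability", "documents": [{"text": "ok"}, {"text": "  "}]}
)
```

Validating locally saves a round trip, but the gateway's response remains the authoritative result.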
