POST /v1/rerank
Rerank
curl --request POST \
  --url https://api.example.com/v1/rerank \
  --header 'Authorization: Bearer sk-xxx' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "<string>",
  "query": "<string>",
  "documents": [
    "<string>"
  ],
  "top_n": 123,
  "return_documents": true
}
'

Overview

Reorders a set of documents by semantic relevance to a query. It is commonly used as the second-stage reranking step in RAG (Retrieval-Augmented Generation) pipelines.
The endpoint follows the SiliconFlow Rerank API format.

Supported Models

Model | Description
BAAI/bge-reranker-v2-m3 | Multilingual reranking model, recommended
BAAI/bge-reranker-large | Primarily English

Request Parameters

model (string, required)
    Reranking model name, e.g. BAAI/bge-reranker-v2-m3
query (string, required)
    Query text
documents (string[], required)
    List of documents to rerank
top_n (integer)
    Return the top N results. Defaults to returning all results.
return_documents (boolean, default: true)
    Whether to include the original document text in the response
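
The parameters above can be assembled into a request body with a small helper. This is a minimal sketch, not part of the API: `build_rerank_payload` is a hypothetical name, and the client-side checks (non-empty `documents`, `top_n` of at least 1) are assumptions about sensible input rather than documented server rules.

```python
from typing import Any, Dict, List, Optional


def build_rerank_payload(
    model: str,
    query: str,
    documents: List[str],
    top_n: Optional[int] = None,
    return_documents: bool = True,
) -> Dict[str, Any]:
    """Build the JSON body for POST /v1/rerank (hypothetical helper)."""
    if not model or not query:
        raise ValueError("model and query are required")
    if not documents:
        # Assumption: an empty document list is never a useful request.
        raise ValueError("documents must contain at least one string")

    payload: Dict[str, Any] = {
        "model": model,
        "query": query,
        "documents": list(documents),
        "return_documents": return_documents,
    }
    # top_n is optional; omitting it asks for all results.
    if top_n is not None:
        if top_n < 1:
            raise ValueError("top_n must be a positive integer")
        payload["top_n"] = top_n
    return payload
```

The returned dictionary can be passed as the `json=` argument of `requests.post`.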

Response Format

{
  "model": "BAAI/bge-reranker-v2-m3",
  "results": [
    {
      "index": 2,
      "relevance_score": 0.9875,
      "document": { "text": "Most relevant document content" }
    },
    {
      "index": 0,
      "relevance_score": 0.7432,
      "document": { "text": "Second most relevant document content" }
    },
    {
      "index": 1,
      "relevance_score": 0.1205,
      "document": { "text": "Less relevant document content" }
    }
  ],
  "usage": {
    "total_tokens": 128
  }
}
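
In the example response, `index` appears to be the position of each result in the request's `documents` array, with results ordered by descending `relevance_score`. Assuming that holds, and assuming the `document` field is omitted when `return_documents` is false, the following sketch recovers the text from the original list. `attach_documents` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Tuple


def attach_documents(
    documents: List[str], response: Dict[str, Any]
) -> List[Tuple[float, int, str]]:
    """Return (score, original_index, text) for each result, in response order."""
    ranked = []
    for item in response["results"]:
        idx = item["index"]
        # Prefer the text echoed by the server; otherwise fall back to the
        # caller's own list (assumes index points into `documents`).
        text = (item.get("document") or {}).get("text", documents[idx])
        ranked.append((item["relevance_score"], idx, text))
    return ranked
```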

Code Examples

import requests

response = requests.post(
    "https://crazyrouter.com/v1/rerank",
    headers={
        "Authorization": "Bearer sk-xxx",
        "Content-Type": "application/json"
    },
    json={
        "model": "BAAI/bge-reranker-v2-m3",
        "query": "What is a vector database",
        "documents": [
            "A vector database is a database system specialized for storing and retrieving high-dimensional vectors",
            "Relational databases use tables to store structured data",
            "Vector databases support approximate nearest neighbor search, suitable for semantic retrieval scenarios",
            "Redis is an in-memory key-value store"
        ],
        "top_n": 2,
        "return_documents": True
    }
)

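response.raise_for_status()  # surface HTTP errors before parsing the body
data = response.json()
# Results are ordered from most to least relevant.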
for result in data["results"]:
    print(f"[{result['relevance_score']:.4f}] {result['document']['text']}")

Typical RAG Pipeline

User Query → Embedding Retrieval Top-K → Rerank → LLM Generates Answer
Pass no more than 100 documents to a single rerank request; larger inputs increase latency and cost.
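
The rerank stage of this pipeline, including the 100-document cap, might be sketched as follows. The names are illustrative: `rerank_fn` is a hypothetical callable that wraps POST /v1/rerank and returns the `results` list, and `rerank_candidates` assumes each result's `index` refers to the position in the list that was sent.

```python
from typing import Any, Callable, Dict, List, Sequence

MAX_RERANK_DOCS = 100  # limit recommended in this document

# Hypothetical signature: (query, documents, top_n) -> list of result dicts
RerankFn = Callable[[str, List[str], int], List[Dict[str, Any]]]


def rerank_candidates(
    query: str,
    candidates: Sequence[str],
    rerank_fn: RerankFn,
    top_n: int = 5,
    max_docs: int = MAX_RERANK_DOCS,
) -> List[str]:
    """Cap the retrieved candidates, rerank them, and return the best texts."""
    # Keep only the first max_docs candidates from the retrieval stage.
    limited = list(candidates[:max_docs])
    if not limited:
        return []
    results = rerank_fn(query, limited, min(top_n, len(limited)))
    # Map each result back to its text via the index into `limited`.
    return [limited[r["index"]] for r in results]
```

The returned texts can then be placed in the LLM prompt as context. Passing the rerank call in as a function keeps the sketch testable without network access.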