This page documents Lemonade’s llama.cpp-specific compatibility surface.
| Method | Endpoint | Description | Modality |
|---|---|---|---|
| `POST` | `/v1/reranking` | Reranking | query + documents -> relevance-scored documents |
### `POST /v1/reranking`

Reranking API for llama.cpp-compatible reranker models. You provide a query and a list of documents, and you receive a relevance score for each document. If the requested model is not already loaded, Lemonade loads it automatically.
Note: This endpoint is part of Lemonade's llama.cpp compatibility layer. Internally, Lemonade forwards the request to llama.cpp's `/v1/rerank` endpoint.

Note: This endpoint is only available for reranker-specific models that use the `llamacpp` recipe, such as `bge-reranker-v2-m3-GGUF`.
| Parameter | Required | Description | Status |
|---|---|---|---|
| `query` | Yes | The search query text. | |
| `documents` | Yes | Array of document strings to score against the query. | |
| `model` | Yes | The reranking model to use. If it is not already loaded, Lemonade loads it before forwarding the request. | |
=== "PowerShell"

    ```powershell
    Invoke-WebRequest `
      -Uri "http://localhost:13305/v1/reranking" `
      -Method POST `
      -Headers @{ "Content-Type" = "application/json" } `
      -Body '{
        "model": "bge-reranker-v2-m3-GGUF",
        "query": "What is the capital of France?",
        "documents": [
          "Paris is the capital of France.",
          "Berlin is the capital of Germany.",
          "Madrid is the capital of Spain."
        ]
      }'
    ```

=== "Bash"

    ```bash
    curl -X POST http://localhost:13305/v1/reranking \
      -H "Content-Type: application/json" \
      -d '{
        "model": "bge-reranker-v2-m3-GGUF",
        "query": "What is the capital of France?",
        "documents": [
          "Paris is the capital of France.",
          "Berlin is the capital of Germany.",
          "Madrid is the capital of Spain."
        ]
      }'
    ```
Example response:

```json
{
  "model": "bge-reranker-v2-m3-GGUF",
  "object": "list",
  "results": [
    {
      "index": 0,
      "relevance_score": 8.60673713684082
    },
    {
      "index": 1,
      "relevance_score": -5.3886260986328125
    },
    {
      "index": 2,
      "relevance_score": -3.555561065673828
    }
  ],
  "usage": {
    "prompt_tokens": 51,
    "total_tokens": 51
  }
}
```
Field Descriptions:

- `model` - Model identifier used for reranking
- `object` - Type of response object, always `"list"`
- `results` - Array of all input documents with their relevance scores
    - `index` - Original index of the document in the input `documents` array
    - `relevance_score` - Relevance score assigned by the model; higher means more relevant. Scores are not normalized to a fixed range and can be negative, as in the example above.
- `usage` - Token usage statistics
    - `prompt_tokens` - Number of tokens in the input
    - `total_tokens` - Total number of tokens processed

Note: Results are returned in input order. To rank documents by relevance, sort `results` by `relevance_score` in descending order on the client side, and use each result's `index` to look up the corresponding document.
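The client-side sort described in the note can be done in a shell pipeline. This is a minimal sketch that assumes `jq` is installed; `rank_documents` is a hypothetical helper, not part of Lemonade. It reads a `/v1/reranking` response on stdin, sorts `results` by `relevance_score` in descending order, and uses each result's `index` to look up the original document text, which you pass in as the same JSON array you sent in the request.

```bash
#!/usr/bin/env bash
# Sketch only: assumes jq is installed. rank_documents is a hypothetical helper name.
# Reads a /v1/reranking response on stdin and prints one line per document,
# "relevance_score<TAB>document", highest score first.
rank_documents() {
  local docs_json="$1"  # JSON array of the documents sent in the request
  jq -r --argjson docs "$docs_json" '
    .results
    | sort_by(.relevance_score)   # ascending by score
    | reverse                     # flip to descending: most relevant first
    | .[]
    | "\(.relevance_score)\t\($docs[.index])"  # map index back to document text
  '
}

# Example (requires a running Lemonade server with the model available):
#   DOCS='["Paris is the capital of France.", "Berlin is the capital of Germany.", "Madrid is the capital of Spain."]'
#   curl -s -X POST http://localhost:13305/v1/reranking \
#     -H "Content-Type: application/json" \
#     -d "{\"model\": \"bge-reranker-v2-m3-GGUF\", \"query\": \"What is the capital of France?\", \"documents\": $DOCS}" \
#     | rank_documents "$DOCS"
```

Looking documents up by `index`, rather than by position in `results`, keeps the output correct regardless of the order in which results arrive.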