llama.cpp-Specific API
This page documents Lemonade's llama.cpp-specific compatibility surface.
Summary
| Method | Endpoint | Description | Modality |
|---|---|---|---|
| POST | `/v1/reranking` | Reranking | query + documents -> relevance-scored documents |
POST /v1/reranking
Reranking API for llama.cpp-compatible reranker models. You provide a query and a list of documents, and receive relevance scores for each document. Lemonade will load the requested model automatically if it is not already loaded.
Note: This endpoint is part of Lemonade's llama.cpp compatibility layer. Internally, Lemonade forwards the request to llama.cpp's `/v1/rerank` endpoint.

Note: This endpoint is only available for reranker-specific models using the `llamacpp` recipe, such as `bge-reranker-v2-m3-GGUF`.
Parameters
| Parameter | Required | Description | Status |
|---|---|---|---|
| `query` | Yes | The search query text. | |
| `documents` | Yes | Array of document strings to score against the query. | |
| `model` | Yes | The reranking model to use. If not already loaded, Lemonade loads it before forwarding the request. | |
Example request
PowerShell:

Invoke-WebRequest `
  -Uri "http://localhost:13305/v1/reranking" `
  -Method POST `
  -Headers @{ "Content-Type" = "application/json" } `
  -Body '{
    "model": "bge-reranker-v2-m3-GGUF",
    "query": "What is the capital of France?",
    "documents": [
      "Paris is the capital of France.",
      "Berlin is the capital of Germany.",
      "Madrid is the capital of Spain."
    ]
  }'

Bash:

curl -X POST http://localhost:13305/v1/reranking \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-reranker-v2-m3-GGUF",
    "query": "What is the capital of France?",
    "documents": [
      "Paris is the capital of France.",
      "Berlin is the capital of Germany.",
      "Madrid is the capital of Spain."
    ]
  }'
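The same request can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_rerank_request` and `rerank`, the `RERANK_URL` constant, and the 60-second timeout are illustrative assumptions and not part of Lemonade. The URL and JSON body mirror the examples above.

```python
import json
import urllib.request

# Assumption: same host and port as the request examples above.
RERANK_URL = "http://localhost:13305/v1/reranking"


def build_rerank_request(model, query, documents, url=RERANK_URL):
    """Build (but do not send) a POST request for the reranking endpoint."""
    payload = {"model": model, "query": query, "documents": list(documents)}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rerank(model, query, documents, url=RERANK_URL, timeout=60):
    """Send the request and return the decoded JSON response as a dict.

    The timeout is an arbitrary illustrative value; the first call may take
    longer if the model has to be loaded before the request is served.
    """
    request = build_rerank_request(model, query, documents, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a Lemonade server running locally, calling `rerank("bge-reranker-v2-m3-GGUF", "What is the capital of France?", [...])` would return the parsed response body as a dictionary.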
Response format
{
  "model": "bge-reranker-v2-m3-GGUF",
  "object": "list",
  "results": [
    {
      "index": 0,
      "relevance_score": 8.60673713684082
    },
    {
      "index": 1,
      "relevance_score": -5.3886260986328125
    },
    {
      "index": 2,
      "relevance_score": -3.555561065673828
    }
  ],
  "usage": {
    "prompt_tokens": 51,
    "total_tokens": 51
  }
}
Field Descriptions:
- `model` - Model identifier used for reranking
- `object` - Type of response object, always `"list"`
- `results` - Array of all input documents with relevance scores
  - `index` - Original index of the document in the input array
  - `relevance_score` - Relevance score assigned by the model; higher means more relevant
- `usage` - Token usage statistics
  - `prompt_tokens` - Number of tokens in the input
  - `total_tokens` - Total tokens processed
Note: Results are returned in input order. To rank documents by relevance, sort `results` by `relevance_score` in descending order on the client side.
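As a sketch of that client-side sort, the snippet below orders the results by `relevance_score` and uses each `index` to look up the original document text. The function name `rank_documents` is a hypothetical helper, and the sample data is the response shown above with the `usage` block omitted.

```python
def rank_documents(response, documents):
    """Return (document, relevance_score) pairs, most relevant first."""
    ordered = sorted(
        response["results"],
        key=lambda result: result["relevance_score"],
        reverse=True,  # higher score means more relevant
    )
    # "index" refers to the position in the original documents array.
    return [(documents[r["index"]], r["relevance_score"]) for r in ordered]


documents = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Madrid is the capital of Spain.",
]
response = {
    "model": "bge-reranker-v2-m3-GGUF",
    "object": "list",
    "results": [
        {"index": 0, "relevance_score": 8.60673713684082},
        {"index": 1, "relevance_score": -5.3886260986328125},
        {"index": 2, "relevance_score": -3.555561065673828},
    ],
}

ranked = rank_documents(response, documents)
for text, score in ranked:
    print(f"{score:8.3f}  {text}")
```

For this sample the Paris document comes first, followed by Madrid and then Berlin.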