Embeddable Lemonade: Runtime
This guide will show you how to operate Embeddable Lemonade in your app at runtime.
Contents:
- Launching
- Authenticating Requests
- Runtime Model and Backend Management
- Runtime Settings Management
  - GET /internal/config
  - POST /internal/set
Launching
We recommend that your app launches lemond as a subprocess, using a command like this:
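Windows (cmd.exe), with the assignment quoted so the space before && does not become part of the key:
set "LEMONADE_API_KEY=KEY" && lemond.exe ./ --port PORT
Unix-like shells:
LEMONADE_API_KEY=KEY lemond ./ --port PORT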
Breaking this down:
- LEMONADE_API_KEY=KEY sets an API key for lemond known only to your app. This locks out other apps, as well as users, from interfacing directly with lemond's endpoints.
- The positional ./ is the working directory for lemond, where it will look for config.json, bin/, etc.
- --port PORT ensures that lemond launches on a specific port where your app will find it.
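From a Node.js host app, the same launch could be sketched as follows. This is a minimal illustration, not the project's own launcher: `buildLaunchSpec` and `launchLemond` are hypothetical helper names, and the sketch assumes that `lemond` (or `lemond.exe`) can be found on the PATH and that a random hex string is acceptable as a key.

```javascript
// Hypothetical helper: describe how to start lemond without actually starting it.
function buildLaunchSpec({ workDir, port, apiKey, platform = process.platform }) {
  return {
    // Assumption: the binary is resolvable via PATH under these names.
    command: platform === "win32" ? "lemond.exe" : "lemond",
    // Positional working directory first, then the fixed port.
    args: [workDir, "--port", String(port)],
    // Pass the key through the environment, so no shell quoting is involved.
    env: { ...process.env, LEMONADE_API_KEY: apiKey },
  };
}

// Hypothetical helper: generate a per-launch key and spawn lemond as a child process.
async function launchLemond(workDir, port) {
  const { spawn } = await import("node:child_process");
  const { randomBytes } = await import("node:crypto");
  const apiKey = randomBytes(32).toString("hex"); // known only to this app
  const spec = buildLaunchSpec({ workDir, port, apiKey });
  const child = spawn(spec.command, spec.args, { env: spec.env, stdio: "inherit" });
  return { child, apiKey };
}
```

Keeping the returned `apiKey` in memory, rather than writing it to disk, fits the goal of a key known only to your app.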
Authenticating Requests
If you launch lemond with LEMONADE_API_KEY set, your app must send that same key on every HTTP request to Lemonade endpoints. Do this by setting an Authorization header with a Bearer token:
Authorization: Bearer KEY
For example, with curl:
Windows (cmd.exe):
curl http://localhost:8000/v1/health ^
-H "Authorization: Bearer KEY"
Unix-like shells:
curl http://localhost:8000/v1/health \
-H "Authorization: Bearer KEY"
For JSON POST requests:
Windows (cmd.exe):
curl -X POST http://localhost:8000/internal/set ^
-H "Authorization: Bearer KEY" ^
-H "Content-Type: application/json" ^
-d "{\"log_level\": \"debug\"}"
Unix-like shells:
curl -X POST http://localhost:8000/internal/set \
-H "Authorization: Bearer KEY" \
-H "Content-Type: application/json" \
-d '{"log_level": "debug"}'
In JavaScript:
await fetch("http://localhost:8000/v1/models", {
headers: {
Authorization: `Bearer ${apiKey}`,
},
});
This matches the existing CLI, tray, app, and test implementations in this repo. If the header is missing or the key is wrong, lemond will reject the request with 401 Unauthorized.
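To avoid repeating the header on every call, an app might wrap `fetch` in a small helper. The sketch below is one possible shape, not the implementation used elsewhere in the repo; `buildRequest` and `callLemonade` are hypothetical names, and the optional `fetchImpl` parameter exists only so the helper can be exercised without a running server.

```javascript
// Hypothetical helper: build the URL and fetch options for an authenticated call.
function buildRequest(baseUrl, path, apiKey, body) {
  const headers = { Authorization: `Bearer ${apiKey}` };
  const init = { method: "GET", headers };
  if (body !== undefined) {
    // A body turns the call into a JSON POST, as in the curl example above.
    init.method = "POST";
    headers["Content-Type"] = "application/json";
    init.body = JSON.stringify(body);
  }
  return { url: new URL(path, baseUrl).toString(), init };
}

// Hypothetical helper: send the request and surface a rejected key as an error.
async function callLemonade(baseUrl, path, apiKey, body, fetchImpl = fetch) {
  const { url, init } = buildRequest(baseUrl, path, apiKey, body);
  const res = await fetchImpl(url, init);
  if (res.status === 401) {
    throw new Error("lemond rejected the API key (401 Unauthorized)");
  }
  return res;
}
```

For example, `await callLemonade("http://localhost:8000", "/v1/models", apiKey)` would issue the same request as the `fetch` call shown earlier.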
Runtime Model and Backend Management
lemond provides a full set of endpoints for managing models and backends at runtime.
| Endpoint | Description |
|---|---|
| POST /v1/pull | Download a model |
| POST /v1/delete | Delete a downloaded model |
| POST /v1/load | Load a model into memory |
| POST /v1/unload | Unload a model from memory |
| POST /v1/install | Install or update a backend |
| POST /v1/uninstall | Remove a backend |
| GET /v1/models | List available models |
| GET /v1/health | Server status and loaded models |
See the Server Spec for full request/response details.
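Before calling the management endpoints, an app that has just launched lemond may want to wait until the server answers. The sketch below polls GET /v1/health for that purpose. `waitForHealthy` is a hypothetical helper, the retry counts are arbitrary, and treating any 2xx response as "ready" is an assumption; the actual response body is described in the Server Spec and is not inspected here.

```javascript
// Hypothetical helper: poll GET /v1/health until lemond answers, or give up.
async function waitForHealthy(baseUrl, apiKey, options = {}) {
  const { attempts = 20, delayMs = 250, fetchImpl = fetch } = options;
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetchImpl(`${baseUrl}/v1/health`, {
        headers: { Authorization: `Bearer ${apiKey}` },
      });
      // Assumption: any 2xx response means the server is ready for requests.
      if (res.ok) return true;
    } catch {
      // A network error here most likely means lemond is still starting up.
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false;
}
```

A `false` result leaves the decision to the caller, for example restarting the subprocess or reporting the failure to the user.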
Runtime Settings Management
Your app can manage its lemond instance at runtime by using /internal endpoints.
| Method | Path | Description |
|---|---|---|
| POST | /internal/set | Unified config setter (see below) |
| GET | /internal/config | Returns the full runtime config snapshot |
Every setting defined in config.json can be changed at runtime through the /internal/set endpoint, without restarting lemond. See the Configuration Guide for details on all settings.
Note: The lemonade CLI uses /internal/set and /internal/config internally for the lemonade config commands.
GET /internal/config
Returns the full runtime configuration as a flat JSON object containing all server-level and recipe option keys with their current values.
Example (if lemond was launched with LEMONADE_API_KEY, also send the Authorization header described under Authenticating Requests):
curl http://localhost:8000/internal/config
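Because the snapshot is a flat object, an app can compare it against the values it wants with a plain key-by-key check. The sketch below shows one way to do that; `getConfig` and `diffConfig` are hypothetical helpers, and the key values in the usage note are illustrative rather than actual defaults.

```javascript
// Hypothetical helper: fetch the flat config snapshot from lemond.
async function getConfig(baseUrl, apiKey, fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/internal/config`, {
    // Only send the header when a key is in use.
    headers: apiKey ? { Authorization: `Bearer ${apiKey}` } : {},
  });
  if (!res.ok) {
    throw new Error(`GET /internal/config failed with ${res.status}`);
  }
  return res.json();
}

// Hypothetical helper: list the keys whose current value differs from the desired one.
function diffConfig(current, desired) {
  return Object.keys(desired).filter((key) => current[key] !== desired[key]);
}
```

For instance, `diffConfig({ ctx_size: 4096, log_level: "info" }, { ctx_size: 8192, log_level: "info" })` yields `["ctx_size"]`, so only that key would need to be sent to /internal/set.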
POST /internal/set
Accepts a JSON object with one or more keys to update atomically. Returns {"status":"success","updated":{...}} on success, or 400 with an error message on validation failure.
Server-level keys (trigger immediate side effects):
| Key | Type | Side Effect |
|---|---|---|
| port | int (1–65535) | HTTP rebind |
| host | string | HTTP rebind |
| log_level | string (trace, debug, info, warning, error, fatal, none) | Reconfigures log filter |
| global_timeout | int (positive) | Updates default HTTP client timeout |
| no_broadcast | bool | Stops or starts UDP beacon |
| extra_models_dir | string | Updates model manager search path |
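An app may want to check values against the types in this table before sending them, so that obvious mistakes are caught without a round trip. The sketch below mirrors the table only; it is not lemond's own validation logic, which remains authoritative, and `invalidServerKeys` is a hypothetical helper name.

```javascript
const LOG_LEVELS = ["trace", "debug", "info", "warning", "error", "fatal", "none"];

// One check per server-level key, taken from the table above.
const SERVER_KEY_CHECKS = {
  port: (v) => Number.isInteger(v) && v >= 1 && v <= 65535,
  host: (v) => typeof v === "string",
  log_level: (v) => LOG_LEVELS.includes(v),
  global_timeout: (v) => Number.isInteger(v) && v > 0,
  no_broadcast: (v) => typeof v === "boolean",
  extra_models_dir: (v) => typeof v === "string",
};

// Hypothetical helper: return the server-level keys whose values fail their check.
// Keys that are not server-level (for example deferred keys) are ignored here.
function invalidServerKeys(changes) {
  return Object.entries(changes)
    .filter(([key, value]) =>
      Object.prototype.hasOwnProperty.call(SERVER_KEY_CHECKS, key) &&
      !SERVER_KEY_CHECKS[key](value))
    .map(([key]) => key);
}
```

With this sketch, `invalidServerKeys({ port: 70000, log_level: "verbose", no_broadcast: true })` returns `["port", "log_level"]`.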
Deferred keys (affect the next model load or eviction decision, no immediate side effect):
| Key | Type |
|---|---|
| max_loaded_models | int (-1 or positive) |
| ctx_size | int (positive) |
| llamacpp_backend | string |
| llamacpp_args | string |
| sdcpp_backend | string |
| whispercpp_backend | string |
| whispercpp_args | string |
| steps | int (positive) |
| cfg_scale | number |
| width | int (positive) |
| height | int (positive) |
| flm_args | string |
Example (if lemond was launched with LEMONADE_API_KEY, also send the Authorization header described under Authenticating Requests):
Windows (cmd.exe):
curl -X POST http://localhost:8000/internal/set ^
-H "Content-Type: application/json" ^
-d "{\"ctx_size\": 8192, \"max_loaded_models\": 3, \"log_level\": \"debug\"}"
Unix-like shells:
curl -X POST http://localhost:8000/internal/set \
-H "Content-Type: application/json" \
-d '{"ctx_size": 8192, "max_loaded_models": 3, "log_level": "debug"}'
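The same update can be issued from JavaScript. The sketch below is a hypothetical `setConfig` helper that relies only on the behavior described above: a 400 response on validation failure and a `{"status":"success","updated":{...}}` body on success. Since the format of the error message is not specified here, it is passed through as raw text.

```javascript
// Hypothetical helper: apply one or more settings through POST /internal/set.
async function setConfig(baseUrl, apiKey, changes, fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/internal/set`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(changes),
  });
  if (res.status === 400) {
    // Validation failure: keep the server's message as raw text.
    throw new Error(`lemond rejected the update: ${await res.text()}`);
  }
  if (!res.ok) {
    throw new Error(`POST /internal/set failed with ${res.status}`);
  }
  const reply = await res.json();
  if (reply.status !== "success") {
    throw new Error(`unexpected reply status: ${reply.status}`);
  }
  // The keys that lemond reports as updated.
  return reply.updated;
}
```

A call such as `await setConfig("http://localhost:8000", apiKey, { ctx_size: 8192, max_loaded_models: 3, log_level: "debug" })` corresponds to the curl example above, with the Authorization header added.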