Embeddable Lemonade: Runtime

This guide will show you how to operate Embeddable Lemonade in your app at runtime.

Contents:

Launching
Authenticating Requests
Runtime Model and Backend Management
Runtime Settings Management
GET /internal/config
POST /internal/set

Launching

We recommend that your app launches lemond as a subprocess, using a command like this:

Windows (cmd.exe)Linux (bash)

set LEMONADE_API_KEY=KEY && lemond.exe ./ --port PORT

LEMONADE_API_KEY=KEY lemond ./ --port PORT

Breaking this down: - LEMONADE_API_KEY=KEY sets an API key for lemond known only to your app. This locks out other apps, as well as users, from interfacing directly with lemond's endpoints. - The positional ./ is the working directory for lemond, where it will look for config.json, bin/, etc. - --port PORT ensures that lemond launches on a specific port where your app will find it.

Authenticating Requests

If you launch lemond with LEMONADE_API_KEY set, your app must send that same key on every HTTP request to Lemonade endpoints. Do this by setting an Authorization header with a Bearer token:

Authorization: Bearer KEY

For example, with curl:

Windows (cmd.exe)Linux (bash)

curl http://localhost:8000/v1/health ^
  -H "Authorization: Bearer KEY"

curl http://localhost:8000/v1/health \
  -H "Authorization: Bearer KEY"

For JSON POST requests:

Windows (cmd.exe)Linux (bash)

curl -X POST http://localhost:8000/internal/set ^
  -H "Authorization: Bearer KEY" ^
  -H "Content-Type: application/json" ^
  -d "{\"log_level\": \"debug\"}"

curl -X POST http://localhost:8000/internal/set \
  -H "Authorization: Bearer KEY" \
  -H "Content-Type: application/json" \
  -d '{"log_level": "debug"}'

In JavaScript:

await fetch("http://localhost:8000/v1/models", {
  headers: {
    Authorization: `Bearer ${apiKey}`,
  },
});

This matches the existing CLI, tray, app, and test implementations in this repo. If the header is missing or the key is wrong, lemond will reject the request with 401 Unauthorized.

Runtime Model and Backend Management

lemond provides a full set of endpoints for managing models and backends at runtime.

Endpoint	Description
`POST /v1/pull`	Download a model
`POST /v1/delete`	Delete a downloaded model
`POST /v1/load`	Load a model into memory
`POST /v1/unload`	Unload a model from memory
`POST /v1/install`	Install or update a backend
`POST /v1/uninstall`	Remove a backend
`GET /v1/models`	List available models
`GET /v1/health`	Server status and loaded models

See the Server Spec for full request/response details.

Runtime Settings Management

Your app can manage its lemond instance at runtime by using /internal endpoints.

Method	Path	Description
`POST`	`/internal/set`	Unified config setter (see below)
`GET`	`/internal/config`	Returns the full runtime config snapshot

The settings defined in config.json can all be changed at runtime without restarting lemond with the /internal/set endpoint. See the Configuration Guide for details on all settings.

Note: The lemonade CLI uses /internal/set and /internal/config internally for the lemonade config commands.

`GET /internal/config`

Returns the full runtime configuration as a flat JSON object containing all server-level and recipe option keys with their current values.

Example:

Windows (cmd.exe)Linux (bash)

curl http://localhost:8000/internal/config

curl http://localhost:8000/internal/config

`POST /internal/set`

Accepts a JSON object with one or more keys to update atomically. Returns {"status":"success","updated":{...}} on success, or 400 with an error message on validation failure.

Server-level keys (trigger immediate side effects):

Key	Type	Side Effect
`port`	int (1–65535)	HTTP rebind
`host`	string	HTTP rebind
`log_level`	string (`trace`, `debug`, `info`, `warning`, `error`, `fatal`, `none`)	Reconfigures log filter
`global_timeout`	int (positive)	Updates default HTTP client timeout
`no_broadcast`	bool	Stops or starts UDP beacon
`extra_models_dir`	string	Updates model manager search path

Deferred keys (affect the next model load or eviction decision, no immediate side effect):

Key	Type
`max_loaded_models`	int (-1 or positive)
`ctx_size`	int (positive)
`llamacpp_backend`	string
`llamacpp_args`	string
`sdcpp_backend`	string
`whispercpp_backend`	string
`whispercpp_args`	string
`steps`	int (positive)
`cfg_scale`	number
`width`	int (positive)
`height`	int (positive)
`flm_args`	string

Example:

Windows (cmd.exe)Linux (bash)

curl -X POST http://localhost:8000/internal/set ^
  -H "Content-Type: application/json" ^
  -d "{\"ctx_size\": 8192, \"max_loaded_models\": 3, \"log_level\": \"debug\"}"

curl -X POST http://localhost:8000/internal/set \
  -H "Content-Type: application/json" \
  -d '{"ctx_size": 8192, "max_loaded_models": 3, "log_level": "debug"}'