Cloud Offload
Lemonade can route inference to any OpenAI-compatible cloud provider (Fireworks, OpenAI, OpenRouter, Together, etc.) alongside locally-loaded models. Cloud-routed models show up in /v1/models like any other recipe, so every client connecting to your lemond — the desktop app, the CLI, third-party SDKs, and agents launched via lemonade launch — sees the same catalog without per-client configuration.
Status: experimental. Cloud routing has been validated with Fireworks, OpenAI, OpenRouter, and Together. Other OpenAI-compatible providers should work; report problems with lemond logs and the provider's /v1/models response.
Quickstart
There are two ways to authenticate. Env vars are preferred: they're persistent, never written to disk by lemond, and apply to every connecting client.
Option A: Environment variable (recommended)
Set the provider's API key in lemond's environment before starting the server:
export LEMONADE_FIREWORKS_API_KEY=fw-XXXXX
Then install the provider once:
lemonade cloud install fireworks --base-url https://api.fireworks.ai/inference/v1
That's it. lemond discovers Fireworks's chat-capable models, registers them under the fireworks. namespace, and surfaces them in /v1/models:
lemonade list | grep fireworks
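The same check can be done from a client. The sketch below assumes that /v1/models returns the usual OpenAI-style list shape ({"data": [{"id": ...}]}); the helper name and the sample payload are illustrative and not part of Lemonade.

```python
def models_in_namespace(models_response: dict, provider: str) -> list[str]:
    """Return the ids from an OpenAI-style model list that sit under one provider namespace."""
    prefix = provider + "."
    ids = (entry.get("id", "") for entry in models_response.get("data", []))
    return sorted(model_id for model_id in ids if model_id.startswith(prefix))


# Hypothetical payload, shaped like an OpenAI-compatible GET /v1/models response.
sample = {
    "data": [
        {"id": "fireworks.kimi-k2p5"},
        {"id": "fireworks.qwen3-235b-a22b"},
        {"id": "some-local-model"},
    ]
}
print(models_in_namespace(sample, "fireworks"))
```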
Option B: Runtime API key
If you don't want to set an env var (e.g., on a dev box where the key shouldn't persist), register the provider and supply the key at runtime:
lemonade cloud install fireworks --base-url https://api.fireworks.ai/inference/v1
lemonade cloud auth fireworks
# Prompts: API key for fireworks:
Or in one step:
lemonade cloud install fireworks \
--base-url https://api.fireworks.ai/inference/v1 \
--api-key fw-XXXXX
Runtime keys live in lemond's process memory only — they're never written to disk and they vanish on restart. To make them survive a restart, switch to Option A.
Using cloud models
Cloud-discovered models use a dot-namespaced name: <provider>.<upstream-id>. For example, after installing Fireworks you'll see entries like:
fireworks.kimi-k2p5
fireworks.qwen3-235b-a22b
They work everywhere a local model name works:
lemonade load fireworks.kimi-k2p5
lemonade run fireworks.kimi-k2p5
Standard OpenAI chat completions:
curl -X POST http://localhost:13305/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "fireworks.kimi-k2p5",
"messages": [{"role": "user", "content": "hi"}]
}'
No special headers, no per-request credentials — lemond resolves the key from its registry and forwards the request transparently.
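For clients that don't shell out to curl, the same request can be built with the Python standard library. This is a minimal sketch that mirrors the curl call above; the helper name build_chat_request is illustrative, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str,
                       base_url: str = "http://localhost:13305") -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/chat/completions."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # no Authorization header needed
        method="POST",
    )


req = build_chat_request("fireworks.kimi-k2p5", "hi")
# To send it against a running lemond:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```

Note that the client never handles the provider key; only the model name selects the cloud route.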
Authentication precedence
When lemond needs an API key for a provider, it resolves it in this order:
1. LEMONADE_<PROVIDER>_API_KEY env var, if set in lemond's environment.
2. Runtime key from POST /v1/cloud/auth, if previously supplied this session.
3. None: lemond refuses to call the provider and returns a structured error.
Env vars always win. If you POST /v1/cloud/auth while the env var is set, the server returns 409 Conflict with {"error":{"type":"auth_conflict","env_var":"LEMONADE_<PROVIDER>_API_KEY"}} and does not store the supplied key. This means an operator who provisions a "house" key via env can trust that a client can't silently override it.
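The precedence and conflict rules can be modeled in a few lines. This is an illustrative sketch, not lemond's implementation: the function names, the upper-casing of the provider name, and the 200 status with an empty body on success are all assumptions; only the 409 body comes from the description above.

```python
from typing import Mapping, MutableMapping, Optional


def env_var_name(provider: str) -> str:
    # Assumption: the provider name is upper-cased to form the variable name.
    return f"LEMONADE_{provider.upper()}_API_KEY"


def resolve_api_key(provider: str, env: Mapping[str, str],
                    runtime_keys: Mapping[str, str]) -> Optional[str]:
    """Env var first, then runtime key, else None (caller returns an error)."""
    return env.get(env_var_name(provider)) or runtime_keys.get(provider)


def post_cloud_auth(provider: str, api_key: str, env: Mapping[str, str],
                    runtime_keys: MutableMapping[str, str]) -> tuple[int, dict]:
    """Model of POST /v1/cloud/auth: refuse to store a key when the env var is set."""
    var = env_var_name(provider)
    if env.get(var):
        return 409, {"error": {"type": "auth_conflict", "env_var": var}}
    runtime_keys[provider] = api_key
    return 200, {}  # success status and body are assumed for this sketch


env = {"LEMONADE_FIREWORKS_API_KEY": "fw-house-key"}
runtime: dict[str, str] = {}
status, body = post_cloud_auth("fireworks", "fw-client-key", env, runtime)
```

In this run the env var is set, so the supplied key is rejected and resolve_api_key keeps returning the house key.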
How discovery works
lemond calls GET <base_url>/models for each installed provider with a resolvable key, then filters the results to chat-capable models (using supports_chat, capabilities, architecture.modality, type, or id-pattern fallback depending on the provider). For each model it captures:
- Public name: <provider>.<cleaned_upstream_id>, after stripping accounts/<x>/models/ wrappers and deduplicating leading provider segments.
- Capability labels: vision, tool-calling, reasoning, normalized from each provider's divergent metadata into Lemonade's shared vocabulary.
- Context window: from context_length, when reported.
- Per-million-token cost: USD per 1M input/output tokens, from OpenRouter (per-token × 1e6) or Together (per-1M), when reported. Used for display only; never affects routing.
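As a rough illustration of the name and cost normalization, the sketch below applies the rules listed above. The function names, the unit labels, and the reading of "deduplicating leading provider segments" as dropping a repeated leading "<provider>/" are assumptions; the actual cleaning rules in lemond may differ.

```python
import re

# Matches the accounts/<x>/models/ wrapper at the start of an upstream id.
_WRAPPER = re.compile(r"^accounts/[^/]+/models/")


def public_name(provider: str, upstream_id: str) -> str:
    """Build <provider>.<cleaned_upstream_id> from a raw upstream model id."""
    cleaned = _WRAPPER.sub("", upstream_id)
    # Assumed interpretation: drop a leading "<provider>/" so the namespace is not repeated.
    while cleaned.startswith(provider + "/"):
        cleaned = cleaned[len(provider) + 1:]
    return f"{provider}.{cleaned}"


def usd_per_million(price: float, unit: str) -> float:
    """Normalize a price to USD per 1M tokens.

    unit is a hypothetical label: "per_token" (OpenRouter-style) or
    "per_million" (Together-style).
    """
    if unit == "per_token":
        return price * 1e6
    if unit == "per_million":
        return price
    raise ValueError(f"unknown price unit: {unit}")


print(public_name("fireworks", "accounts/fireworks/models/kimi-k2p5"))
```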
Discovery runs at every cache build (server startup, install, auth) and is best-effort: an unreachable provider logs a warning and is skipped without blocking the rest of the catalog.
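The best-effort behavior amounts to isolating each provider's failure. A minimal sketch, assuming a hypothetical fetch_models callable that stands in for the real GET <base_url>/models call and chat-capability filtering:

```python
import logging
from typing import Callable, Iterable

log = logging.getLogger("cloud")


def discover_all(providers: Iterable[str],
                 fetch_models: Callable[[str], list[str]]) -> dict[str, list[str]]:
    """Collect models per provider; a failing provider is logged and skipped."""
    catalog: dict[str, list[str]] = {}
    for provider in providers:
        try:
            catalog[provider] = fetch_models(provider)
        except Exception as exc:  # best-effort: one failure must not block the others
            log.warning("discovery failed for %s: %s", provider, exc)
    return catalog


def fake_fetch(provider: str) -> list[str]:
    # Stand-in fetcher for the example: "together" is unreachable.
    if provider == "together":
        raise ConnectionError("unreachable")
    return [f"{provider}.example-model"]


catalog = discover_all(["fireworks", "together"], fake_fetch)
```

Here the unreachable provider produces a warning and is absent from the result, while the reachable one is still cataloged.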
Admin / multi-client deployments
A single lemond can serve multiple connecting clients (GUI, CLI, SDKs, coding agents on the same or different machines). Cloud config is shared infrastructure config, not per-client state:
- Provider URLs persist in lemond's config.json under cloud_providers. Every connecting client sees the same list.
- API keys live in env vars (persistent, shared) or lemond process memory (ephemeral). They are never written to disk by lemond, and GET /v1/system-info reports auth status but never the key value.
A common admin pattern: set LEMONADE_FIREWORKS_API_KEY in the systemd / Docker / service environment, install the provider once, and every user pointing their client at the server gets cloud access without seeing the key.
Troubleshooting
| Symptom | Check |
|---|---|
| Provider installed but models_discovered: 0 in system-info | No resolvable key: env var missing or runtime key not POSTed. |
| POST /v1/cloud/auth returns 409 | Env var is set for that provider. Unset it or use the env-var value going forward. |
| Chat returns "No API key for cloud provider X" | No resolvable key, as in the first row. Check that LEMONADE_<PROVIDER>_API_KEY is exported in lemond's environment, not just your shell. |
| Cloud model missing from /v1/models | Provider doesn't expose it as chat-capable, or discovery failed. Check lemond logs for warnings from the Cloud component. |
For a structured view of every installed provider's auth state and discovered model count, hit GET /v1/system-info — the cloud.providers[] block reports env_var_set, runtime_key_set, and models_discovered per provider.
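A small script can turn that block into a per-provider summary. This sketch relies only on the three fields named above; the "name" field and the sample payload are assumptions about the response shape.

```python
def summarize_cloud_auth(system_info: dict) -> list[str]:
    """One line per provider: which key source is active and how many models were found."""
    lines = []
    for entry in system_info.get("cloud", {}).get("providers", []):
        name = entry.get("name", "<unknown>")  # "name" is an assumed field
        if entry.get("env_var_set"):
            auth = "env var"  # env var takes precedence over a runtime key
        elif entry.get("runtime_key_set"):
            auth = "runtime key"
        else:
            auth = "no key"
        lines.append(f"{name}: {auth}, {entry.get('models_discovered', 0)} models")
    return lines


# Hypothetical payload, shaped after the fields described for GET /v1/system-info.
sample_info = {
    "cloud": {
        "providers": [
            {"name": "fireworks", "env_var_set": True, "runtime_key_set": False, "models_discovered": 12},
            {"name": "together", "env_var_set": False, "runtime_key_set": False, "models_discovered": 0},
        ]
    }
}
for line in summarize_cloud_auth(sample_info):
    print(line)
```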