Embeddable Lemonade Guide
Embeddable Lemonade is a portable build of the lemond service that you can bundle into your app.
Who is this for?
Use Embeddable Lemonade instead of a global Lemonade Service when you want a cohesive end-to-end experience for users of your app.
- Users only see your installer, icons, etc.
- Prevent users and other apps from directly interacting with lemond.
- Keep your models private from the rest of the system.
- Customize lemond to your exact specifications, including backend versions, available models, and much more.
What's in the release artifact?
Embeddable Lemonade is a zip/tarball artifact shipped in Lemonade releases.
- Windows: `lemonade-embeddable-10.1.0-windows-x64.zip`
- Ubuntu: `lemonade-embeddable-10.1.0-ubuntu-x64.tar.gz`
Note: see the Building from Source guide for instructions on building your own Embeddable Lemonade from source, including for other Linux distros.
Each archive has the following contents:
- `lemond.exe`/`lemond` executable: your own private Lemonade instance.
- `lemonade.exe`/`lemonade` CLI: useful for configuring and testing `lemond` before you ship. Feel free to exclude this from your shipped app.
- `resources/server_models.json`: customizable list of models that `lemond` will show on the `models` endpoint.
- `backend_versions.json`: customizable list that determines which versions of llama.cpp, FastFlowLM, etc. will be used as backends for `lemond`.
- `defaults.json`: default values for `lemond`'s `config.json` file. Safe to delete after `config.json` has been initialized.
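As a rough illustration, `backend_versions.json` presumably maps backend names to pinned versions. The keys and version strings below are hypothetical placeholders, not the real schema; check the file shipped in the archive for the actual format:

```json
{
  "llamacpp": "b1234",
  "flm": "1.0.0"
}
```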
Customization Overview
While you can ship Embeddable Lemonade as-is, there are many opportunities to customize it before packaging it into your app.
How it Works
Many of the customization options rely on `lemond`'s `config.json` file, a persistent store of settings. Learn more about the individual settings in the configuration guide.
`config.json` is automatically generated based on the values in `resources/defaults.json` the first time `lemond` starts. The positional argument in `lemond DIR` determines where `config.json` and other runtime files (e.g., backend binaries) will be located.
In the examples in this guide, we start `lemond ./` to place these files in the same directory as `lemond` itself. Then:
- Use the `lemonade` CLI's `config set` command to programmatically customize the contents of `config.json` (you can also manually edit `config.json` if you prefer).
- Use `lemonade backends install` to pre-download backends to be bundled in your app.
- Edit `server_models.json` and `backend_versions.json` to fully customize the experience for your users.
- Optionally, delete the `lemonade` CLI and `defaults.json` files to minimize the footprint of your app.
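The CLI steps above can be scripted at packaging time. A minimal sketch in Python, assuming the archive is unpacked into `folder` and that `config set` takes a key/value pair; the `port` key is a hypothetical example of a setting, so consult the configuration guide for real keys:

```python
import subprocess
from pathlib import Path

def setup_commands(folder: Path) -> list[list[str]]:
    """Build the packaging-time lemonade CLI invocations.
    The `port` key/value is a hypothetical example of a
    config.json setting; see the configuration guide."""
    cli = str(folder / "lemonade")  # lemonade.exe on Windows
    return [
        [cli, "config", "set", "port", "8123"],  # customize config.json
        [cli, "backends", "install"],            # pre-download backends
    ]

def package_time_setup(folder: Path) -> None:
    """Run the customization steps, then optionally shrink the footprint."""
    for cmd in setup_commands(folder):
        subprocess.run(cmd, check=True)
    # Optional: remove the CLI and defaults.json before shipping.
    (folder / "lemonade").unlink(missing_ok=True)
    (folder / "resources" / "defaults.json").unlink(missing_ok=True)
```

Building the command lists separately from executing them makes the setup easy to log or dry-run before wiring it into your packaging pipeline.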
Finally, you can place the fully-configured Embeddable Lemonade folder into your app's installer.
Deployment-Ready Layout
Once you've finished customization, you'll have a portable Lemonade folder ready for deployment with a layout like this:
Windows:

```
lemond.exe                  # App runs lemond as a subprocess
lemonade.exe                # Optional: CLI management for lemond
LICENSE                     # Lemonade license file
config.json                 # Persistent customized settings for lemond
recipe_options.json         # Per-model customization (e.g., llama args)
resources\
|- server_models.json       # Customized lemond models list
|- backend_versions.json    # Customized version numbers for llamacpp, etc.
bin\                        # Pre-downloaded backends bundled into app
|- llamacpp\                # GPU LLMs, embedding, and reranking
   |- rocm\
      |- llama-server.exe
   |- vulkan\
      |- llama-server.exe
|- ryzenai-server\          # NPU LLMs
|- flm\                     # NPU LLMs, embedding, and ASR
|- sdpp\                    # GPU image generation
|- whispercpp\              # NPU and GPU ASR
models\                     # Hugging Face standard layout for models
|- models--unsloth--Qwen3-0.6B-GGUF\
extra_models\               # Additional GGUF files
|- my_custom_model.gguf
```
Ubuntu:

```
lemond                      # App runs lemond as a subprocess
lemonade                    # Optional: CLI management for lemond
LICENSE                     # Lemonade license file
config.json                 # Persistent customized settings for lemond
recipe_options.json         # Per-model customization (e.g., llama args)
resources/
|- server_models.json       # Customized lemond models list
|- backend_versions.json    # Customized version numbers for llamacpp, etc.
bin/                        # Pre-downloaded backends bundled into app
|- llamacpp/                # GPU LLMs, embedding, and reranking
   |- rocm/
      |- llama-server
   |- vulkan/
      |- llama-server
|- ryzenai-server/          # NPU LLMs
|- flm/                     # NPU LLMs, embedding, and ASR
|- sdpp/                    # GPU image generation
|- whispercpp/              # NPU and GPU ASR
models/                     # Hugging Face standard layout for models
|- models--unsloth--Qwen3-0.6B-GGUF/
extra_models/               # Additional GGUF files
|- my_custom_model.gguf
```
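An app typically launches the bundled `lemond` as a child process. A minimal sketch, assuming the layout above; health checks, port selection, and shutdown handling are omitted (see the Runtime guide):

```python
import subprocess
import sys
from pathlib import Path

def lemond_executable(folder: Path) -> Path:
    """Resolve the bundled lemond binary for the current platform."""
    name = "lemond.exe" if sys.platform == "win32" else "lemond"
    return folder / name

def start_lemond(folder: Path) -> subprocess.Popen:
    """Start lemond as a subprocess. Passing the folder as the
    positional DIR keeps config.json, bin/, and models/ inside
    the bundled folder rather than a global location."""
    exe = lemond_executable(folder)
    return subprocess.Popen([str(exe), str(folder)], cwd=folder)
```

Your app should hold on to the returned `Popen` handle so it can terminate `lemond` cleanly when the app exits.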
In-Depth Customization
Reference detailed guides for each of the following subjects:
- Runtime: Using `lemond` as a subprocess runtime.
- Backends: Deploy backends at packaging time, install time, or runtime.
- Models: Bundling, organization, sharing, per-model settings.
- Building from Source: Customize `lemond` compile-time features.