Add a Custom Model
This guide explains every supported way to add a custom model to Lemonade Server. Start with the CLI workflows below unless you specifically need to hand-edit user_models.json or recipe_options.json.
Choose a Workflow
Pull a Hugging Face model
For most Hugging Face GGUFs, use the repo id directly:
lemonade pull org/repo
Lemonade fetches the repo, lists the available quantizations and sharded folder variants, auto-detects mmproj-*.gguf files for vision models, infers labels (vision/embeddings/reranking) from the repo id, and presents an interactive variant menu.
To skip the menu, append a variant:
lemonade pull org/repo:Q4_K_M
Examples:
# Interactive GGUF variant menu
lemonade pull unsloth/Qwen3-8B-GGUF
# Specific GGUF variant
lemonade pull unsloth/Qwen3-8B-GGUF:Q4_K_M
# Vision model with mmproj auto-detection
lemonade pull ggml-org/gemma-3-4b-it-GGUF:Q4_K_M
# Sharded variant
lemonade pull unsloth/Qwen3-30B-A3B-GGUF:Q4_K_M
Register with explicit CLI flags
Use a user.* name plus --checkpoint and --recipe when you need full control: multiple checkpoints, a non-default recipe, or custom labels.
lemonade pull user.NAME --checkpoint TYPE CHECKPOINT --recipe RECIPE [--label LABEL ...]
Examples:
# Register and pull a custom GGUF model with a main checkpoint
lemonade pull user.Phi-4-Mini-GGUF \
--checkpoint main unsloth/Phi-4-mini-instruct-GGUF:Q4_K_M \
--recipe llamacpp
# Register and pull a vision model with main + mmproj
lemonade pull user.Gemma-3-4b \
--checkpoint main ggml-org/gemma-3-4b-it-GGUF:Q4_K_M \
--checkpoint mmproj ggml-org/gemma-3-4b-it-GGUF:mmproj-model-f16.gguf \
--recipe llamacpp
# Register a model with multiple labels
lemonade pull user.MyCodingModel \
--checkpoint main org/model:Q4_0 \
--recipe llamacpp \
--label coding \
--label tool-calling
Supported registration flags:
| Flag | Description |
|---|---|
| --checkpoint TYPE CHECKPOINT | Add a checkpoint entry. Repeat for multi-file models such as main + mmproj or main + vae. |
| --recipe RECIPE | Recipe to associate with the new user.* model. Common values: llamacpp, flm, ryzenai-llm, vllm, whispercpp, moonshine, sd-cpp, kokoro, collection.omni. |
| --label LABEL | Add a label to the new model. Repeatable. Valid labels include coding, embeddings, hot, mtp, reasoning, reranking, tool-calling, vision. |
| --components MODEL [MODEL ...] | Components for an omni collection (see below). Use with --recipe collection.omni. |
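If you drive registration from a script, the flags above can be assembled programmatically. The following is a minimal Python sketch, not part of Lemonade: build_pull_command is a hypothetical helper that only arranges the documented flags into an argument list, and it assumes the lemonade executable is on your PATH if you later run the result.

```python
import shlex

def build_pull_command(name, recipe, checkpoints=None, labels=(), components=()):
    """Assemble a `lemonade pull` argument list from the documented flags.

    Hypothetical helper for illustration; it does not run anything.
    """
    if not name.startswith("user."):
        # This sketch only covers the explicit-flag workflow, which uses user.* names.
        raise ValueError("explicit registration expects a user.* name")
    argv = ["lemonade", "pull", name]
    # One --checkpoint TYPE CHECKPOINT pair per file (e.g. main, mmproj).
    for checkpoint_type, checkpoint in (checkpoints or {}).items():
        argv += ["--checkpoint", checkpoint_type, checkpoint]
    argv += ["--recipe", recipe]
    # --label is repeatable.
    for label in labels:
        argv += ["--label", label]
    # --components takes one or more model names after a single flag.
    if components:
        argv += ["--components", *components]
    return argv

vision_cmd = build_pull_command(
    "user.Gemma-3-4b",
    "llamacpp",
    checkpoints={
        "main": "ggml-org/gemma-3-4b-it-GGUF:Q4_K_M",
        "mmproj": "ggml-org/gemma-3-4b-it-GGUF:mmproj-model-f16.gguf",
    },
)
omni_cmd = build_pull_command(
    "user.MyKit",
    "collection.omni",
    components=["Qwen3-0.6B-GGUF", "Whisper-Tiny", "SD-Turbo"],
)

# Print shell-quoted command lines for review before running them.
print(" ".join(shlex.quote(arg) for arg in vision_cmd))
print(" ".join(shlex.quote(arg) for arg in omni_cmd))
```

Passing the returned list to subprocess.run(argv, check=True) would execute the command; the sketch stops at printing so that it runs without Lemonade installed.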
Register an omni collection
A collection is a meta-model made up of components. An omni collection is the recipe type behind Lemonade Omni Models — registered with recipe: "collection.omni".
Components must already be registered as built-in models or previously pulled user.* models. Components do not need to be downloaded already; missing component files are pulled by the same command.
lemonade pull user.MyKit \
--recipe collection.omni \
--components Qwen3-0.6B-GGUF Whisper-Tiny SD-Turbo
lemonade load user.MyKit loads every component. lemonade delete user.MyKit removes only the collection entry; component files stay on disk.
Register a custom Omni Model from the desktop app
The desktop app offers a UI-driven path to register the same recipe: "collection.omni" entry — useful when you want to swap in a different planner LLM or a different image/ASR/TTS backbone without waiting for a new built-in Lemonade Omni Model to ship.
- Register or download the concrete models you want to use in Model Manager.
- In the desktop app menu bar, open File > New Omni Model > Manually (or From JSON to import an exported one).
- Pick one planner LLM and any optional models for image generation, image editing, vision analysis, speech-to-text, and text-to-speech.
- Save the Omni Model.
- Select the new user.<name> entry in the chat model picker; it appears alongside the built-in omni models under the Lemonade category.
Custom Omni Models are registered through the same POST /v1/pull path with recipe: "collection.omni" that the built-ins and the CLI flow above use. They live under the server's user.* namespace, so a custom Omni Model named MyKit is addressable as user.MyKit. They behave like built-in omni models for routing purposes: the selected planner LLM remains the loop driver that decides when to call tools, and optional role models are only loaded/used when their corresponding tool is called.
The Omni Model editor only offers already-registered compatible models for each role:
| Omni Model role | Tool unlocked | Required model capability |
|---|---|---|
| LLM | Chat loop and tool calls | Concrete chat model, preferably tool-calling capable |
| Vision / image analysis | analyze_image | vision label |
| Image generation | generate_image | image label |
| Image editing | edit_image | edit label |
| Speech-to-text | transcribe_audio | audio or transcription label |
| Text-to-speech | text_to_speech | tts or speech label |
If a component model is deleted later, the Omni Model entry remains registered but is hidden from the chat picker until every referenced component is available again.
Share a collection: export, import, and Hugging Face
lemonade export <collection> (and the desktop app's Export button) writes a collection file: the
collection's /v1/models/{model_id} object normalized into
an import-ready /v1/pull body. The file carries model_name,
recipe, components, and a models array embedding each component's definition, so it is
self-contained — the importing machine does not need any of the components registered beforehand.
Exported files never contain the user-specific runtime fields suggested, created, or downloaded —
the server regenerates those on import (suggested is set to true for registered models;
downloaded is computed from local files).
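As an illustration of the runtime-field rule, here is a minimal Python sketch that drops suggested, created, and downloaded from a model object and from each embedded component definition. The helper name strip_runtime_fields and the sample input are hypothetical; the full normalization that lemonade export performs is not reproduced here, only the field stripping described above.

```python
# Runtime fields named in the text above; the server regenerates them on import.
RUNTIME_FIELDS = ("suggested", "created", "downloaded")

def strip_runtime_fields(definition: dict) -> dict:
    """Return a copy of a model definition without user-specific runtime fields."""
    cleaned = {key: value for key, value in definition.items()
               if key not in RUNTIME_FIELDS}
    # Apply the same rule to every embedded component definition.
    if "models" in cleaned:
        cleaned["models"] = [strip_runtime_fields(model) for model in cleaned["models"]]
    return cleaned

# Hypothetical input resembling a collection with one embedded component.
raw = {
    "model_name": "user.MyKit",
    "recipe": "collection.omni",
    "components": ["Whisper-Tiny"],
    "suggested": True,
    "downloaded": False,
    "models": [
        {
            "model_name": "Whisper-Tiny",
            "recipe": "whispercpp",
            "created": 1700000000,
            "downloaded": True,
        }
    ],
}

exported = strip_runtime_fields(raw)
print(sorted(exported))
```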
The same file works, verbatim, in three places:
- lemonade import <CollectionName>.json on the CLI (or File > New Omni Model > From JSON in the desktop app).
- POST /v1/pull with the file contents as the request body.
- Uploaded to a Hugging Face model repo named after the collection, so that the repo contains <RepoName>.json. lemonade pull <org>/<repo> looks for the manifest named after the repo, then registers and downloads everything in it. The built-in LMX-Omni-* collections are distributed this way.
On import, component names that are already registered keep their local definition (differences from
the embedded definition are logged as warnings); unknown components are registered as user.* models
from the embedded definitions.
Example collection file:
{
"model_name": "user.MyKit",
"recipe": "collection.omni",
"checkpoints": { "main": "" },
"components": ["Qwen3-0.6B-GGUF", "Whisper-Tiny"],
"models": [
{
"model_name": "Qwen3-0.6B-GGUF",
"recipe": "llamacpp",
"checkpoints": { "main": "unsloth/Qwen3-0.6B-GGUF:Q4_0" },
"labels": ["reasoning"],
"recipe_options": {},
"size": 0.38
},
{
"model_name": "Whisper-Tiny",
"recipe": "whispercpp",
"checkpoints": {
"main": "ggerganov/whisper.cpp:ggml-tiny.bin",
"npu_cache": "amd/whisper-tiny-onnx-npu:ggml-tiny-encoder-vitisai.rai"
},
"labels": ["transcription", "realtime-transcription"],
"recipe_options": {},
"size": 0.075
}
],
"labels": [],
"recipe_options": {}
}
Register via API
The /v1/pull endpoint accepts the same model registration fields as the CLI. Use this when integrating Lemonade into another app or script:
curl -X POST http://localhost:13305/v1/pull \
-H "Content-Type: application/json" \
-d '{
"model_name": "user.MyModel",
"recipe": "llamacpp",
"checkpoint": "org/repo:Q4_0"
}'
For multi-file models, send checkpoints:
curl -X POST http://localhost:13305/v1/pull \
-H "Content-Type: application/json" \
-d '{
"model_name": "user.Gemma-3-4b",
"recipe": "llamacpp",
"checkpoints": {
"main": "ggml-org/gemma-3-4b-it-GGUF:Q4_K_M",
"mmproj": "ggml-org/gemma-3-4b-it-GGUF:mmproj-model-f16.gguf"
},
"labels": ["vision"]
}'
For an omni collection, send components:
curl -X POST http://localhost:13305/v1/pull \
-H "Content-Type: application/json" \
-d '{
"model_name": "user.MyKit",
"recipe": "collection.omni",
"components": ["Qwen3-0.6B-GGUF", "Whisper-Tiny", "SD-Turbo"]
}'
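For callers that prefer Python over curl, the same request can be prepared with the standard library. This is a sketch under stated assumptions: the base URL is copied from the curl examples above, build_pull_request is a hypothetical helper, and the response format is not covered in this guide, so the send step is left commented out.

```python
import json
import urllib.request

# Assumption: same host and port as the curl examples above.
BASE_URL = "http://localhost:13305"

def build_pull_request(body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/pull request for a registration body."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/pull",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_pull_request({
    "model_name": "user.MyModel",
    "recipe": "llamacpp",
    "checkpoint": "org/repo:Q4_0",
})
print(request.get_method(), request.full_url)

# Sending is commented out so the sketch runs without a server:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

The multi-file and omni collection bodies shown above can be passed to the same helper unchanged.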
Edit JSON files directly
Advanced users can edit user_models.json and recipe_options.json directly. The rest of this guide documents those files and gives complete examples.
Overview
Custom model configuration involves two files, both located in the Lemonade cache directory:
| File | Purpose |
|---|---|
| user_models.json | Model registry: defines what models are available (checkpoint, recipe, etc.) |
| recipe_options.json | Per-model settings: configures how models run (context size, backend, etc.) |
If you used an installer from a Lemonade release, the cache directory is typically:
| OS | Cache directory |
|---|---|
| Linux systemd install | /var/lib/lemonade/.cache/lemonade |
| Windows | %USERPROFILE%\.cache\lemonade |
| macOS system install | /Library/Application Support/lemonade/.cache |
For a standalone lemond executable, the default is ~/.cache/lemonade unless you pass an explicit cache_dir argument or set LEMONADE_CACHE_DIR.
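A script that needs to find the two JSON files for a standalone lemond could mirror that rule as follows. This is a sketch, not Lemonade's implementation: resolve_cache_dir is a hypothetical helper, and the relative order of the explicit argument and LEMONADE_CACHE_DIR is an assumption, since the text above does not say which one wins when both are set.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

def resolve_cache_dir(cache_dir: Optional[str] = None,
                      env: Mapping[str, str] = os.environ) -> Path:
    """Pick the cache directory for a standalone lemond.

    Assumption: an explicit cache_dir beats LEMONADE_CACHE_DIR.
    """
    if cache_dir:
        return Path(cache_dir)
    if env.get("LEMONADE_CACHE_DIR"):
        return Path(env["LEMONADE_CACHE_DIR"])
    # Documented default for a standalone executable.
    return Path.home() / ".cache" / "lemonade"

cache = resolve_cache_dir()
print(cache / "user_models.json")
print(cache / "recipe_options.json")
```

Installer-based setups use the directories in the table above instead, so pass the appropriate path explicitly in that case.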
Model naming spec
Lemonade tracks three sources of models. Every model has a canonical ID of the form <source>.<bare-name>:
| Canonical ID | Source |
|---|---|
| user.NAME | Model registered via lemonade pull (entry in user_models.json) |
| extra.NAME | Model imported by dropping a GGUF in --extra-models-dir |
| builtin.NAME | Model compiled into Lemonade's built-in catalog (server_models.json) |
The bare name NAME is an alias that always resolves to whichever source wins precedence for that name. Precedence is registered > imported > built-in.
What the API emits
/v1/models, /v1/models/{id}, lemonade list, and the Ollama /api/tags endpoint emit each model with an id set to either:
- the bare name if the model is the precedence-winner for its bare name, or
- the canonical-prefixed ID if another source outranks it on the same bare name.
For each bare name with collisions, the response contains one bare row plus one canonical-prefixed row per shadowed source.
What input forms are accepted
Anywhere a model name is accepted (request bodies, CLI args, URL path parameters), all four forms work:
- the bare name NAME: resolves to the winner
- user.NAME: always the registered model (404 if none)
- extra.NAME: always the imported model (404 if none)
- builtin.NAME: always the built-in model (404 if none)
lemonade pull rejects model names starting with extra. or builtin. since those prefixes are reserved.
CLI vs. GUI display
The CLI (lemonade list) prints the API id verbatim. That means the Name column is always copy-paste-safe — every cell is a valid input to lemonade load, lemonade delete, lemonade run, etc.
The Tauri desktop app and the web app apply a display transformation on top of the API id: bare ids render as NAME, and canonical-prefixed ids render as NAME (registered) / NAME (imported) / NAME (builtin). The suffix appears only for shadowed sources.
Five reference cases
| Sources | /v1/models ids | Resolution |
|---|---|---|
| built-in Qwen2.5-Coder only | Qwen2.5-Coder | Qwen2.5-Coder, builtin.Qwen2.5-Coder → built-in |
| built-in Foo + registered Foo | Foo, builtin.Foo | Foo/user.Foo → user; builtin.Foo → built-in |
| built-in Bar + registered Bar + extra Bar | Bar, extra.Bar, builtin.Bar | Bar/user.Bar → user; extra.Bar → extra; builtin.Bar → built-in |
| built-in Baz + extra Baz | Baz, builtin.Baz | Baz/extra.Baz → extra; builtin.Baz → built-in |
| registered MyModel only | MyModel | MyModel/user.MyModel → user; builtin.MyModel → 404 |
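The naming rules can be modeled in a few lines of Python. This is a sketch of the behavior described in this section, not Lemonade's code: emitted_ids, resolve, and the CATALOG mapping are hypothetical names, and None stands in for the 404 case.

```python
from typing import Dict, List, Optional, Set

# Precedence from the naming spec: registered (user) > imported (extra) > built-in.
PRECEDENCE = ["user", "extra", "builtin"]

def emitted_ids(sources_by_name: Dict[str, Set[str]]) -> List[str]:
    """Ids a listing would contain: bare id for the winner, prefixed ids for the rest."""
    ids: List[str] = []
    for name, sources in sources_by_name.items():
        present = [s for s in PRECEDENCE if s in sources]
        if not present:
            continue
        ids.append(name)  # the precedence winner is listed under the bare name
        ids.extend(f"{s}.{name}" for s in present[1:])  # shadowed sources keep a prefix
    return ids

def resolve(model_id: str, sources_by_name: Dict[str, Set[str]]) -> Optional[str]:
    """Return the source serving an input form, or None for the 404 case."""
    prefix, _, rest = model_id.partition(".")
    if prefix in PRECEDENCE and rest:
        # Canonical form: only that exact source may answer.
        return prefix if prefix in sources_by_name.get(rest, set()) else None
    # Bare name (may itself contain dots, e.g. Qwen2.5-Coder): highest precedence wins.
    present = [s for s in PRECEDENCE if s in sources_by_name.get(model_id, set())]
    return present[0] if present else None

# The five reference cases, keyed by bare name.
CATALOG = {
    "Qwen2.5-Coder": {"builtin"},
    "Foo": {"builtin", "user"},
    "Bar": {"builtin", "user", "extra"},
    "Baz": {"builtin", "extra"},
    "MyModel": {"user"},
}

print(emitted_ids(CATALOG))
print(resolve("Baz", CATALOG), resolve("builtin.MyModel", CATALOG))
```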
user_models.json Reference
This file contains a JSON object where each key is a model name and each value defines the model's properties. Create this file in your cache directory if it doesn't exist.
Template
{
"MyCustomModel": {
"checkpoint": "org/repo-name:filename.gguf",
"recipe": "llamacpp",
"size": 3.5
}
}
Fields
| Field | Required | Type | Description |
|---|---|---|---|
| checkpoint | Yes* | String | HuggingFace checkpoint in org/repo or org/repo:variant format. Use org/repo:filename.gguf for GGUF models. |
| checkpoints | Yes* | Object | Alternative to checkpoint for models with multiple files. See Multi-file models. |
| recipe | Yes | String | Backend engine to use. One of: llamacpp, whispercpp, moonshine, sd-cpp, kokoro, ryzenai-llm, flm, collection.omni. |
| components | Yes** | Array | Components for a collection. Required when recipe: "collection.omni". See Collections. |
| size | No | Number | Model size in GB. Informational only: displayed in the UI and used for RAM filtering. |
| mmproj | No | String | Filename of the multimodal projector file for llamacpp vision models (must be in the same HuggingFace repo as the checkpoint). This is a top-level field, not inside checkpoints. |
| image_defaults | No | Object | Default image generation parameters for sd-cpp models. See Image defaults. |
* Either checkpoint or checkpoints is required, but not both.
** Required only when recipe: "collection.omni". Collections do not use checkpoint/checkpoints.
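Before hand-editing user_models.json, it can help to check an entry against the required-field rules. The sketch below is a hypothetical pre-flight check written for this guide, not a validator shipped with Lemonade; it covers only the requirements stated in the table and footnotes above.

```python
from typing import List

def validate_entry(entry: dict) -> List[str]:
    """Return the problems found in one user_models.json entry (empty list if none)."""
    problems: List[str] = []
    recipe = entry.get("recipe")
    if not recipe:
        problems.append("recipe is required")
    if recipe == "collection.omni":
        # Collections reference other models by name instead of a checkpoint.
        if not entry.get("components"):
            problems.append("components is required for collection.omni")
    else:
        has_single = "checkpoint" in entry
        has_multi = "checkpoints" in entry
        # Either checkpoint or checkpoints, but not both and not neither.
        if has_single == has_multi:
            problems.append("exactly one of checkpoint or checkpoints is required")
    return problems

print(validate_entry({
    "checkpoint": "org/repo-name:filename.gguf",
    "recipe": "llamacpp",
    "size": 3.5,
}))
print(validate_entry({"recipe": "collection.omni"}))
```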
Checkpoint format
The checkpoint field uses the format org/repo:variant:
- GGUF models (exact filename): org/repo:filename.gguf, e.g., Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF:qwen2.5-coder-1.5b-instruct-q4_k_m.gguf
- GGUF models (quantization shorthand): org/repo:QUANT, e.g., unsloth/Phi-4-mini-instruct-GGUF:Q4_K_M. The server will search the repo for a matching .gguf file.
- ONNX models: org/repo, e.g., amd/Qwen2.5-0.5B-Instruct-quantized_int4-float16-cpu-onnx
- Safetensor models: org/repo:filename.safetensors, e.g., stabilityai/sd-turbo:sd_turbo.safetensors
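To make the checkpoint formats concrete, here is a small Python sketch that splits a checkpoint string into its repo and variant parts. parse_checkpoint and the Checkpoint tuple are hypothetical names, and the rule used to tell a filename from a quantization shorthand (presence of a file extension) is an assumption made for illustration, not Lemonade's actual matching logic.

```python
import os
from typing import NamedTuple, Optional

class Checkpoint(NamedTuple):
    repo: str               # org/repo
    variant: Optional[str]  # text after the colon, if any
    kind: str               # "repo", "file", or "quant"

def parse_checkpoint(value: str) -> Checkpoint:
    """Split an org/repo[:variant] checkpoint string."""
    repo, sep, variant = value.partition(":")
    if repo.count("/") != 1 or not all(repo.split("/")):
        raise ValueError(f"expected org/repo, got {repo!r}")
    if not sep:
        return Checkpoint(repo, None, "repo")
    if not variant:
        raise ValueError("variant after ':' must not be empty")
    # Assumption: a variant with a file extension is an exact filename,
    # anything else is a quantization shorthand such as Q4_K_M.
    kind = "file" if os.path.splitext(variant)[1] else "quant"
    return Checkpoint(repo, variant, kind)

for example in (
    "unsloth/Phi-4-mini-instruct-GGUF:Q4_K_M",
    "stabilityai/sd-turbo:sd_turbo.safetensors",
    "amd/Qwen2.5-0.5B-Instruct-quantized_int4-float16-cpu-onnx",
):
    print(parse_checkpoint(example))
```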
Multi-file models
For models that require multiple files (e.g., Whisper models with NPU cache, or Flux image models with separate VAE/text encoder), use checkpoints instead of checkpoint:
{
"My-Whisper-Model": {
"checkpoints": {
"main": "ggerganov/whisper.cpp:ggml-tiny.bin",
"npu_cache": "amd/whisper-tiny-onnx-npu:ggml-tiny-encoder-vitisai.rai"
},
"recipe": "whispercpp",
"size": 0.075
}
}
Supported checkpoint keys:
| Key | Used by | Description |
|---|---|---|
| main | All | Primary model file |
| npu_cache | whispercpp | NPU-accelerated encoder cache |
| text_encoder | sd-cpp | Text encoder for image generation models |
| vae | sd-cpp | VAE for image generation models |
Collections
A collection bundles several already-registered models so they can be loaded, pulled, or deleted as a single entry. Collections do not have their own checkpoint — they reference other models by name. An omni collection is a collection type registered with recipe: "collection.omni" — this is the recipe behind Lemonade Omni Models.
{
"MyKit": {
"recipe": "collection.omni",
"components": ["Qwen3-0.6B-GGUF", "Whisper-Tiny", "SD-Turbo"]
}
}
Components must already be registered (built-in models, or other user.* entries earlier in this file). Loading the collection (lemonade load user.MyKit) loads each component; deleting the collection removes only the collection entry, leaving components on disk.
The equivalent CLI registration is shown in Register an omni collection.
Image defaults
For sd-cpp recipe models, you can specify default image generation parameters:
{
"My-SD-Model": {
"checkpoint": "org/repo:model.safetensors",
"recipe": "sd-cpp",
"size": 5.2,
"image_defaults": {
"steps": 20,
"cfg_scale": 7.0,
"width": 512,
"height": 512
}
}
}
Model naming
- In user_models.json, store model names without the user. prefix (e.g., MyCustomModel).
- When referencing the model in API calls, CLI commands, or recipe_options.json, use the full prefixed name (e.g., user.MyCustomModel).
- Labels like custom are added automatically. Additional labels (reasoning, vision, embeddings, reranking) can be set via the pull CLI/API flags, or by including a labels array in the JSON entry.
recipe_options.json Reference
This file configures per-model runtime settings. Each key is a canonical model ID — one of user.NAME, extra.NAME, or builtin.NAME (see the Model naming spec above). Each value contains the settings for that model.
Template
{
"user.MyCustomModel": {
"ctx_size": 4096,
"llamacpp_backend": "vulkan",
"llamacpp_args": ""
},
"builtin.Qwen2.5-Coder-1.5B-Instruct": {
"ctx_size": 16384
}
}
Migration: Older Lemonade versions stored built-in entries under their bare name (e.g. "Qwen2.5-Coder-1.5B-Instruct" with no prefix). On first load with the current version, any bare key matching a known built-in is rewritten to builtin.<name> in place. An INFO log line reports the number of migrated keys. Bare keys that don't match a built-in are preserved unchanged.

Note: Per-model options can also be configured through the Lemonade desktop app's model settings, or via the save_options parameter in the /api/v1/load endpoint.
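The key migration described above can be pictured with a short Python sketch. It is an illustration only: migrate_bare_keys is a hypothetical function, the set of built-in names is supplied by the caller, and the handling of a bare key whose builtin.<name> entry already exists is an assumption, because that case is not covered in this guide.

```python
from typing import Dict, Set, Tuple

def migrate_bare_keys(options: Dict[str, dict],
                      builtin_names: Set[str]) -> Tuple[Dict[str, dict], int]:
    """Return (migrated options, number of migrated keys) without mutating the input."""
    migrated: Dict[str, dict] = {}
    count = 0
    for key, value in options.items():
        target = f"builtin.{key}"
        # Assumption: if builtin.<name> already exists, leave both keys untouched.
        if key in builtin_names and target not in options:
            migrated[target] = value
            count += 1
        else:
            # Prefixed keys and bare keys that match no built-in are preserved.
            migrated[key] = value
    return migrated, count

old_options = {
    "Qwen2.5-Coder-1.5B-Instruct": {"ctx_size": 16384},
    "user.MyCustomModel": {"ctx_size": 4096},
    "NotABuiltin": {"ctx_size": 2048},
}
new_options, migrated_count = migrate_bare_keys(
    old_options, {"Qwen2.5-Coder-1.5B-Instruct"}
)
print(migrated_count, list(new_options))
```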
Complete Examples
Example 1: Adding a GGUF LLM with large context
user_models.json:
{
"Qwen2.5-Coder-1.5B-Instruct": {
"checkpoint": "Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF:qwen2.5-coder-1.5b-instruct-q4_k_m.gguf",
"recipe": "llamacpp",
"size": 1.0
}
}
recipe_options.json:
{
"user.Qwen2.5-Coder-1.5B-Instruct": {
"ctx_size": 16384,
"llamacpp_backend": "vulkan"
}
}
(Use builtin.NAME here if you're overriding a built-in model's defaults, or extra.NAME for an --extra-models-dir GGUF.)
Then load the model:
lemonade run user.Qwen2.5-Coder-1.5B-Instruct
Example 2: Adding a vision model with mmproj
user_models.json:
{
"My-Vision-Model": {
"checkpoint": "ggml-org/gemma-3-4b-it-GGUF:Q4_K_M",
"mmproj": "mmproj-model-f16.gguf",
"recipe": "llamacpp",
"size": 3.61
}
}
Example 3: Adding an embedding model
user_models.json:
{
"My-Embedding-Model": {
"checkpoint": "nomic-ai/nomic-embed-text-v1-GGUF:Q4_K_S",
"recipe": "llamacpp",
"size": 0.08
}
}
The model will automatically be available as user.My-Embedding-Model. To mark it as an embedding model, use the manual registration flags on pull:
lemonade pull user.My-Embedding-Model \
--checkpoint main "nomic-ai/nomic-embed-text-v1-GGUF:Q4_K_S" \
--recipe llamacpp \
--label embeddings
Alternatively, pull the repo directly with lemonade pull nomic-ai/nomic-embed-text-v1-GGUF; the embeddings label is auto-applied because the repo id contains embed.
Settings Priority
When loading a model, settings are resolved in this order (highest to lowest priority):
- Values explicitly passed in the /api/v1/load request
- Per-model values from recipe_options.json
- Global configuration values, see Server Configuration
*_args merge behavior: For options ending in _args (e.g., llamacpp_args, whispercpp_args, sdcpp_args, vllm_args), the CLI/API arguments are merged rather than replaced. The merge works at the flag level: when the same flag is set at more than one level, the value from the higher-priority level wins, and flags set at only one level are kept.
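The layering and the flag-level merge can be sketched in Python as follows. This is an approximation for illustration, not the server's implementation: resolve_settings, merge_args, and parse_flags are hypothetical names, the flag strings are sample values, and the parsing rule (a token starting with "-" begins a flag, later tokens are its values) is an assumption that does not handle --flag=value forms or negative numeric values.

```python
import shlex
from typing import Dict, List

def parse_flags(args: str) -> Dict[str, List[str]]:
    """Group an argument string into {flag: [values]} (simplified parsing)."""
    flags: Dict[str, List[str]] = {}
    current = None
    for token in shlex.split(args):
        if token.startswith("-"):
            current = token
            flags[current] = []
        elif current is not None:
            flags[current].append(token)
    return flags

def merge_args(low: str, high: str) -> str:
    """Merge two argument strings; flags from `high` override the same flags in `low`."""
    merged = parse_flags(low)
    merged.update(parse_flags(high))
    tokens = [t for flag, values in merged.items() for t in [flag, *values]]
    return " ".join(shlex.quote(t) for t in tokens)

def resolve_settings(global_cfg: dict, per_model: dict, load_request: dict) -> dict:
    """Apply layers from lowest to highest priority, merging *_args options."""
    resolved: dict = {}
    for layer in (global_cfg, per_model, load_request):
        for key, value in layer.items():
            if key.endswith("_args"):
                resolved[key] = merge_args(resolved.get(key, ""), value)
            else:
                resolved[key] = value  # plain options are replaced outright
    return resolved

settings = resolve_settings(
    {"ctx_size": 4096, "llamacpp_args": "--threads 4 --no-mmap"},
    {"ctx_size": 16384, "llamacpp_backend": "vulkan"},
    {"llamacpp_args": "--threads 8"},
)
print(settings)
```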
For full details, see the load endpoint documentation.
See Also
- CLI pull command: register and download models from the command line
- /api/v1/pull endpoint: register and download models via API