This guide explains how to manually register custom models in Lemonade Server using JSON configuration files. This is useful for adding any Hugging Face model that isn't in the built-in model list.
**Tip:** For most Hugging Face GGUFs, the easiest way to add a custom model is just:

```bash
lemonade pull org/repo
```

Lemonade fetches the repo, lists the available quantizations (and any sharded folder variants), auto-detects `mmproj-*.gguf` files for vision models, infers labels (vision/embeddings/reranking) from the repo id, and presents an interactive variant menu. To skip the menu, append `:VARIANT`:

```bash
lemonade pull org/repo:Q4_K_M
```

The desktop app's "Search Hugging Face" panel calls the same `/api/v1/pull/variants` endpoint under the hood.

If you need full control, such as multiple checkpoints (`main` + `mmproj` + `vae` + …), a non-llamacpp recipe, or custom labels, use the advanced flags on `lemonade pull`:

```bash
lemonade pull user.MyModel --checkpoint main "org/repo:file.gguf" --recipe llamacpp
```

This guide covers the underlying JSON files for users who need manual control beyond what the CLI exposes.
Custom model configuration involves two files, both located in the Lemonade cache directory:
| File | Purpose |
|---|---|
| `user_models.json` | Model registry: defines what models are available (checkpoint, recipe, etc.) |
| `recipe_options.json` | Per-model settings: configures how models run (context size, backend, etc.) |
See `configuration.md` for more information about finding the cache directory.
## `user_models.json` Reference

This file contains a JSON object where each key is a model name and each value defines the model's properties. Create this file in your cache directory if it doesn't exist.
```json
{
  "MyCustomModel": {
    "checkpoint": "org/repo-name:filename.gguf",
    "recipe": "llamacpp",
    "size": 3.5
  }
}
```
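The server reads this file as strict JSON, so a syntax slip (a trailing comma, a missing quote) can prevent your models from appearing. One quick way to validate after hand-editing, assuming Python is available on your machine:

```bash
# Prints the parsed JSON on success; reports the error location on failure.
python -m json.tool user_models.json
```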
| Field | Required | Type | Description |
|---|---|---|---|
| `checkpoint` | Yes\* | String | Hugging Face checkpoint in `org/repo` or `org/repo:variant` format. Use `org/repo:filename.gguf` for GGUF models. |
| `checkpoints` | Yes\* | Object | Alternative to `checkpoint` for models with multiple files. See Multi-file models below. |
| `recipe` | Yes | String | Backend engine to use. One of: `llamacpp`, `whispercpp`, `sd-cpp`, `kokoro`, `ryzenai-llm`, `flm`. |
| `size` | No | Number | Model size in GB. Informational only; displayed in the UI and used for RAM filtering. |
| `mmproj` | No | String | Filename of the multimodal projector file for `llamacpp` vision models (must be in the same Hugging Face repo as the checkpoint). This is a top-level field, not inside `checkpoints`. |
| `image_defaults` | No | Object | Default image generation parameters for `sd-cpp` models. See Image defaults below. |

\* Either `checkpoint` or `checkpoints` is required, but not both.
### Checkpoint formats

The `checkpoint` field uses the format `org/repo:variant`:

- `org/repo:filename.gguf`, e.g. `Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF:qwen2.5-coder-1.5b-instruct-q4_k_m.gguf`
- `org/repo:QUANT`, e.g. `unsloth/Phi-4-mini-instruct-GGUF:Q4_K_M`. The server will search the repo for a matching `.gguf` file.
- `org/repo`, e.g. `amd/Qwen2.5-0.5B-Instruct-quantized_int4-float16-cpu-onnx`
- `org/repo:filename.safetensors`, e.g. `stabilityai/sd-turbo:sd_turbo.safetensors`

### Multi-file models

For models that require multiple files (e.g., Whisper models with NPU cache, or Flux image models with separate VAE/text encoder), use `checkpoints` instead of `checkpoint`:
```json
{
  "My-Whisper-Model": {
    "checkpoints": {
      "main": "ggerganov/whisper.cpp:ggml-tiny.bin",
      "npu_cache": "amd/whisper-tiny-onnx-npu:ggml-tiny-encoder-vitisai.rai"
    },
    "recipe": "whispercpp",
    "size": 0.075
  }
}
```
Supported checkpoint keys:
| Key | Used by | Description |
|---|---|---|
| `main` | All | Primary model file |
| `npu_cache` | `whispercpp` | NPU-accelerated encoder cache |
| `text_encoder` | `sd-cpp` | Text encoder for image generation models |
| `vae` | `sd-cpp` | VAE for image generation models |
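For example, a hypothetical image model split across a main file, text encoder, and VAE could be registered like the sketch below. The repo and file names are placeholders, not real checkpoints:

```json
{
  "My-Flux-Model": {
    "checkpoints": {
      "main": "org/flux-repo:flux-q4_0.gguf",
      "text_encoder": "org/flux-repo:t5xxl_fp16.safetensors",
      "vae": "org/flux-repo:ae.safetensors"
    },
    "recipe": "sd-cpp",
    "size": 8.0
  }
}
```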
### Image defaults

For `sd-cpp` recipe models, you can specify default image generation parameters:
```json
{
  "My-SD-Model": {
    "checkpoint": "org/repo:model.safetensors",
    "recipe": "sd-cpp",
    "size": 5.2,
    "image_defaults": {
      "steps": 20,
      "cfg_scale": 7.0,
      "width": 512,
      "height": 512
    }
  }
}
```
A few notes on naming and labels:

- In `user_models.json`, store model names without the `user.` prefix (e.g., `MyCustomModel`).
- In `recipe_options.json`, use the full prefixed name (e.g., `user.MyCustomModel`).
- The `user.` prefix and the `custom` label are added automatically. Additional labels (`reasoning`, `vision`, `embeddings`, `reranking`) can be set via the `pull` CLI/API flags, or by including a `labels` array in the JSON entry.

## `recipe_options.json` Reference

This file configures per-model runtime settings. Each key is a full model name (including a prefix like `user.` or `extra.`) and each value contains the settings for that model.
```json
{
  "user.MyCustomModel": {
    "ctx_size": 4096,
    "llamacpp_backend": "vulkan",
    "llamacpp_args": ""
  }
}
```
**Note:** Per-model options can also be configured through the Lemonade desktop app's model settings, or via the `save_options` parameter in the `/api/v1/load` endpoint.
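As an illustration, a `/api/v1/load` request that saves options might look like the sketch below. The option names mirror `recipe_options.json`, but the exact request schema, field names, and default port are assumptions here; consult the load endpoint documentation for the authoritative format.

```bash
# Hypothetical sketch: load a model and persist its settings.
# Assumes the server runs on the default localhost:8000 and that the
# endpoint accepts a "save_options" flag alongside the option fields.
curl http://localhost:8000/api/v1/load \
  -H "Content-Type: application/json" \
  -d '{
        "model_name": "user.MyCustomModel",
        "ctx_size": 8192,
        "llamacpp_backend": "vulkan",
        "save_options": true
      }'
```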
## Examples

### GGUF model with per-model settings

`user_models.json`:
```json
{
  "Qwen2.5-Coder-1.5B-Instruct": {
    "checkpoint": "Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF:qwen2.5-coder-1.5b-instruct-q4_k_m.gguf",
    "recipe": "llamacpp",
    "size": 1.0
  }
}
```
`recipe_options.json`:
```json
{
  "user.Qwen2.5-Coder-1.5B-Instruct": {
    "ctx_size": 16384,
    "llamacpp_backend": "vulkan"
  }
}
```
Then load the model:
```bash
lemonade run user.Qwen2.5-Coder-1.5B-Instruct
```
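Once loaded, the model should also be reachable through the server's OpenAI-compatible API. A minimal smoke test, assuming the default `localhost:8000` address (adjust host and port to your setup):

```bash
# Send a single chat message to the newly registered model.
curl http://localhost:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "user.Qwen2.5-Coder-1.5B-Instruct",
        "messages": [{"role": "user", "content": "Write hello world in Python."}]
      }'
```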
### Vision model

`user_models.json`:
```json
{
  "My-Vision-Model": {
    "checkpoint": "ggml-org/gemma-3-4b-it-GGUF:Q4_K_M",
    "mmproj": "mmproj-model-f16.gguf",
    "recipe": "llamacpp",
    "size": 3.61
  }
}
```
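Vision models registered this way are typically queried through the same chat endpoint using OpenAI-style image content parts. A hedged sketch, assuming the default address and that the server accepts the standard multimodal message format:

```bash
# Ask the vision model about an image supplied as a base64 data URL.
# Replace <BASE64_DATA> with real image data.
curl http://localhost:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "user.My-Vision-Model",
        "messages": [{
          "role": "user",
          "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,<BASE64_DATA>"}}
          ]
        }]
      }'
```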
### Embedding model

`user_models.json`:
```json
{
  "My-Embedding-Model": {
    "checkpoint": "nomic-ai/nomic-embed-text-v1-GGUF:Q4_K_S",
    "recipe": "llamacpp",
    "size": 0.08
  }
}
```
The model will automatically be available as `user.My-Embedding-Model`. To mark it as an embedding model, use the manual registration flags on `pull`:
```bash
lemonade pull user.My-Embedding-Model \
  --checkpoint main "nomic-ai/nomic-embed-text-v1-GGUF:Q4_K_S" \
  --recipe llamacpp \
  --label embeddings
```
Or simply run `lemonade pull nomic-ai/nomic-embed-text-v1-GGUF`; the `embeddings` label is auto-applied because the repo id contains `embed`.
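After registration, the model should answer on the OpenAI-compatible embeddings route. A minimal sketch, again assuming the default `localhost:8000` address:

```bash
# Request an embedding vector for a single input string.
curl http://localhost:8000/api/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
        "model": "user.My-Embedding-Model",
        "input": "The quick brown fox"
      }'
```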
When loading a model, settings are resolved in this order (highest to lowest priority):
1. Options passed in the `/api/v1/load` request
2. Saved settings in `recipe_options.json`

For example, if `recipe_options.json` sets `ctx_size` to 16384 but a load request passes 4096, the model loads with a context size of 4096. For full details, see the load endpoint documentation.
See also:

- `/api/v1/pull` endpoint: register and download models via API