Custom Model Configuration

This guide explains how to manually register custom models in Lemonade Server using the JSON configuration files. This is useful for adding any HuggingFace model that isn't in the built-in model list.

Tip: The easiest way to add a custom model is using the lemonade-server pull CLI command or the /api/v1/pull endpoint, which automate the registration and download process. For example:
lemonade-server pull user.MyModel --checkpoint "org/repo:file.gguf" --recipe llamacpp
This guide covers the underlying JSON files for users who need manual control.

Overview

Custom model configuration involves two files, both located in the Lemonade cache directory:

File	Purpose
`user_models.json`	Model registry — defines what models are available (checkpoint, recipe, etc.)
`recipe_options.json`	Per-model settings — configures how models run (context size, backend, etc.)

Cache directory location:

OS	Default Path
Windows	`%USERPROFILE%\.cache\lemonade\`
Linux	`~/.cache/lemonade/`
macOS	`~/.cache/lemonade/`

The cache directory can be overridden by setting the LEMONADE_CACHE_DIR environment variable.

`user_models.json` Reference

This file contains a JSON object where each key is a model name and each value defines the model's properties. Create this file in your cache directory if it doesn't exist.

Template

{
    "MyCustomModel": {
        "checkpoint": "org/repo-name:filename.gguf",
        "recipe": "llamacpp",
        "size": 3.5
    }
}

Fields

Field	Required	Type	Description
`checkpoint`	Yes*	String	HuggingFace checkpoint in `org/repo` or `org/repo:variant` format. Use `org/repo:filename.gguf` for GGUF models.
`checkpoints`	Yes*	Object	Alternative to `checkpoint` for models with multiple files. See Multi-file models.
`recipe`	Yes	String	Backend engine to use. One of: `llamacpp`, `whispercpp`, `sd-cpp`, `kokoro`, `ryzenai-llm`, `flm`.
`size`	No	Number	Model size in GB. Informational only — displayed in the UI and used for RAM filtering.
`mmproj`	No	String	Filename of the multimodal projector file for llamacpp vision models (must be in the same HuggingFace repo as the checkpoint). This is a top-level field, not inside `checkpoints`.
`image_defaults`	No	Object	Default image generation parameters for `sd-cpp` models. See Image defaults.

* Either checkpoint or checkpoints is required, but not both.

Checkpoint format

The checkpoint field uses the format org/repo:variant:

GGUF models (exact filename): org/repo:filename.gguf — e.g., Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF:qwen2.5-coder-1.5b-instruct-q4_k_m.gguf
GGUF models (quantization shorthand): org/repo:QUANT — e.g., unsloth/Phi-4-mini-instruct-GGUF:Q4_K_M. The server will search the repo for a matching .gguf file.
ONNX models: org/repo — e.g., amd/Qwen2.5-0.5B-Instruct-quantized_int4-float16-cpu-onnx
Safetensor models: org/repo:filename.safetensors — e.g., stabilityai/sd-turbo:sd_turbo.safetensors

Multi-file models

For models that require multiple files (e.g., Whisper models with NPU cache, or Flux image models with separate VAE/text encoder), use checkpoints instead of checkpoint:

{
    "My-Whisper-Model": {
        "checkpoints": {
            "main": "ggerganov/whisper.cpp:ggml-tiny.bin",
            "npu_cache": "amd/whisper-tiny-onnx-npu:ggml-tiny-encoder-vitisai.rai"
        },
        "recipe": "whispercpp",
        "size": 0.075
    }
}

Supported checkpoint keys:

Key	Used by	Description
`main`	All	Primary model file
`npu_cache`	whispercpp	NPU-accelerated encoder cache
`text_encoder`	sd-cpp	Text encoder for image generation models
`vae`	sd-cpp	VAE for image generation models

Image defaults

For sd-cpp recipe models, you can specify default image generation parameters:

{
    "My-SD-Model": {
        "checkpoint": "org/repo:model.safetensors",
        "recipe": "sd-cpp",
        "size": 5.2,
        "image_defaults": {
            "steps": 20,
            "cfg_scale": 7.0,
            "width": 512,
            "height": 512
        }
    }
}

Model naming

In user_models.json, store model names without the user. prefix (e.g., MyCustomModel).
When referencing the model in API calls, CLI commands, or recipe_options.json, use the full prefixed name (e.g., user.MyCustomModel).
Labels like custom are added automatically. Additional labels (reasoning, vision, embeddings, reranking) can be set via the pull CLI/API flags, or by including a labels array in the JSON entry.

`recipe_options.json` Reference

This file configures per-model runtime settings. Each key is a full model name (including prefix like user. or extra.) and each value contains the settings for that model.

Template

{
    "user.MyCustomModel": {
        "ctx_size": 4096,
        "llamacpp_backend": "vulkan",
        "llamacpp_args": ""
    }
}

Options by recipe

llamacpp

Option	Default	Env Variable	Description
`ctx_size`	4096	`LEMONADE_CTX_SIZE`	Context window size in tokens
`llamacpp_backend`	vulkan (Windows/Linux), metal (macOS)	`LEMONADE_LLAMACPP`	Inference backend: `vulkan`, `rocm`, `cpu`, `metal`
`llamacpp_args`	(empty)	`LEMONADE_LLAMACPP_ARGS`	Extra arguments passed to llama-server

whispercpp

Option	Default	Env Variable	Description
`whispercpp_backend`	npu	`LEMONADE_WHISPERCPP`	Backend: `npu`, `cpu`, `vulkan`

sd-cpp

Option	Default	Env Variable	Description
`sd-cpp_backend`	cpu	`LEMONADE_SDCPP`	Backend: `cpu`, `rocm`
`steps`	20	`LEMONADE_STEPS`	Number of inference steps
`cfg_scale`	7.0	`LEMONADE_CFG_SCALE`	Classifier-free guidance scale
`width`	512	`LEMONADE_WIDTH`	Image width in pixels
`height`	512	`LEMONADE_HEIGHT`	Image height in pixels

flm

Option	Default	Env Variable	Description
`ctx_size`	4096	`LEMONADE_CTX_SIZE`	Context window size in tokens
`flm_args`	(empty)	`LEMONADE_FLM_ARGS`	Extra arguments passed to `flm serve`

ryzenai-llm

Option	Default	Env Variable	Description
`ctx_size`	4096	`LEMONADE_CTX_SIZE`	Context window size in tokens

kokoro

The kokoro recipe (text-to-speech) has no configurable options in recipe_options.json.

Note: Per-model options can also be configured through the Lemonade desktop app's model settings, or via the save_options parameter in the /api/v1/load endpoint.

Complete Examples

Example 1: Adding a GGUF LLM with large context

user_models.json:

{
    "Qwen2.5-Coder-1.5B-Instruct": {
        "checkpoint": "Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF:qwen2.5-coder-1.5b-instruct-q4_k_m.gguf",
        "recipe": "llamacpp",
        "size": 1.0
    }
}

recipe_options.json:

{
    "user.Qwen2.5-Coder-1.5B-Instruct": {
        "ctx_size": 16384,
        "llamacpp_backend": "vulkan"
    }
}

Then load the model:

lemonade-server run user.Qwen2.5-Coder-1.5B-Instruct

Example 2: Adding a vision model with mmproj

user_models.json:

{
    "My-Vision-Model": {
        "checkpoint": "ggml-org/gemma-3-4b-it-GGUF:Q4_K_M",
        "mmproj": "mmproj-model-f16.gguf",
        "recipe": "llamacpp",
        "size": 3.61
    }
}

Example 3: Adding an embedding model

user_models.json:

{
    "My-Embedding-Model": {
        "checkpoint": "nomic-ai/nomic-embed-text-v1-GGUF:Q4_K_S",
        "recipe": "llamacpp",
        "size": 0.08
    }
}

The model will automatically be available as user.My-Embedding-Model. To mark it as an embedding model, use the pull CLI instead:

lemonade-server pull user.My-Embedding-Model \
    --checkpoint "nomic-ai/nomic-embed-text-v1-GGUF:Q4_K_S" \
    --recipe llamacpp \
    --embedding

Settings Priority

When loading a model, settings are resolved in this order (highest to lowest priority):

Values explicitly passed in the /api/v1/load request
Per-model values from recipe_options.json
Values set via lemonade-server CLI arguments or environment variables
Default hardcoded values in lemonade-router

For full details, see the load endpoint documentation.

Custom Model Configuration

Overview

user_models.json Reference

Template

Fields

Checkpoint format

Multi-file models

Image defaults

Model naming

recipe_options.json Reference

Template

Options by recipe

llamacpp

whispercpp

sd-cpp

flm

ryzenai-llm

kokoro

Complete Examples

Example 1: Adding a GGUF LLM with large context

Example 2: Adding a vision model with mmproj

Example 3: Adding an embedding model

Settings Priority

See Also

`user_models.json` Reference

`recipe_options.json` Reference