Custom Model Configuration

This guide explains how to manually register custom models in Lemonade Server using JSON configuration files. This is useful for adding any Hugging Face model that isn't in the built-in model list.

Tip: For most Hugging Face GGUFs, the easiest way to add a custom model is just:

lemonade pull org/repo

Lemonade fetches the repo, lists the available quantizations (and any sharded folder variants), auto-detects mmproj-*.gguf files for vision models, infers labels (vision/embeddings/reranking) from the repo id, and presents an interactive variant menu. To skip the menu, append :VARIANT:

lemonade pull org/repo:Q4_K_M

The desktop app's "Search Hugging Face" panel calls the same /api/v1/pull/variants endpoint under the hood.

If you need full control — multiple checkpoints (main + mmproj + vae + ...), a non-llamacpp recipe, or custom labels — use the advanced flags on lemonade pull:

lemonade pull user.MyModel --checkpoint main "org/repo:file.gguf" --recipe llamacpp
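
For the multi-checkpoint case, a single command can register several files at once. This sketch mirrors the Whisper example later in this guide and assumes the --checkpoint flag can be repeated once per key:

lemonade pull user.My-Whisper-Model \
    --checkpoint main "ggerganov/whisper.cpp:ggml-tiny.bin" \
    --checkpoint npu_cache "amd/whisper-tiny-onnx-npu:ggml-tiny-encoder-vitisai.rai" \
    --recipe whispercpp
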
This guide covers the underlying JSON files for users who need manual control beyond what the CLI exposes.

Overview

Custom model configuration involves two files, both located in the Lemonade cache directory:

| File | Purpose |
| --- | --- |
| user_models.json | Model registry — defines what models are available (checkpoint, recipe, etc.) |
| recipe_options.json | Per-model settings — configures how models run (context size, backend, etc.) |

See configuration.md for more information about finding the cache directory.
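
Both files are plain JSON sitting side by side in that directory, so a quick way to see what is already registered is to print them. The path below assumes the default cache location (~/.cache/lemonade); substitute your own if configuration.md points elsewhere:

cat ~/.cache/lemonade/user_models.json
cat ~/.cache/lemonade/recipe_options.json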

user_models.json Reference

This file contains a JSON object where each key is a model name and each value defines the model's properties. Create this file in your cache directory if it doesn't exist.

Template

{
    "MyCustomModel": {
        "checkpoint": "org/repo-name:filename.gguf",
        "recipe": "llamacpp",
        "size": 3.5
    }
}
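
Because a syntax error in this file can prevent the server from reading your registry, it is worth validating the JSON after editing. A minimal check using Python's built-in json.tool (default cache path assumed):

python -m json.tool ~/.cache/lemonade/user_models.json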

Fields

| Field | Required | Type | Description |
| --- | --- | --- | --- |
| checkpoint | Yes* | String | HuggingFace checkpoint in org/repo or org/repo:variant format. Use org/repo:filename.gguf for GGUF models. |
| checkpoints | Yes* | Object | Alternative to checkpoint for models with multiple files. See Multi-file models. |
| recipe | Yes | String | Backend engine to use. One of: llamacpp, whispercpp, sd-cpp, kokoro, ryzenai-llm, flm. |
| size | No | Number | Model size in GB. Informational only — displayed in the UI and used for RAM filtering. |
| mmproj | No | String | Filename of the multimodal projector file for llamacpp vision models (must be in the same HuggingFace repo as the checkpoint). This is a top-level field, not inside checkpoints. |
| image_defaults | No | Object | Default image generation parameters for sd-cpp models. See Image defaults. |

* Either checkpoint or checkpoints is required, but not both.

Checkpoint format

The checkpoint field uses the format org/repo:variant:

  • GGUF models (exact filename): org/repo:filename.gguf — e.g., Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF:qwen2.5-coder-1.5b-instruct-q4_k_m.gguf
  • GGUF models (quantization shorthand): org/repo:QUANT — e.g., unsloth/Phi-4-mini-instruct-GGUF:Q4_K_M. The server will search the repo for a matching .gguf file.
  • ONNX models: org/repo — e.g., amd/Qwen2.5-0.5B-Instruct-quantized_int4-float16-cpu-onnx
  • Safetensor models: org/repo:filename.safetensors — e.g., stabilityai/sd-turbo:sd_turbo.safetensors
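
To make the formats concrete, here are two entries as they would appear in user_models.json, built from the example repos above (the size values are illustrative):

{
    "Phi-4-Mini": {
        "checkpoint": "unsloth/Phi-4-mini-instruct-GGUF:Q4_K_M",
        "recipe": "llamacpp",
        "size": 2.5
    },
    "SD-Turbo": {
        "checkpoint": "stabilityai/sd-turbo:sd_turbo.safetensors",
        "recipe": "sd-cpp",
        "size": 5.2
    }
}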

Multi-file models

For models that require multiple files (e.g., Whisper models with NPU cache, or Flux image models with separate VAE/text encoder), use checkpoints instead of checkpoint:

{
    "My-Whisper-Model": {
        "checkpoints": {
            "main": "ggerganov/whisper.cpp:ggml-tiny.bin",
            "npu_cache": "amd/whisper-tiny-onnx-npu:ggml-tiny-encoder-vitisai.rai"
        },
        "recipe": "whispercpp",
        "size": 0.075
    }
}

Supported checkpoint keys:

| Key | Used by | Description |
| --- | --- | --- |
| main | All | Primary model file |
| npu_cache | whispercpp | NPU-accelerated encoder cache |
| text_encoder | sd-cpp | Text encoder for image generation models |
| vae | sd-cpp | VAE for image generation models |
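
As an illustration, an image model that ships its VAE and text encoder as separate files might be registered like this. The key names come from the table above; the repo, file names, and size are hypothetical:

{
    "My-Flux-Model": {
        "checkpoints": {
            "main": "org/flux-repo:flux1-schnell-q4_0.gguf",
            "vae": "org/flux-repo:ae.safetensors",
            "text_encoder": "org/flux-repo:clip_l.safetensors"
        },
        "recipe": "sd-cpp",
        "size": 7.0
    }
}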

Image defaults

For sd-cpp recipe models, you can specify default image generation parameters:

{
    "My-SD-Model": {
        "checkpoint": "org/repo:model.safetensors",
        "recipe": "sd-cpp",
        "size": 5.2,
        "image_defaults": {
            "steps": 20,
            "cfg_scale": 7.0,
            "width": 512,
            "height": 512
        }
    }
}

Model naming

  • In user_models.json, store model names without the user. prefix (e.g., MyCustomModel).
  • When referencing the model in API calls, CLI commands, or recipe_options.json, use the full prefixed name (e.g., user.MyCustomModel).
  • Labels like custom are added automatically. Additional labels (reasoning, vision, embeddings, reranking) can be set via the pull CLI/API flags, or by including a labels array in the JSON entry, as in the sketch below.
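
For example, the vision model from Example 2 below could carry its label directly in user_models.json (a sketch; only the labels line is new relative to that example):

{
    "My-Vision-Model": {
        "checkpoint": "ggml-org/gemma-3-4b-it-GGUF:Q4_K_M",
        "mmproj": "mmproj-model-f16.gguf",
        "recipe": "llamacpp",
        "size": 3.61,
        "labels": ["vision"]
    }
}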

recipe_options.json Reference

This file configures per-model runtime settings. Each key is a full model name (including prefix like user. or extra.) and each value contains the settings for that model.

Template

{
    "user.MyCustomModel": {
        "ctx_size": 4096,
        "llamacpp_backend": "vulkan",
        "llamacpp_args": ""
    }
}

Note: Per-model options can also be configured through the Lemonade desktop app's model settings, or via the save_options parameter in the /api/v1/load endpoint.
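
As a sketch of the API route, a load request that persists its options might look like the following. The field names here are assumptions rather than the authoritative schema, so check the load endpoint documentation before relying on them (default port 8000 assumed):

curl http://localhost:8000/api/v1/load \
    -H "Content-Type: application/json" \
    -d '{
        "model_name": "user.MyCustomModel",
        "ctx_size": 8192,
        "save_options": true
    }'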

Complete Examples

Example 1: Adding a GGUF LLM with large context

user_models.json:

{
    "Qwen2.5-Coder-1.5B-Instruct": {
        "checkpoint": "Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF:qwen2.5-coder-1.5b-instruct-q4_k_m.gguf",
        "recipe": "llamacpp",
        "size": 1.0
    }
}

recipe_options.json:

{
    "user.Qwen2.5-Coder-1.5B-Instruct": {
        "ctx_size": 16384,
        "llamacpp_backend": "vulkan"
    }
}

Then load the model:

lemonade run user.Qwen2.5-Coder-1.5B-Instruct
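
To verify the model is serving, you can send a request through the OpenAI-compatible chat completions endpoint; this assumes the server is running on its default port (8000):

curl http://localhost:8000/api/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "user.Qwen2.5-Coder-1.5B-Instruct",
        "messages": [{"role": "user", "content": "Say hello."}]
    }'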

Example 2: Adding a vision model with mmproj

user_models.json:

{
    "My-Vision-Model": {
        "checkpoint": "ggml-org/gemma-3-4b-it-GGUF:Q4_K_M",
        "mmproj": "mmproj-model-f16.gguf",
        "recipe": "llamacpp",
        "size": 3.61
    }
}

Example 3: Adding an embedding model

user_models.json:

{
    "My-Embedding-Model": {
        "checkpoint": "nomic-ai/nomic-embed-text-v1-GGUF:Q4_K_S",
        "recipe": "llamacpp",
        "size": 0.08
    }
}

The model will automatically be available as user.My-Embedding-Model. To mark it as an embedding model, use the manual registration flags on pull:

lemonade pull user.My-Embedding-Model \
    --checkpoint main "nomic-ai/nomic-embed-text-v1-GGUF:Q4_K_S" \
    --recipe llamacpp \
    --label embeddings

Alternatively, run lemonade pull nomic-ai/nomic-embed-text-v1-GGUF with no extra flags; the embeddings label is auto-applied because the repo id contains "embed".
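
To confirm the label took effect, you can exercise the model through the OpenAI-compatible embeddings endpoint (default port 8000 assumed):

curl http://localhost:8000/api/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{
        "model": "user.My-Embedding-Model",
        "input": "The quick brown fox"
    }'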

Settings Priority

When loading a model, settings are resolved in this order (highest to lowest priority):

  1. Values explicitly passed in the /api/v1/load request
  2. Per-model values from recipe_options.json
  3. Global configuration values (see Server Configuration)

For full details, see the load endpoint documentation.

See Also