Custom Model Configuration

This guide explains how to manually register custom models in Lemonade Server using JSON configuration files. This is useful for adding any Hugging Face model that isn't in the built-in model list.

Tip: For most Hugging Face GGUFs, the easiest way to add a custom model is just:

lemonade pull org/repo

Lemonade fetches the repo, lists the available quantizations (and any sharded folder variants), auto-detects mmproj-*.gguf files for vision models, infers labels (vision/embeddings/reranking) from the repo id, and presents an interactive variant menu. To skip the menu, append :VARIANT:

lemonade pull org/repo:Q4_K_M

The desktop app's "Search Hugging Face" panel calls the same /api/v1/pull/variants endpoint under the hood.

If you need full control — multiple checkpoints (main + mmproj + vae + ...), a non-llamacpp recipe, or custom labels — use the advanced flags on lemonade pull:

lemonade pull user.MyModel --checkpoint main "org/repo:file.gguf" --recipe llamacpp
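
For the multi-checkpoint case, a single command can register several files at once. This sketch mirrors the Whisper example later in this guide and assumes the --checkpoint flag can be repeated once per key:

lemonade pull user.My-Whisper-Model \
    --checkpoint main "ggerganov/whisper.cpp:ggml-tiny.bin" \
    --checkpoint npu_cache "amd/whisper-tiny-onnx-npu:ggml-tiny-encoder-vitisai.rai" \
    --recipe whispercpp
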
This guide covers the underlying JSON files for users who need manual control beyond what the CLI exposes.

Overview

Custom model configuration involves two files, both located in the Lemonade cache directory:

| File | Purpose |
| --- | --- |
| user_models.json | Model registry — defines what models are available (checkpoint, recipe, etc.) |
| recipe_options.json | Per-model settings — configures how models run (context size, backend, etc.) |

See configuration.md for more information about finding the cache directory.
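
Both files are plain JSON sitting side by side in that directory, so a quick way to see what is already registered is to print them. The path below assumes the default cache location (~/.cache/lemonade); substitute your own if configuration.md points elsewhere:

cat ~/.cache/lemonade/user_models.json
cat ~/.cache/lemonade/recipe_options.json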

user_models.json Reference

This file contains a JSON object where each key is a model name and each value defines the model's properties. Create this file in your cache directory if it doesn't exist.

Template

{
    "MyCustomModel": {
        "checkpoint": "org/repo-name:filename.gguf",
        "recipe": "llamacpp",
        "size": 3.5
    }
}
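
Because a syntax error in this file can prevent the server from reading your registry, it is worth validating the JSON after editing. A minimal check using Python's built-in json.tool (default cache path assumed):

python -m json.tool ~/.cache/lemonade/user_models.json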

Fields

| Field | Required | Type | Description |
| --- | --- | --- | --- |
| checkpoint | Yes* | String | HuggingFace checkpoint in org/repo or org/repo:variant format. Use org/repo:filename.gguf for GGUF models. |
| checkpoints | Yes* | Object | Alternative to checkpoint for models with multiple files. See Multi-file models. |
| recipe | Yes | String | Backend engine to use. One of: llamacpp, whispercpp, sd-cpp, kokoro, ryzenai-llm, flm. |
| size | No | Number | Model size in GB. Informational only — displayed in the UI and used for RAM filtering. |
| mmproj | No | String | Filename of the multimodal projector file for llamacpp vision models (must be in the same HuggingFace repo as the checkpoint). This is a top-level field, not inside checkpoints. |
| image_defaults | No | Object | Default image generation parameters for sd-cpp models. See Image defaults. |

* Either checkpoint or checkpoints is required, but not both.

Checkpoint format

The checkpoint field uses the format org/repo:variant:

  • GGUF models (exact filename): org/repo:filename.gguf — e.g., Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF:qwen2.5-coder-1.5b-instruct-q4_k_m.gguf
  • GGUF models (quantization shorthand): org/repo:QUANT — e.g., unsloth/Phi-4-mini-instruct-GGUF:Q4_K_M. The server will search the repo for a matching .gguf file.
  • ONNX models: org/repo — e.g., amd/Qwen2.5-0.5B-Instruct-quantized_int4-float16-cpu-onnx
  • Safetensor models: org/repo:filename.safetensors — e.g., stabilityai/sd-turbo:sd_turbo.safetensors
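
To make the formats concrete, here are two entries as they would appear in user_models.json, built from the example repos above (the size values are illustrative):

{
    "Phi-4-Mini": {
        "checkpoint": "unsloth/Phi-4-mini-instruct-GGUF:Q4_K_M",
        "recipe": "llamacpp",
        "size": 2.5
    },
    "SD-Turbo": {
        "checkpoint": "stabilityai/sd-turbo:sd_turbo.safetensors",
        "recipe": "sd-cpp",
        "size": 5.2
    }
}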

Multi-file models

For models that require multiple files (e.g., Whisper models with NPU cache, or Flux image models with separate VAE/text encoder), use checkpoints instead of checkpoint:

{
    "My-Whisper-Model": {
        "checkpoints": {
            "main": "ggerganov/whisper.cpp:ggml-tiny.bin",
            "npu_cache": "amd/whisper-tiny-onnx-npu:ggml-tiny-encoder-vitisai.rai"
        },
        "recipe": "whispercpp",
        "size": 0.075
    }
}

Supported checkpoint keys:

| Key | Used by | Description |
| --- | --- | --- |
| main | All | Primary model file |
| npu_cache | whispercpp | NPU-accelerated encoder cache |
| text_encoder | sd-cpp | Text encoder for image generation models |
| vae | sd-cpp | VAE for image generation models |
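
As an illustration, an image model that ships its VAE and text encoder as separate files might be registered like this. The key names come from the table above; the repo, file names, and size are hypothetical:

{
    "My-Flux-Model": {
        "checkpoints": {
            "main": "org/flux-repo:flux1-schnell-q4_0.gguf",
            "vae": "org/flux-repo:ae.safetensors",
            "text_encoder": "org/flux-repo:clip_l.safetensors"
        },
        "recipe": "sd-cpp",
        "size": 7.0
    }
}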

Image defaults

For sd-cpp recipe models, you can specify default image generation parameters:

{
    "My-SD-Model": {
        "checkpoint": "org/repo:model.safetensors",
        "recipe": "sd-cpp",
        "size": 5.2,
        "image_defaults": {
            "steps": 20,
            "cfg_scale": 7.0,
            "width": 512,
            "height": 512
        }
    }
}

Model naming

  • In user_models.json, store model names without the user. prefix (e.g., MyCustomModel).
  • When referencing the model in API calls, CLI commands, or recipe_options.json, use the full prefixed name (e.g., user.MyCustomModel).
  • Labels like custom are added automatically. Additional labels (reasoning, vision, embeddings, reranking) can be set via the pull CLI/API flags, or by including a labels array in the JSON entry, as in the sketch below.
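
For example, the vision model from Example 2 below could carry its label directly in user_models.json (a sketch; only the labels line is new relative to that example):

{
    "My-Vision-Model": {
        "checkpoint": "ggml-org/gemma-3-4b-it-GGUF:Q4_K_M",
        "mmproj": "mmproj-model-f16.gguf",
        "recipe": "llamacpp",
        "size": 3.61,
        "labels": ["vision"]
    }
}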

recipe_options.json Reference

This file configures per-model runtime settings. Each key is a full model name (including prefix like user. or extra.) and each value contains the settings for that model.

Template

{
    "user.MyCustomModel": {
        "ctx_size": 4096,
        "llamacpp_backend": "vulkan",
        "llamacpp_args": ""
    }
}

Note: Per-model options can also be configured through the Lemonade desktop app's model settings, or via the save_options parameter in the /api/v1/load endpoint.
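
As a sketch of the API route, a load request that persists its options might look like the following. The field names here are assumptions rather than the authoritative schema, so check the load endpoint documentation before relying on them (default port 8000 assumed):

curl http://localhost:8000/api/v1/load \
    -H "Content-Type: application/json" \
    -d '{
        "model_name": "user.MyCustomModel",
        "ctx_size": 8192,
        "save_options": true
    }'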

Complete Examples

Example 1: Adding a GGUF LLM with large context

user_models.json:

{
    "Qwen2.5-Coder-1.5B-Instruct": {
        "checkpoint": "Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF:qwen2.5-coder-1.5b-instruct-q4_k_m.gguf",
        "recipe": "llamacpp",
        "size": 1.0
    }
}

recipe_options.json:

{
    "user.Qwen2.5-Coder-1.5B-Instruct": {
        "ctx_size": 16384,
        "llamacpp_backend": "vulkan"
    }
}

Then load the model:

lemonade run user.Qwen2.5-Coder-1.5B-Instruct
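
To verify the model is serving, you can send a request through the OpenAI-compatible chat completions endpoint; this assumes the server is running on its default port (8000):

curl http://localhost:8000/api/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "user.Qwen2.5-Coder-1.5B-Instruct",
        "messages": [{"role": "user", "content": "Say hello."}]
    }'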

Example 2: Adding a vision model with mmproj

user_models.json:

{
    "My-Vision-Model": {
        "checkpoint": "ggml-org/gemma-3-4b-it-GGUF:Q4_K_M",
        "mmproj": "mmproj-model-f16.gguf",
        "recipe": "llamacpp",
        "size": 3.61
    }
}

Example 3: Adding an embedding model

user_models.json:

{
    "My-Embedding-Model": {
        "checkpoint": "nomic-ai/nomic-embed-text-v1-GGUF:Q4_K_S",
        "recipe": "llamacpp",
        "size": 0.08
    }
}

The model will automatically be available as user.My-Embedding-Model. To mark it as an embedding model, use the manual registration flags on pull:

lemonade pull user.My-Embedding-Model \
    --checkpoint main "nomic-ai/nomic-embed-text-v1-GGUF:Q4_K_S" \
    --recipe llamacpp \
    --label embeddings

Alternatively, run lemonade pull nomic-ai/nomic-embed-text-v1-GGUF with no extra flags; the embeddings label is auto-applied because the repo id contains "embed".
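
To confirm the label took effect, you can exercise the model through the OpenAI-compatible embeddings endpoint (default port 8000 assumed):

curl http://localhost:8000/api/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{
        "model": "user.My-Embedding-Model",
        "input": "The quick brown fox"
    }'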

Settings Priority

When loading a model, settings are resolved in this order (highest to lowest priority):

  1. Values explicitly passed in the /api/v1/load request
  2. Per-model values from recipe_options.json
  3. Global configuration values (see Server Configuration)

For full details, see the load endpoint documentation.

See Also