Custom Model Configuration
This guide explains how to manually register custom models in Lemonade Server using the JSON configuration files. This is useful for adding any HuggingFace model that isn't in the built-in model list.
Tip: For most Hugging Face GGUFs, the easiest way to add a custom model is just:

```bash
lemonade pull org/repo
```

Lemonade fetches the repo, lists the available quantizations (and any sharded folder variants), auto-detects `mmproj-*.gguf` files for vision models, infers labels (vision/embeddings/reranking) from the repo id, and presents an interactive variant menu. To skip the menu, append `:VARIANT`:

```bash
lemonade pull org/repo:Q4_K_M
```

The desktop app's "Search Hugging Face" panel calls the same `/api/v1/pull/variants` endpoint under the hood.

If you need full control — multiple checkpoints (`main` + `mmproj` + `vae` + ...), a non-llamacpp recipe, or custom labels — use the advanced flags on `lemonade pull`:

```bash
lemonade pull user.MyModel --checkpoint main "org/repo:file.gguf" --recipe llamacpp
```

This guide covers the underlying JSON files for users who need manual control beyond what the CLI exposes.
Overview
Custom model configuration involves two files, both located in the Lemonade cache directory:
| File | Purpose |
|---|---|
| `user_models.json` | Model registry — defines what models are available (checkpoint, recipe, etc.) |
| `recipe_options.json` | Per-model settings — configures how models run (context size, backend, etc.) |
See configuration.md for more information about finding the cache directory.
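To make the two-file layout concrete, here is a minimal Python sketch that reads both files from a cache directory. The `~/.cache/lemonade` default is a placeholder assumption, not Lemonade's documented location; use the path described in configuration.md.

```python
import json
from pathlib import Path

# Placeholder assumption -- see configuration.md for your actual cache directory.
DEFAULT_CACHE = Path.home() / ".cache" / "lemonade"

def load_config(cache_dir=DEFAULT_CACHE):
    """Return (user_models, recipe_options); a missing file reads as {}."""
    def read(name):
        path = Path(cache_dir) / name
        return json.loads(path.read_text()) if path.exists() else {}
    return read("user_models.json"), read("recipe_options.json")
```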
user_models.json Reference
This file contains a JSON object where each key is a model name and each value defines the model's properties. Create this file in your cache directory if it doesn't exist.
Template
```json
{
  "MyCustomModel": {
    "checkpoint": "org/repo-name:filename.gguf",
    "recipe": "llamacpp",
    "size": 3.5
  }
}
```
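If you script your registry edits, a small helper can keep the JSON well-formed. This is an illustrative sketch, not part of the Lemonade CLI; it takes and returns JSON text so you can write the result wherever your registry lives.

```python
import json

def register_model(registry_json, name, checkpoint, recipe, size=None):
    """Return updated user_models.json text with one more entry (sketch)."""
    models = json.loads(registry_json) if registry_json.strip() else {}
    entry = {"checkpoint": checkpoint, "recipe": recipe}
    if size is not None:
        entry["size"] = size
    models[name] = entry
    return json.dumps(models, indent=2)
```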
Fields
| Field | Required | Type | Description |
|---|---|---|---|
| `checkpoint` | Yes\* | String | HuggingFace checkpoint in `org/repo` or `org/repo:variant` format. Use `org/repo:filename.gguf` for GGUF models. |
| `checkpoints` | Yes\* | Object | Alternative to `checkpoint` for models with multiple files. See Multi-file models. |
| `recipe` | Yes | String | Backend engine to use. One of: `llamacpp`, `whispercpp`, `sd-cpp`, `kokoro`, `ryzenai-llm`, `flm`. |
| `size` | No | Number | Model size in GB. Informational only — displayed in the UI and used for RAM filtering. |
| `mmproj` | No | String | Filename of the multimodal projector file for llamacpp vision models (must be in the same HuggingFace repo as the checkpoint). This is a top-level field, not inside `checkpoints`. |
| `image_defaults` | No | Object | Default image generation parameters for sd-cpp models. See Image defaults. |

\* Either `checkpoint` or `checkpoints` is required, but not both.
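As an illustration of the rules above, a small validator could check an entry before you save it. The function name and error strings here are invented for this sketch; Lemonade's own validation may differ.

```python
VALID_RECIPES = {"llamacpp", "whispercpp", "sd-cpp", "kokoro", "ryzenai-llm", "flm"}

def validate_entry(entry):
    """Return a list of problems with a user_models.json entry (sketch)."""
    errors = []
    # Exactly one of checkpoint / checkpoints must be present.
    if ("checkpoint" in entry) == ("checkpoints" in entry):
        errors.append("exactly one of 'checkpoint' or 'checkpoints' is required")
    if entry.get("recipe") not in VALID_RECIPES:
        errors.append(f"unknown recipe: {entry.get('recipe')!r}")
    return errors
```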
Checkpoint format
The `checkpoint` field uses the format `org/repo:variant`:

- GGUF models (exact filename): `org/repo:filename.gguf` — e.g., `Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF:qwen2.5-coder-1.5b-instruct-q4_k_m.gguf`
- GGUF models (quantization shorthand): `org/repo:QUANT` — e.g., `unsloth/Phi-4-mini-instruct-GGUF:Q4_K_M`. The server will search the repo for a matching `.gguf` file.
- ONNX models: `org/repo` — e.g., `amd/Qwen2.5-0.5B-Instruct-quantized_int4-float16-cpu-onnx`
- Safetensor models: `org/repo:filename.safetensors` — e.g., `stabilityai/sd-turbo:sd_turbo.safetensors`
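The variant handling described above can be sketched as a small parser. The `kind` labels (`repo`, `file`, `quant`) are illustrative terms for this sketch, not Lemonade's internal terminology.

```python
def parse_checkpoint(spec):
    """Split an org/repo[:variant] spec into (repo, variant, kind)."""
    repo, _, variant = spec.partition(":")
    if not variant:
        kind = "repo"   # plain org/repo, e.g. an ONNX model folder
    elif variant.endswith((".gguf", ".safetensors")):
        kind = "file"   # exact filename inside the repo
    else:
        kind = "quant"  # quantization shorthand like Q4_K_M
    return repo, variant or None, kind
```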
Multi-file models
For models that require multiple files (e.g., Whisper models with NPU cache, or Flux image models with separate VAE/text encoder), use checkpoints instead of checkpoint:
```json
{
  "My-Whisper-Model": {
    "checkpoints": {
      "main": "ggerganov/whisper.cpp:ggml-tiny.bin",
      "npu_cache": "amd/whisper-tiny-onnx-npu:ggml-tiny-encoder-vitisai.rai"
    },
    "recipe": "whispercpp",
    "size": 0.075
  }
}
```
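Code that consumes both layouts can normalize them to one shape. A minimal sketch, assuming a single `checkpoint` maps onto the `main` key of the multi-file scheme:

```python
def checkpoint_specs(entry):
    """Normalize both layouts to {key: spec}; a single checkpoint becomes 'main'."""
    if "checkpoints" in entry:
        return dict(entry["checkpoints"])
    return {"main": entry["checkpoint"]}
```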
Supported checkpoint keys:
| Key | Used by | Description |
|---|---|---|
| `main` | All | Primary model file |
| `npu_cache` | whispercpp | NPU-accelerated encoder cache |
| `text_encoder` | sd-cpp | Text encoder for image generation models |
| `vae` | sd-cpp | VAE for image generation models |
Image defaults
For sd-cpp recipe models, you can specify default image generation parameters:
```json
{
  "My-SD-Model": {
    "checkpoint": "org/repo:model.safetensors",
    "recipe": "sd-cpp",
    "size": 5.2,
    "image_defaults": {
      "steps": 20,
      "cfg_scale": 7.0,
      "width": 512,
      "height": 512
    }
  }
}
```
Model naming
- In `user_models.json`, store model names without the `user.` prefix (e.g., `MyCustomModel`).
- When referencing the model in API calls, CLI commands, or `recipe_options.json`, use the full prefixed name (e.g., `user.MyCustomModel`).
- Labels like `custom` are added automatically. Additional labels (`reasoning`, `vision`, `embeddings`, `reranking`) can be set via the `pull` CLI/API flags, or by including a `labels` array in the JSON entry.
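The naming rules above can be captured in two tiny helpers. These are illustrative only; the function names are invented for this sketch.

```python
def registry_key(name):
    """Name as stored in user_models.json (no user. prefix)."""
    return name.removeprefix("user.")

def api_name(name):
    """Name as used in API calls, CLI commands, and recipe_options.json."""
    return name if name.startswith("user.") else f"user.{name}"
```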
recipe_options.json Reference
This file configures per-model runtime settings. Each key is a full model name (including a prefix like `user.` or `extra.`) and each value contains the settings for that model.
Template
```json
{
  "user.MyCustomModel": {
    "ctx_size": 4096,
    "llamacpp_backend": "vulkan",
    "llamacpp_args": ""
  }
}
```
Note: Per-model options can also be configured through the Lemonade desktop app's model settings, or via the `save_options` parameter in the `/api/v1/load` endpoint.
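If you prefer scripted edits over the desktop app, a hypothetical helper for merging settings into `recipe_options.json` might look like the following. It takes and returns JSON text; this is a sketch, not a Lemonade API.

```python
import json

def set_model_options(options_json, model, **settings):
    """Return updated recipe_options.json text with settings merged in."""
    data = json.loads(options_json) if options_json.strip() else {}
    data.setdefault(model, {}).update(settings)
    return json.dumps(data, indent=2)
```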
Complete Examples
Example 1: Adding a GGUF LLM with large context
user_models.json:
```json
{
  "Qwen2.5-Coder-1.5B-Instruct": {
    "checkpoint": "Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF:qwen2.5-coder-1.5b-instruct-q4_k_m.gguf",
    "recipe": "llamacpp",
    "size": 1.0
  }
}
```
recipe_options.json:
```json
{
  "user.Qwen2.5-Coder-1.5B-Instruct": {
    "ctx_size": 16384,
    "llamacpp_backend": "vulkan"
  }
}
```
Then load the model:
```bash
lemonade run user.Qwen2.5-Coder-1.5B-Instruct
```
Example 2: Adding a vision model with mmproj
user_models.json:
```json
{
  "My-Vision-Model": {
    "checkpoint": "ggml-org/gemma-3-4b-it-GGUF:Q4_K_M",
    "mmproj": "mmproj-model-f16.gguf",
    "recipe": "llamacpp",
    "size": 3.61
  }
}
```
Example 3: Adding an embedding model
user_models.json:
```json
{
  "My-Embedding-Model": {
    "checkpoint": "nomic-ai/nomic-embed-text-v1-GGUF:Q4_K_S",
    "recipe": "llamacpp",
    "size": 0.08
  }
}
```
The model will automatically be available as `user.My-Embedding-Model`. To mark it as an embedding model, use the manual registration flags on `pull`:

```bash
lemonade pull user.My-Embedding-Model \
  --checkpoint main "nomic-ai/nomic-embed-text-v1-GGUF:Q4_K_S" \
  --recipe llamacpp \
  --label embeddings
```
Alternatively, `lemonade pull nomic-ai/nomic-embed-text-v1-GGUF` applies the `embeddings` label automatically, because the repo id contains `embed`.
Settings Priority
When loading a model, settings are resolved in this order (highest to lowest priority):
- Values explicitly passed in the `/api/v1/load` request
- Per-model values from `recipe_options.json`
- Global configuration values — see Server Configuration
For full details, see the load endpoint documentation.
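The priority order maps naturally onto Python's `collections.ChainMap`, where earlier mappings shadow later ones. The setting values below are made up for illustration:

```python
from collections import ChainMap

request_settings = {"ctx_size": 8192}                          # passed to /api/v1/load
per_model = {"ctx_size": 16384, "llamacpp_backend": "vulkan"}  # recipe_options.json
global_defaults = {"ctx_size": 4096, "llamacpp_backend": "cpu"}

# Earlier mappings win, matching the priority order above.
resolved = dict(ChainMap(request_settings, per_model, global_defaults))
```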
See Also
- CLI `pull` command — register and download models from the command line
- `/api/v1/pull` endpoint — register and download models via API
- Server Integration Guide — overview of model management options