🍋 Lemonade Server Models

This document provides the models we recommend for use with Lemonade Server.

Click on any model to learn more about it, such as the Lemonade Recipe used to load the model.

Contents:

Model Management GUI
Supported Models
Naming Convention
Model Storage and Management
Installing Additional Models

Model Management GUI

Lemonade Server offers a model management GUI to help you see which models are available, install new models, and delete models. You can access this GUI by starting Lemonade Server, opening http://localhost:8000 in your web browser, and clicking the Model Management tab.

Supported Models

🔥 Hot Models

Qwen3-30B-A3B-Instruct-2507-GGUF

```bash
lemonade-server pull Qwen3-30B-A3B-Instruct-2507-GGUF
```

Key	Value
Checkpoint	unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF
GGUF Variant	Qwen3-30B-A3B-Instruct-2507-Q4_0.gguf
Recipe	llamacpp
Labels	hot

Qwen3-Coder-30B-A3B-Instruct-GGUF

```bash
lemonade-server pull Qwen3-Coder-30B-A3B-Instruct-GGUF
```

Key	Value
Checkpoint	unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF
GGUF Variant	Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf
Recipe	llamacpp
Labels	coding, hot

gpt-oss-120b-GGUF

```bash
lemonade-server pull gpt-oss-120b-GGUF
```

Key	Value
Checkpoint	unsloth/gpt-oss-120b-GGUF
GGUF Variant	Q4_K_M
Recipe	llamacpp
Labels	hot, reasoning

gpt-oss-20b-GGUF

```bash
lemonade-server pull gpt-oss-20b-GGUF
```

Key	Value
Checkpoint	unsloth/gpt-oss-20b-GGUF
GGUF Variant	Q4_K_M
Recipe	llamacpp
Labels	hot, reasoning

GLM-4.5-Air-UD-Q4K-XL-GGUF

```bash
lemonade-server pull GLM-4.5-Air-UD-Q4K-XL-GGUF
```

Key	Value
Checkpoint	unsloth/GLM-4.5-Air-GGUF
GGUF Variant	UD-Q4_K_XL
Recipe	llamacpp
Labels	reasoning, hot

GGUF

Qwen3-0.6B-GGUF

```bash
lemonade-server pull Qwen3-0.6B-GGUF
```

Key	Value
Checkpoint	unsloth/Qwen3-0.6B-GGUF
GGUF Variant	Q4_0
Recipe	llamacpp
Labels	reasoning

Qwen3-1.7B-GGUF

```bash
lemonade-server pull Qwen3-1.7B-GGUF
```

Key	Value
Checkpoint	unsloth/Qwen3-1.7B-GGUF
GGUF Variant	Q4_0
Recipe	llamacpp
Labels	reasoning

Qwen3-4B-GGUF

```bash
lemonade-server pull Qwen3-4B-GGUF
```

Key	Value
Checkpoint	unsloth/Qwen3-4B-GGUF
GGUF Variant	Q4_0
Recipe	llamacpp
Labels	reasoning

Qwen3-8B-GGUF

```bash
lemonade-server pull Qwen3-8B-GGUF
```

Key	Value
Checkpoint	unsloth/Qwen3-8B-GGUF
GGUF Variant	Q4_1
Recipe	llamacpp
Labels	reasoning

DeepSeek-Qwen3-8B-GGUF

```bash
lemonade-server pull DeepSeek-Qwen3-8B-GGUF
```

Key	Value
Checkpoint	unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF
GGUF Variant	Q4_1
Recipe	llamacpp
Labels	reasoning

Qwen3-14B-GGUF

```bash
lemonade-server pull Qwen3-14B-GGUF
```

Key	Value
Checkpoint	unsloth/Qwen3-14B-GGUF
GGUF Variant	Q4_0
Recipe	llamacpp
Labels	reasoning

Qwen3-30B-A3B-GGUF

```bash
lemonade-server pull Qwen3-30B-A3B-GGUF
```

Key	Value
Checkpoint	unsloth/Qwen3-30B-A3B-GGUF
GGUF Variant	Q4_0
Recipe	llamacpp
Labels	reasoning

Qwen3-30B-A3B-Instruct-2507-GGUF

```bash
lemonade-server pull Qwen3-30B-A3B-Instruct-2507-GGUF
```

Key	Value
Checkpoint	unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF
GGUF Variant	Qwen3-30B-A3B-Instruct-2507-Q4_0.gguf
Recipe	llamacpp
Labels	hot

Qwen3-Coder-30B-A3B-Instruct-GGUF

```bash
lemonade-server pull Qwen3-Coder-30B-A3B-Instruct-GGUF
```

Key	Value
Checkpoint	unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF
GGUF Variant	Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf
Recipe	llamacpp
Labels	coding, hot

Gemma-3-4b-it-GGUF

```bash
lemonade-server pull Gemma-3-4b-it-GGUF
```

Key	Value
Checkpoint	ggml-org/gemma-3-4b-it-GGUF
GGUF Variant	Q4_K_M
Mmproj	mmproj-model-f16.gguf
Recipe	llamacpp
Labels	vision

Qwen2.5-VL-7B-Instruct-GGUF

```bash
lemonade-server pull Qwen2.5-VL-7B-Instruct-GGUF
```

Key	Value
Checkpoint	ggml-org/Qwen2.5-VL-7B-Instruct-GGUF
GGUF Variant	Q4_K_M
Mmproj	mmproj-Qwen2.5-VL-7B-Instruct-f16.gguf
Recipe	llamacpp
Labels	vision

Llama-4-Scout-17B-16E-Instruct-GGUF

```bash
lemonade-server pull Llama-4-Scout-17B-16E-Instruct-GGUF
```

Key	Value
Checkpoint	unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF
GGUF Variant	Q4_K_S
Mmproj	mmproj-F16.gguf
Recipe	llamacpp
Labels	vision

nomic-embed-text-v1-GGUF

```bash
lemonade-server pull nomic-embed-text-v1-GGUF
```

Key	Value
Checkpoint	nomic-ai/nomic-embed-text-v1-GGUF
GGUF Variant	Q4_K_S
Recipe	llamacpp
Labels	embeddings

nomic-embed-text-v2-moe-GGUF

```bash
lemonade-server pull nomic-embed-text-v2-moe-GGUF
```

Key	Value
Checkpoint	nomic-ai/nomic-embed-text-v2-moe-GGUF
GGUF Variant	Q8_0
Recipe	llamacpp
Labels	embeddings

bge-reranker-v2-m3-GGUF

```bash
lemonade-server pull bge-reranker-v2-m3-GGUF
```

Key	Value
Checkpoint	pqnet/bge-reranker-v2-m3-Q8_0-GGUF
Recipe	llamacpp
Labels	reranking

Devstral-Small-2507-GGUF

```bash
lemonade-server pull Devstral-Small-2507-GGUF
```

Key	Value
Checkpoint	mistralai/Devstral-Small-2507_gguf
GGUF Variant	Q4_K_M
Recipe	llamacpp
Labels	coding

Qwen2.5-Coder-32B-Instruct-GGUF

```bash
lemonade-server pull Qwen2.5-Coder-32B-Instruct-GGUF
```

Key	Value
Checkpoint	Qwen/Qwen2.5-Coder-32B-Instruct-GGUF
GGUF Variant	Q4_K_M
Recipe	llamacpp
Labels	coding

gpt-oss-120b-GGUF

```bash
lemonade-server pull gpt-oss-120b-GGUF
```

Key	Value
Checkpoint	unsloth/gpt-oss-120b-GGUF
GGUF Variant	Q4_K_M
Recipe	llamacpp
Labels	hot, reasoning

gpt-oss-20b-GGUF

```bash
lemonade-server pull gpt-oss-20b-GGUF
```

Key	Value
Checkpoint	unsloth/gpt-oss-20b-GGUF
GGUF Variant	Q4_K_M
Recipe	llamacpp
Labels	hot, reasoning

GLM-4.5-Air-UD-Q4K-XL-GGUF

```bash
lemonade-server pull GLM-4.5-Air-UD-Q4K-XL-GGUF
```

Key	Value
Checkpoint	unsloth/GLM-4.5-Air-GGUF
GGUF Variant	UD-Q4_K_XL
Recipe	llamacpp
Labels	reasoning, hot

Hybrid

Llama-3.2-1B-Instruct-Hybrid

```bash
lemonade-server pull Llama-3.2-1B-Instruct-Hybrid
```

Key	Value
Checkpoint	amd/Llama-3.2-1B-Instruct-awq-g128-int4-asym-fp16-onnx-hybrid
Recipe	oga-hybrid

Llama-3.2-3B-Instruct-Hybrid

```bash
lemonade-server pull Llama-3.2-3B-Instruct-Hybrid
```

Key	Value
Checkpoint	amd/Llama-3.2-3B-Instruct-awq-g128-int4-asym-fp16-onnx-hybrid
Recipe	oga-hybrid

Phi-3-Mini-Instruct-Hybrid

```bash
lemonade-server pull Phi-3-Mini-Instruct-Hybrid
```

Key	Value
Checkpoint	amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid
Recipe	oga-hybrid

Qwen-1.5-7B-Chat-Hybrid

```bash
lemonade-server pull Qwen-1.5-7B-Chat-Hybrid
```

Key	Value
Checkpoint	amd/Qwen1.5-7B-Chat-awq-g128-int4-asym-fp16-onnx-hybrid
Recipe	oga-hybrid

Qwen-2.5-7B-Instruct-Hybrid

```bash
lemonade-server pull Qwen-2.5-7B-Instruct-Hybrid
```

Key	Value
Checkpoint	amd/Qwen2.5-7B-Instruct-awq-uint4-asym-g128-lmhead-g32-fp16-onnx-hybrid
Recipe	oga-hybrid

Qwen-2.5-3B-Instruct-Hybrid

```bash
lemonade-server pull Qwen-2.5-3B-Instruct-Hybrid
```

Key	Value
Checkpoint	amd/Qwen2.5-3B-Instruct-awq-uint4-asym-g128-lmhead-g32-fp16-onnx-hybrid
Recipe	oga-hybrid

Qwen-2.5-1.5B-Instruct-Hybrid

```bash
lemonade-server pull Qwen-2.5-1.5B-Instruct-Hybrid
```

Key	Value
Checkpoint	amd/Qwen2.5-1.5B-Instruct-awq-uint4-asym-g128-lmhead-g32-fp16-onnx-hybrid
Recipe	oga-hybrid

DeepSeek-R1-Distill-Llama-8B-Hybrid

```bash
lemonade-server pull DeepSeek-R1-Distill-Llama-8B-Hybrid
```

Key	Value
Checkpoint	amd/DeepSeek-R1-Distill-Llama-8B-awq-asym-uint4-g128-lmhead-onnx-hybrid
Recipe	oga-hybrid
Labels	reasoning

Mistral-7B-v0.3-Instruct-Hybrid

```bash
lemonade-server pull Mistral-7B-v0.3-Instruct-Hybrid
```

Key	Value
Checkpoint	amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-fp16-onnx-hybrid
Recipe	oga-hybrid

Llama-3.1-8B-Instruct-Hybrid

```bash
lemonade-server pull Llama-3.1-8B-Instruct-Hybrid
```

Key	Value
Checkpoint	amd/Llama-3.1-8B-Instruct-awq-asym-uint4-g128-lmhead-onnx-hybrid
Recipe	oga-hybrid

Llama-xLAM-2-8b-fc-r-Hybrid

```bash
lemonade-server pull Llama-xLAM-2-8b-fc-r-Hybrid
```

Key	Value
Checkpoint	amd/Llama-xLAM-2-8b-fc-r-awq-g128-int4-asym-bfp16-onnx-hybrid
Recipe	oga-hybrid

NPU

Qwen-2.5-7B-Instruct-NPU

```bash
lemonade-server pull Qwen-2.5-7B-Instruct-NPU
```

Key	Value
Checkpoint	amd/Qwen2.5-7B-Instruct-awq-g128-int4-asym-bf16-onnx-ryzen-strix
Recipe	oga-npu

Qwen-2.5-1.5B-Instruct-NPU

```bash
lemonade-server pull Qwen-2.5-1.5B-Instruct-NPU
```

Key	Value
Checkpoint	amd/Qwen2.5-1.5B-Instruct-awq-g128-int4-asym-bf16-onnx-ryzen-strix
Recipe	oga-npu

DeepSeek-R1-Distill-Llama-8B-NPU

```bash
lemonade-server pull DeepSeek-R1-Distill-Llama-8B-NPU
```

Key	Value
Checkpoint	amd/DeepSeek-R1-Distill-Llama-8B-awq-g128-int4-asym-bf16-onnx-ryzen-strix
Recipe	oga-npu

Mistral-7B-v0.3-Instruct-NPU

```bash
lemonade-server pull Mistral-7B-v0.3-Instruct-NPU
```

Key	Value
Checkpoint	amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-bf16-onnx-ryzen-strix
Recipe	oga-npu

Phi-3.5-Mini-Instruct-NPU

```bash
lemonade-server pull Phi-3.5-Mini-Instruct-NPU
```

Key	Value
Checkpoint	amd/Phi-3.5-mini-instruct-awq-g128-int4-asym-bf16-onnx-ryzen-strix
Recipe	oga-npu

CPU

Qwen2.5-0.5B-Instruct-CPU

```bash
lemonade-server pull Qwen2.5-0.5B-Instruct-CPU
```

Key	Value
Checkpoint	amd/Qwen2.5-0.5B-Instruct-quantized_int4-float16-cpu-onnx
Recipe	oga-cpu

Phi-3-Mini-Instruct-CPU

```bash
lemonade-server pull Phi-3-Mini-Instruct-CPU
```

Key	Value
Checkpoint	amd/Phi-3-mini-4k-instruct_int4_float16_onnx_cpu
Recipe	oga-cpu

Qwen-1.5-7B-Chat-CPU

```bash
lemonade-server pull Qwen-1.5-7B-Chat-CPU
```

Key	Value
Checkpoint	amd/Qwen1.5-7B-Chat_uint4_asym_g128_float16_onnx_cpu
Recipe	oga-cpu

DeepSeek-R1-Distill-Llama-8B-CPU

```bash
lemonade-server pull DeepSeek-R1-Distill-Llama-8B-CPU
```

Key	Value
Checkpoint	amd/DeepSeek-R1-Distill-Llama-8B-awq-asym-uint4-g128-lmhead-onnx-cpu
Recipe	oga-cpu
Labels	reasoning

DeepSeek-R1-Distill-Qwen-7B-CPU

```bash
lemonade-server pull DeepSeek-R1-Distill-Qwen-7B-CPU
```

Key	Value
Checkpoint	amd/DeepSeek-R1-Distill-Llama-8B-awq-asym-uint4-g128-lmhead-onnx-cpu
Recipe	oga-cpu
Labels	reasoning

Naming Convention

Each Lemonade name combines the model name from the base checkpoint with the backend where the model will run. For example, if the base checkpoint is meta-llama/Llama-3.2-1B-Instruct and it has been optimized to run on Hybrid, the resulting name is Llama-3.2-1B-Instruct-Hybrid.
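
As a rough illustration of this convention, the following sketch strips the organization prefix from a checkpoint ID and appends the backend. The `lemonade_name` function is a hypothetical helper, not part of Lemonade, and some names in the list above deviate slightly from a mechanical derivation (for example, Qwen-2.5-7B-Instruct-Hybrid), so treat the list itself as authoritative.

```bash
# Hypothetical helper (not part of Lemonade): derive a Lemonade-style name
# from a base checkpoint ID and a backend label.
lemonade_name() {
  local checkpoint="$1"   # e.g. meta-llama/Llama-3.2-3B-Instruct
  local backend="$2"      # e.g. Hybrid, NPU, CPU, GGUF
  # Drop everything up to the last "/", then append the backend.
  printf '%s-%s\n' "${checkpoint##*/}" "$backend"
}

lemonade_name "meta-llama/Llama-3.2-3B-Instruct" "Hybrid"
# prints: Llama-3.2-3B-Instruct-Hybrid
```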

Model Storage and Management

Lemonade Server relies on Hugging Face Hub to manage downloading and storing models on your system. By default, Hugging Face Hub downloads models to C:\Users\YOUR_USERNAME\.cache\huggingface\hub.

For example, the Lemonade Server Llama-3.2-1B-Instruct-Hybrid model will end up at C:\Users\YOUR_USERNAME\.cache\huggingface\hub\models--amd--Llama-3.2-1B-Instruct-awq-g128-int4-asym-fp16-onnx-hybrid. To uninstall that model, delete that folder.
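
The folder name follows the Hugging Face Hub cache layout, where a checkpoint `org/name` is stored as `models--org--name`. A minimal sketch of that mapping is shown below. `hf_model_dir` is a hypothetical helper, and its default hub directory assumes a Unix-like shell (such as WSL or Git Bash) where `$HOME` is set; pass the hub directory explicitly otherwise.

```bash
# Hypothetical helper: map a checkpoint ID to its Hugging Face Hub cache folder.
hf_model_dir() {
  local checkpoint="$1"
  # Assumed default location; override with a second argument.
  local hub_dir="${2:-$HOME/.cache/huggingface/hub}"
  # The Hub cache names model folders "models--<org>--<name>".
  local folder
  folder="models--$(printf '%s' "$checkpoint" | sed 's|/|--|g')"
  printf '%s/%s\n' "$hub_dir" "$folder"
}

hf_model_dir "amd/Llama-3.2-1B-Instruct-awq-g128-int4-asym-fp16-onnx-hybrid"
# Prints a path ending in:
#   models--amd--Llama-3.2-1B-Instruct-awq-g128-int4-asym-fp16-onnx-hybrid
```

Printing the path before deleting anything is a cheap way to confirm you are about to remove the intended folder.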

You can change the directory for Hugging Face Hub by setting the HF_HOME or HF_HUB_CACHE environment variables.
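
The sketch below shows how these two variables are assumed to interact, based on the Hugging Face Hub documentation: `HF_HUB_CACHE` takes precedence, otherwise the cache is the `hub` folder under `HF_HOME`, otherwise the default location. `resolve_hub_cache` is a hypothetical helper, `/data/hf` is a placeholder path, and the sketch ignores other variables (such as `XDG_CACHE_HOME`) that can also affect the default.

```bash
# Sketch of the assumed lookup order for the Hugging Face Hub cache directory.
resolve_hub_cache() {
  if [ -n "${HF_HUB_CACHE:-}" ]; then
    # An explicit hub cache path wins.
    printf '%s\n' "$HF_HUB_CACHE"
  elif [ -n "${HF_HOME:-}" ]; then
    # Otherwise the hub cache lives under HF_HOME.
    printf '%s\n' "$HF_HOME/hub"
  else
    # Simplified default for Unix-like shells.
    printf '%s\n' "$HOME/.cache/huggingface/hub"
  fi
}

# Example: relocate the cache for the current shell session only.
export HF_HOME="/data/hf"   # placeholder location
resolve_hub_cache            # prints /data/hf/hub when HF_HUB_CACHE is unset
```

Set the variable before starting Lemonade Server so that downloads go to the new location.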

Installing Additional Models

Once you’ve installed Lemonade Server, you can install any model on this list using the pull command in the lemonade-server CLI.

Example:

lemonade-server pull Qwen2.5-0.5B-Instruct-CPU

Note: lemonade-server is a utility that is added to your PATH when you install Lemonade Server with the GUI installer. If you are using Lemonade Server from a Python environment, use the lemonade-server-dev pull command instead.
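
If a script needs to work with either installation method, one option is to check which CLI is available before pulling. This is only a sketch: `first_available` is a hypothetical helper, and the model name is the example used above.

```bash
# Hypothetical helper: print the first argument that resolves to a command.
first_available() {
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      printf '%s\n' "$cmd"
      return 0
    fi
  done
  return 1
}

# Prefer the installer-provided CLI, then fall back to the Python-environment one.
if cli="$(first_available lemonade-server lemonade-server-dev)"; then
  "$cli" pull Qwen2.5-0.5B-Instruct-CPU
else
  echo "Neither lemonade-server nor lemonade-server-dev was found on PATH" >&2
fi
```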
