
# 🍋 Lemonade Server Models

This document lists the models we recommend for use with Lemonade Server.

Each model entry below includes details such as the checkpoint it is downloaded from and the Lemonade recipe used to load it.

## Model Management GUI

Lemonade Server offers a model management GUI to help you see which models are available, install new models, and delete models. You can access this GUI by starting Lemonade Server, opening http://localhost:8000 in your web browser, and clicking the Model Management tab.

## Supported Models

### 🔥 Hot Models

#### Qwen3-4B-Instruct-2507-GGUF

```bash
lemonade-server pull Qwen3-4B-Instruct-2507-GGUF
```

| Key | Value |
|-----|-------|
| Checkpoint | unsloth/Qwen3-4B-Instruct-2507-GGUF |
| GGUF Variant | Qwen3-4B-Instruct-2507-Q4_K_M.gguf |
| Recipe | llamacpp |
| Labels | hot |
| Size (GB) | 2.5 |

#### Qwen3-Coder-30B-A3B-Instruct-GGUF

```bash
lemonade-server pull Qwen3-Coder-30B-A3B-Instruct-GGUF
```

| Key | Value |
|-----|-------|
| Checkpoint | unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF |
| GGUF Variant | Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf |
| Recipe | llamacpp |
| Labels | coding, tool-calling, hot |
| Size (GB) | 18.6 |

#### Gemma-3-4b-it-GGUF

```bash
lemonade-server pull Gemma-3-4b-it-GGUF
```

| Key | Value |
|-----|-------|
| Checkpoint | ggml-org/gemma-3-4b-it-GGUF |
| GGUF Variant | Q4_K_M |
| Mmproj | mmproj-model-f16.gguf |
| Recipe | llamacpp |
| Labels | hot, vision |
| Size (GB) | 3.61 |

#### gpt-oss-120b-mxfp-GGUF

```bash
lemonade-server pull gpt-oss-120b-mxfp-GGUF
```

| Key | Value |
|-----|-------|
| Checkpoint | ggml-org/gpt-oss-120b-GGUF |
| GGUF Variant | * |
| Recipe | llamacpp |
| Labels | hot, reasoning, tool-calling |
| Size (GB) | 63.3 |

#### gpt-oss-20b-mxfp4-GGUF

```bash
lemonade-server pull gpt-oss-20b-mxfp4-GGUF
```

| Key | Value |
|-----|-------|
| Checkpoint | ggml-org/gpt-oss-20b-GGUF |
| Recipe | llamacpp |
| Labels | hot, reasoning, tool-calling |
| Size (GB) | 12.1 |

#### Gemma-3-4b-it-FLM

```bash
lemonade-server pull Gemma-3-4b-it-FLM
```

| Key | Value |
|-----|-------|
| Checkpoint | gemma3:4b |
| Recipe | flm |
| Labels | hot, vision |
| Size (GB) | 5.26 |

#### Qwen3-4B-Instruct-2507-FLM

```bash
lemonade-server pull Qwen3-4B-Instruct-2507-FLM
```

| Key | Value |
|-----|-------|
| Checkpoint | qwen3-it:4b |
| Recipe | flm |
| Labels | hot |
| Size (GB) | 3.07 |

### GGUF

#### Qwen3-0.6B-GGUF

```bash
lemonade-server pull Qwen3-0.6B-GGUF
```

| Key | Value |
|-----|-------|
| Checkpoint | unsloth/Qwen3-0.6B-GGUF |
| GGUF Variant | Q4_0 |
| Recipe | llamacpp |
| Labels | reasoning |
| Size (GB) | 0.38 |

#### Qwen3-1.7B-GGUF

```bash
lemonade-server pull Qwen3-1.7B-GGUF
```

| Key | Value |
|-----|-------|
| Checkpoint | unsloth/Qwen3-1.7B-GGUF |
| GGUF Variant | Q4_0 |
| Recipe | llamacpp |
| Labels | reasoning |
| Size (GB) | 1.06 |

#### Qwen3-4B-GGUF

```bash
lemonade-server pull Qwen3-4B-GGUF
```

| Key | Value |
|-----|-------|
| Checkpoint | unsloth/Qwen3-4B-GGUF |
| GGUF Variant | Q4_0 |
| Recipe | llamacpp |
| Labels | reasoning |
| Size (GB) | 2.38 |

#### Qwen3-8B-GGUF

```bash
lemonade-server pull Qwen3-8B-GGUF
```

| Key | Value |
|-----|-------|
| Checkpoint | unsloth/Qwen3-8B-GGUF |
| GGUF Variant | Q4_1 |
| Recipe | llamacpp |
| Labels | reasoning |
| Size (GB) | 5.25 |

#### DeepSeek-Qwen3-8B-GGUF

```bash
lemonade-server pull DeepSeek-Qwen3-8B-GGUF
```

| Key | Value |
|-----|-------|
| Checkpoint | unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF |
| GGUF Variant | Q4_1 |
| Recipe | llamacpp |
| Labels | reasoning |
| Size (GB) | 5.25 |

#### Qwen3-14B-GGUF

```bash
lemonade-server pull Qwen3-14B-GGUF
```

| Key | Value |
|-----|-------|
| Checkpoint | unsloth/Qwen3-14B-GGUF |
| GGUF Variant | Q4_0 |
| Recipe | llamacpp |
| Labels | reasoning |
| Size (GB) | 8.54 |

#### Qwen3-4B-Instruct-2507-GGUF

```bash
lemonade-server pull Qwen3-4B-Instruct-2507-GGUF
```

| Key | Value |
|-----|-------|
| Checkpoint | unsloth/Qwen3-4B-Instruct-2507-GGUF |
| GGUF Variant | Qwen3-4B-Instruct-2507-Q4_K_M.gguf |
| Recipe | llamacpp |
| Labels | hot |
| Size (GB) | 2.5 |

#### Qwen3-30B-A3B-GGUF

```bash
lemonade-server pull Qwen3-30B-A3B-GGUF
```

| Key | Value |
|-----|-------|
| Checkpoint | unsloth/Qwen3-30B-A3B-GGUF |
| GGUF Variant | Q4_0 |
| Recipe | llamacpp |
| Labels | reasoning |
| Size (GB) | 17.4 |

#### Qwen3-30B-A3B-Instruct-2507-GGUF

```bash
lemonade-server pull Qwen3-30B-A3B-Instruct-2507-GGUF
```

| Key | Value |
|-----|-------|
| Checkpoint | unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF |
| GGUF Variant | Qwen3-30B-A3B-Instruct-2507-Q4_0.gguf |
| Recipe | llamacpp |
| Size (GB) | 17.4 |

#### Qwen3-Coder-30B-A3B-Instruct-GGUF

```bash
lemonade-server pull Qwen3-Coder-30B-A3B-Instruct-GGUF
```

| Key | Value |
|-----|-------|
| Checkpoint | unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF |
| GGUF Variant | Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf |
| Recipe | llamacpp |
| Labels | coding, tool-calling, hot |
| Size (GB) | 18.6 |

#### Gemma-3-4b-it-GGUF

```bash
lemonade-server pull Gemma-3-4b-it-GGUF
```

| Key | Value |
|-----|-------|
| Checkpoint | ggml-org/gemma-3-4b-it-GGUF |
| GGUF Variant | Q4_K_M |
| Mmproj | mmproj-model-f16.gguf |
| Recipe | llamacpp |
| Labels | hot, vision |
| Size (GB) | 3.61 |

#### Qwen2.5-VL-7B-Instruct-GGUF

```bash
lemonade-server pull Qwen2.5-VL-7B-Instruct-GGUF
```

| Key | Value |
|-----|-------|
| Checkpoint | ggml-org/Qwen2.5-VL-7B-Instruct-GGUF |
| GGUF Variant | Q4_K_M |
| Mmproj | mmproj-Qwen2.5-VL-7B-Instruct-f16.gguf |
| Recipe | llamacpp |
| Labels | vision |
| Size (GB) | 4.68 |

#### Llama-4-Scout-17B-16E-Instruct-GGUF

```bash
lemonade-server pull Llama-4-Scout-17B-16E-Instruct-GGUF
```

| Key | Value |
|-----|-------|
| Checkpoint | unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF |
| GGUF Variant | Q4_K_S |
| Mmproj | mmproj-F16.gguf |
| Recipe | llamacpp |
| Labels | vision |
| Size (GB) | 61.5 |

#### nomic-embed-text-v1-GGUF

```bash
lemonade-server pull nomic-embed-text-v1-GGUF
```

| Key | Value |
|-----|-------|
| Checkpoint | nomic-ai/nomic-embed-text-v1-GGUF |
| GGUF Variant | Q4_K_S |
| Recipe | llamacpp |
| Labels | embeddings |
| Size (GB) | 0.0781 |

#### nomic-embed-text-v2-moe-GGUF

```bash
lemonade-server pull nomic-embed-text-v2-moe-GGUF
```

| Key | Value |
|-----|-------|
| Checkpoint | nomic-ai/nomic-embed-text-v2-moe-GGUF |
| GGUF Variant | Q8_0 |
| Recipe | llamacpp |
| Labels | embeddings |
| Size (GB) | 0.51 |

#### bge-reranker-v2-m3-GGUF

```bash
lemonade-server pull bge-reranker-v2-m3-GGUF
```

| Key | Value |
|-----|-------|
| Checkpoint | pqnet/bge-reranker-v2-m3-Q8_0-GGUF |
| Recipe | llamacpp |
| Labels | reranking |
| Size (GB) | 0.53 |

#### Devstral-Small-2507-GGUF

```bash
lemonade-server pull Devstral-Small-2507-GGUF
```

| Key | Value |
|-----|-------|
| Checkpoint | mistralai/Devstral-Small-2507_gguf |
| GGUF Variant | Q4_K_M |
| Recipe | llamacpp |
| Labels | coding, tool-calling |
| Size (GB) | 14.3 |

#### Qwen2.5-Coder-32B-Instruct-GGUF

```bash
lemonade-server pull Qwen2.5-Coder-32B-Instruct-GGUF
```

| Key | Value |
|-----|-------|
| Checkpoint | Qwen/Qwen2.5-Coder-32B-Instruct-GGUF |
| GGUF Variant | Q4_K_M |
| Recipe | llamacpp |
| Labels | coding |
| Size (GB) | 19.85 |

#### gpt-oss-120b-mxfp-GGUF

```bash
lemonade-server pull gpt-oss-120b-mxfp-GGUF
```

| Key | Value |
|-----|-------|
| Checkpoint | ggml-org/gpt-oss-120b-GGUF |
| GGUF Variant | * |
| Recipe | llamacpp |
| Labels | hot, reasoning, tool-calling |
| Size (GB) | 63.3 |

#### gpt-oss-20b-mxfp4-GGUF

```bash
lemonade-server pull gpt-oss-20b-mxfp4-GGUF
```

| Key | Value |
|-----|-------|
| Checkpoint | ggml-org/gpt-oss-20b-GGUF |
| Recipe | llamacpp |
| Labels | hot, reasoning, tool-calling |
| Size (GB) | 12.1 |

#### GLM-4.5-Air-UD-Q4K-XL-GGUF

```bash
lemonade-server pull GLM-4.5-Air-UD-Q4K-XL-GGUF
```

| Key | Value |
|-----|-------|
| Checkpoint | unsloth/GLM-4.5-Air-GGUF |
| GGUF Variant | UD-Q4_K_XL |
| Recipe | llamacpp |
| Labels | reasoning |
| Size (GB) | 73.1 |

### Ryzen AI Hybrid (NPU+GPU)

#### Llama-3.2-1B-Instruct-Hybrid

```bash
lemonade-server pull Llama-3.2-1B-Instruct-Hybrid
```

| Key | Value |
|-----|-------|
| Checkpoint | amd/Llama-3.2-1B-Instruct-awq-g128-int4-asym-fp16-onnx-hybrid |
| Recipe | oga-hybrid |
| Size (GB) | 1.75 |

#### Llama-3.2-3B-Instruct-Hybrid

```bash
lemonade-server pull Llama-3.2-3B-Instruct-Hybrid
```

| Key | Value |
|-----|-------|
| Checkpoint | amd/Llama-3.2-3B-Instruct-awq-g128-int4-asym-fp16-onnx-hybrid |
| Recipe | oga-hybrid |
| Size (GB) | 3.97 |

#### Phi-3-Mini-Instruct-Hybrid

```bash
lemonade-server pull Phi-3-Mini-Instruct-Hybrid
```

| Key | Value |
|-----|-------|
| Checkpoint | amd/Phi-3-mini-4k-instruct-awq-g128-int4-asym-fp16-onnx-hybrid |
| Recipe | oga-hybrid |
| Size (GB) | 3.89 |

#### Qwen-1.5-7B-Chat-Hybrid

```bash
lemonade-server pull Qwen-1.5-7B-Chat-Hybrid
```

| Key | Value |
|-----|-------|
| Checkpoint | amd/Qwen1.5-7B-Chat-awq-g128-int4-asym-fp16-onnx-hybrid |
| Recipe | oga-hybrid |
| Size (GB) | 8.22 |

#### Qwen-2.5-7B-Instruct-Hybrid

```bash
lemonade-server pull Qwen-2.5-7B-Instruct-Hybrid
```

| Key | Value |
|-----|-------|
| Checkpoint | amd/Qwen2.5-7B-Instruct-awq-uint4-asym-g128-lmhead-g32-fp16-onnx-hybrid |
| Recipe | oga-hybrid |
| Size (GB) | 8.42 |

#### Qwen-2.5-3B-Instruct-Hybrid

```bash
lemonade-server pull Qwen-2.5-3B-Instruct-Hybrid
```

| Key | Value |
|-----|-------|
| Checkpoint | amd/Qwen2.5-3B-Instruct-awq-uint4-asym-g128-lmhead-g32-fp16-onnx-hybrid |
| Recipe | oga-hybrid |
| Size (GB) | 3.84 |

#### Qwen-2.5-1.5B-Instruct-Hybrid

```bash
lemonade-server pull Qwen-2.5-1.5B-Instruct-Hybrid
```

| Key | Value |
|-----|-------|
| Checkpoint | amd/Qwen2.5-1.5B-Instruct-awq-uint4-asym-g128-lmhead-g32-fp16-onnx-hybrid |
| Recipe | oga-hybrid |
| Size (GB) | 2.08 |

#### DeepSeek-R1-Distill-Llama-8B-Hybrid

```bash
lemonade-server pull DeepSeek-R1-Distill-Llama-8B-Hybrid
```

| Key | Value |
|-----|-------|
| Checkpoint | amd/DeepSeek-R1-Distill-Llama-8B-awq-asym-uint4-g128-lmhead-onnx-hybrid |
| Recipe | oga-hybrid |
| Labels | reasoning |
| Size (GB) | 8.45 |

#### Mistral-7B-v0.3-Instruct-Hybrid

```bash
lemonade-server pull Mistral-7B-v0.3-Instruct-Hybrid
```

| Key | Value |
|-----|-------|
| Checkpoint | amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-fp16-onnx-hybrid |
| Recipe | oga-hybrid |
| Size (GB) | 7.31 |

#### Llama-3.1-8B-Instruct-Hybrid

```bash
lemonade-server pull Llama-3.1-8B-Instruct-Hybrid
```

| Key | Value |
|-----|-------|
| Checkpoint | amd/Llama-3.1-8B-Instruct-awq-asym-uint4-g128-lmhead-onnx-hybrid |
| Recipe | oga-hybrid |
| Size (GB) | 8.47 |

#### Llama-xLAM-2-8b-fc-r-Hybrid

```bash
lemonade-server pull Llama-xLAM-2-8b-fc-r-Hybrid
```

| Key | Value |
|-----|-------|
| Checkpoint | amd/Llama-xLAM-2-8b-fc-r-awq-g128-int4-asym-bfp16-onnx-hybrid |
| Recipe | oga-hybrid |
| Size (GB) | 8.47 |

### Ryzen AI NPU

#### Qwen-2.5-7B-Instruct-NPU

```bash
lemonade-server pull Qwen-2.5-7B-Instruct-NPU
```

| Key | Value |
|-----|-------|
| Checkpoint | amd/Qwen2.5-7B-Instruct-awq-g128-int4-asym-bf16-onnx-ryzen-strix |
| Recipe | oga-npu |
| Size (GB) | 10.14 |

#### Qwen-2.5-1.5B-Instruct-NPU

```bash
lemonade-server pull Qwen-2.5-1.5B-Instruct-NPU
```

| Key | Value |
|-----|-------|
| Checkpoint | amd/Qwen2.5-1.5B-Instruct-awq-g128-int4-asym-bf16-onnx-ryzen-strix |
| Recipe | oga-npu |
| Size (GB) | 2.89 |

#### DeepSeek-R1-Distill-Llama-8B-NPU

```bash
lemonade-server pull DeepSeek-R1-Distill-Llama-8B-NPU
```

| Key | Value |
|-----|-------|
| Checkpoint | amd/DeepSeek-R1-Distill-Llama-8B-awq-g128-int4-asym-bf16-onnx-ryzen-strix |
| Recipe | oga-npu |
| Size (GB) | 10.63 |

#### Mistral-7B-v0.3-Instruct-NPU

```bash
lemonade-server pull Mistral-7B-v0.3-Instruct-NPU
```

| Key | Value |
|-----|-------|
| Checkpoint | amd/Mistral-7B-Instruct-v0.3-awq-g128-int4-asym-bf16-onnx-ryzen-strix |
| Recipe | oga-npu |
| Size (GB) | 11.75 |

#### Phi-3.5-Mini-Instruct-NPU

```bash
lemonade-server pull Phi-3.5-Mini-Instruct-NPU
```

| Key | Value |
|-----|-------|
| Checkpoint | amd/Phi-3.5-mini-instruct-awq-g128-int4-asym-bf16-onnx-ryzen-strix |
| Recipe | oga-npu |
| Size (GB) | 4.18 |

### FastFlowLM (NPU)

#### Gemma-3-4b-it-FLM

```bash
lemonade-server pull Gemma-3-4b-it-FLM
```

| Key | Value |
|-----|-------|
| Checkpoint | gemma3:4b |
| Recipe | flm |
| Labels | hot, vision |
| Size (GB) | 5.26 |

#### Qwen3-4B-Instruct-2507-FLM

```bash
lemonade-server pull Qwen3-4B-Instruct-2507-FLM
```

| Key | Value |
|-----|-------|
| Checkpoint | qwen3-it:4b |
| Recipe | flm |
| Labels | hot |
| Size (GB) | 3.07 |

#### Qwen3-8b-FLM

```bash
lemonade-server pull Qwen3-8b-FLM
```

| Key | Value |
|-----|-------|
| Checkpoint | qwen3:8b |
| Recipe | flm |
| Labels | reasoning |
| Size (GB) | 5.57 |

#### Llama-3.2-1B-FLM

```bash
lemonade-server pull Llama-3.2-1B-FLM
```

| Key | Value |
|-----|-------|
| Checkpoint | llama3.2:1b |
| Recipe | flm |
| Size (GB) | 1.21 |

#### Llama-3.2-3B-FLM

```bash
lemonade-server pull Llama-3.2-3B-FLM
```

| Key | Value |
|-----|-------|
| Checkpoint | llama3.2:3b |
| Recipe | flm |
| Size (GB) | 2.62 |

#### Llama-3.1-8B-FLM

```bash
lemonade-server pull Llama-3.1-8B-FLM
```

| Key | Value |
|-----|-------|
| Checkpoint | llama3.1:8b |
| Recipe | flm |
| Size (GB) | 5.36 |

#### gpt-oss-20b-FLM

```bash
lemonade-server pull gpt-oss-20b-FLM
```

| Key | Value |
|-----|-------|
| Checkpoint | gpt-oss:20b |
| Recipe | flm |
| Size (GB) | 13.4 |

### CPU

#### Qwen2.5-0.5B-Instruct-CPU

```bash
lemonade-server pull Qwen2.5-0.5B-Instruct-CPU
```

| Key | Value |
|-----|-------|
| Checkpoint | amd/Qwen2.5-0.5B-Instruct-quantized_int4-float16-cpu-onnx |
| Recipe | oga-cpu |
| Size (GB) | 0.77 |

#### Phi-3-Mini-Instruct-CPU

```bash
lemonade-server pull Phi-3-Mini-Instruct-CPU
```

| Key | Value |
|-----|-------|
| Checkpoint | amd/Phi-3-mini-4k-instruct_int4_float16_onnx_cpu |
| Recipe | oga-cpu |
| Size (GB) | 2.23 |

#### Qwen-1.5-7B-Chat-CPU

```bash
lemonade-server pull Qwen-1.5-7B-Chat-CPU
```

| Key | Value |
|-----|-------|
| Checkpoint | amd/Qwen1.5-7B-Chat_uint4_asym_g128_float16_onnx_cpu |
| Recipe | oga-cpu |
| Size (GB) | 5.89 |

#### DeepSeek-R1-Distill-Llama-8B-CPU

```bash
lemonade-server pull DeepSeek-R1-Distill-Llama-8B-CPU
```

| Key | Value |
|-----|-------|
| Checkpoint | amd/DeepSeek-R1-Distill-Llama-8B-awq-asym-uint4-g128-lmhead-onnx-cpu |
| Recipe | oga-cpu |
| Labels | reasoning |
| Size (GB) | 5.78 |

#### DeepSeek-R1-Distill-Qwen-7B-CPU

```bash
lemonade-server pull DeepSeek-R1-Distill-Qwen-7B-CPU
```

| Key | Value |
|-----|-------|
| Checkpoint | amd/DeepSeek-R1-Distill-Llama-8B-awq-asym-uint4-g128-lmhead-onnx-cpu |
| Recipe | oga-cpu |
| Labels | reasoning |
| Size (GB) | 5.78 |

## Naming Convention

The name of each Lemonade model combines the name of the base checkpoint with the backend the model runs on. For example, if the base checkpoint is meta-llama/Llama-3.2-1B-Instruct, and it has been optimized to run on Hybrid, the resulting name is Llama-3.2-1B-Instruct-Hybrid.
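As a sketch, that naming rule can be expressed in a few lines of Python. The helper function below is illustrative only, not part of the Lemonade CLI:

```python
def lemonade_name(checkpoint: str, backend: str) -> str:
    """Illustrative only: derive a Lemonade model name from a base
    checkpoint ("org/model") and a backend suffix (e.g. "Hybrid")."""
    base = checkpoint.split("/")[-1]  # drop the organization prefix
    return f"{base}-{backend}"

print(lemonade_name("meta-llama/Llama-3.2-1B-Instruct", "Hybrid"))
# Llama-3.2-1B-Instruct-Hybrid
```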

## Model Storage and Management

Lemonade Server relies on Hugging Face Hub to manage downloading and storing models on your system. By default, Hugging Face Hub downloads models to `C:\Users\YOUR_USERNAME\.cache\huggingface\hub`.

For example, the Lemonade Server Llama-3.2-1B-Instruct-Hybrid model will end up at `C:\Users\YOUR_USERNAME\.cache\huggingface\hub\models--amd--Llama-3.2-1B-Instruct-awq-g128-int4-asym-fp16-onnx-hybrid`. If you want to uninstall that model, simply delete that folder.
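The folder name follows the standard Hugging Face Hub cache layout: the `/` in the checkpoint name is replaced with `--` and the result is prefixed with `models--`. A minimal sketch of that mapping:

```python
def hub_cache_folder(checkpoint: str) -> str:
    """Return the folder name Hugging Face Hub uses to cache a model repo,
    following the standard "models--{org}--{repo}" cache layout."""
    return "models--" + checkpoint.replace("/", "--")

print(hub_cache_folder(
    "amd/Llama-3.2-1B-Instruct-awq-g128-int4-asym-fp16-onnx-hybrid"
))
# models--amd--Llama-3.2-1B-Instruct-awq-g128-int4-asym-fp16-onnx-hybrid
```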

You can change the directory for Hugging Face Hub by setting the `HF_HOME` or `HF_HUB_CACHE` environment variables.

## Installing Additional Models

Once you've installed Lemonade Server, you can install any model on this list using the `pull` command in the `lemonade-server` CLI.

Example:

```bash
lemonade-server pull Qwen2.5-0.5B-Instruct-CPU
```

Note: `lemonade-server` is a utility that is added to your PATH when you install Lemonade Server with the GUI installer. If you are using Lemonade Server from a Python environment, use the `lemonade-server-dev pull` command instead.