Backend reference

Backends

| Recipe | Name | Selectable backend | Uses ctx_size | Backends |
| --- | --- | --- | --- | --- |
| flm | FastFlowLM NPU | no | yes | npu |
| kokoro | Kokoro | no | no | cpu, metal |
| llamacpp | Llama.cpp GPU | yes | yes | cpu, cuda, metal, rocm, system, vulkan |
| moonshine | Moonshine | no | no | cpu |
| ryzenai-llm | Ryzen AI LLM | no | yes | npu |
| sd-cpp | StableDiffusion.cpp | yes | no | cpu, cuda, metal, rocm, vulkan |
| vllm | vLLM ROCm (experimental) | yes | yes | rocm |
| whispercpp | Whisper.cpp | yes | no | cpu, metal, npu, rocm, vulkan |

Support matrix

| Recipe | Backend | OS | Device families |
| --- | --- | --- | --- |
| flm | npu | linux, windows | amd_npu (XDNA2) |
| kokoro | cpu | linux, windows | cpu (x86_64) |
| kokoro | metal | macos | metal |
| llamacpp | system | linux | cpu (arm64, x86_64) |
| llamacpp | metal | macos | metal |
| llamacpp | cuda | linux, windows | nvidia_gpu (sm_100, sm_120, sm_121, sm_75, sm_80, sm_86, sm_89, sm_90) |
| llamacpp | vulkan | linux, windows | amd_gpu; cpu (arm64, x86_64) |
| llamacpp | rocm | linux, windows | amd_gpu (gfx103X, gfx110X, gfx1150, gfx1151, gfx1152, gfx120X) |
| llamacpp | cpu | linux, windows | cpu (arm64, x86_64) |
| moonshine | cpu | windows | cpu (x86_64) |
| moonshine | cpu | linux | cpu (arm64, x86_64) |
| moonshine | cpu | macos | cpu (arm64) |
| ryzenai-llm | npu | windows | amd_npu (XDNA2) |
| sd-cpp | rocm | linux, windows | amd_gpu (gfx103X, gfx110X, gfx1150, gfx1151, gfx1152, gfx120X) |
| sd-cpp | cuda | linux | nvidia_gpu (sm_100, sm_120, sm_121, sm_75, sm_80, sm_86, sm_89, sm_90) |
| sd-cpp | vulkan | linux, windows | amd_gpu; cpu (x86_64); nvidia_gpu |
| sd-cpp | cpu | linux, windows | cpu (x86_64) |
| sd-cpp | metal | macos | metal |
| vllm | rocm | linux | amd_gpu (gfx110X, gfx1150, gfx1151, gfx120X) |
| whispercpp | npu | windows | amd_npu (XDNA2) |
| whispercpp | rocm | linux, windows | amd_gpu (gfx110X, gfx1150, gfx1151, gfx120X) |
| whispercpp | vulkan | linux, windows | amd_gpu; cpu (x86_64) |
| whispercpp | cpu | linux, windows | cpu (x86_64) |
| whispercpp | metal | macos | metal |
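
The support matrix can be read as a lookup from a recipe and an operating system to the backends listed for that pair. The following is a minimal Python sketch over a hand-copied subset of the rows above (llamacpp and sd-cpp only). The `SUPPORT` structure and the `backends_for` helper are hypothetical names chosen for illustration and are not part of any documented API.

```python
# Hand-copied subset of the support matrix: (recipe, backend, supported OSes).
# SUPPORT and backends_for are illustrative assumptions, not a real API.
SUPPORT = [
    ("llamacpp", "system", {"linux"}),
    ("llamacpp", "metal", {"macos"}),
    ("llamacpp", "cuda", {"linux", "windows"}),
    ("llamacpp", "vulkan", {"linux", "windows"}),
    ("llamacpp", "rocm", {"linux", "windows"}),
    ("llamacpp", "cpu", {"linux", "windows"}),
    ("sd-cpp", "rocm", {"linux", "windows"}),
    ("sd-cpp", "cuda", {"linux"}),
    ("sd-cpp", "vulkan", {"linux", "windows"}),
    ("sd-cpp", "cpu", {"linux", "windows"}),
    ("sd-cpp", "metal", {"macos"}),
]


def backends_for(recipe: str, os_name: str) -> list[str]:
    """Return the backends listed for `recipe` on `os_name`, sorted by name."""
    return sorted(b for r, b, oses in SUPPORT if r == recipe and os_name in oses)


print(backends_for("llamacpp", "macos"))  # ['metal']
print(backends_for("sd-cpp", "windows"))  # ['cpu', 'rocm', 'vulkan']
```

The lookup only covers the OS column. Whether a backend actually runs also depends on the device family in the last column, which this sketch does not check.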

Recipe options

llamacpp — Llama.cpp GPU

| Option | CLI flag | Type | Default | Description |
| --- | --- | --- | --- | --- |
| ctx_size | --ctx-size | SIZE | -1 | Context size for the model |
| llamacpp_backend | --llamacpp | BACKEND | "" | LlamaCpp backend to use |
| llamacpp_device | --llamacpp-device | DEVICES | "" | Comma-separated list of accelerator devices to use (e.g. Vulkan0) |
| llamacpp_args | --llamacpp-args | ARGS | "" | Custom arguments to pass to llama-server |
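
Each llamacpp option corresponds to exactly one CLI flag. As a rough sketch, that mapping can be turned into an argument list in Python. `LLAMACPP_OPTIONS` and `to_cli_args` are hypothetical helpers, the values passed in are examples only, and the executable that accepts these flags is not named in this reference, so it is left out.

```python
# Option name -> (CLI flag, default value), copied from the llamacpp table above.
LLAMACPP_OPTIONS = {
    "ctx_size": ("--ctx-size", -1),
    "llamacpp_backend": ("--llamacpp", ""),
    "llamacpp_device": ("--llamacpp-device", ""),
    "llamacpp_args": ("--llamacpp-args", ""),
}


def to_cli_args(options: dict) -> list[str]:
    """Hypothetical helper: emit flag/value pairs for options that differ from their default."""
    unknown = sorted(set(options) - set(LLAMACPP_OPTIONS))
    if unknown:
        raise KeyError(f"unknown llamacpp options: {unknown}")
    args: list[str] = []
    for name, (flag, default) in LLAMACPP_OPTIONS.items():
        value = options.get(name, default)
        if value != default:
            args += [flag, str(value)]
    return args


# Example values only; "vulkan" is one of the llamacpp backends in the Backends table.
print(to_cli_args({"ctx_size": 8192, "llamacpp_backend": "vulkan", "llamacpp_device": "Vulkan0"}))
# ['--ctx-size', '8192', '--llamacpp', 'vulkan', '--llamacpp-device', 'Vulkan0']
```

Options left at their default are skipped, on the assumption that omitting a flag is equivalent to passing its default.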

moonshine — Moonshine

| Option | CLI flag | Type | Default | Description |
| --- | --- | --- | --- | --- |
| moonshine_args | --moonshine-args | ARGS | "" | Custom arguments to pass to moonshine-server |

sd-cpp — StableDiffusion.cpp

| Option | CLI flag | Type | Default | Description |
| --- | --- | --- | --- | --- |
| sd-cpp_backend | --sdcpp | BACKEND | "" | SD.cpp backend to use |
| sdcpp_args | --sdcpp-args | ARGS | "" | Custom arguments to pass to sd-server (must not conflict with managed args) |
| steps | | SIZE | 20 | Number of diffusion steps |
| cfg_scale | | SIZE | 7.0 | Classifier-free guidance scale |
| width | | SIZE | 512 | Output image width |
| height | | SIZE | 512 | Output image height |
| sampling_method | | ARGS | "" | Sampling method |
| flow_shift | | SIZE | 0.0 | Flow shift |
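
The generation options in this table (steps, cfg_scale, width, height, sampling_method, flow_shift) have defaults but no CLI flag. This reference does not say how they are supplied, so the Python sketch below only illustrates merging caller overrides onto the listed defaults. `SDCPP_DEFAULTS` and `resolve_sdcpp_options` are hypothetical names.

```python
# Defaults copied from the sd-cpp table above (options without a CLI flag).
SDCPP_DEFAULTS = {
    "steps": 20,
    "cfg_scale": 7.0,
    "width": 512,
    "height": 512,
    "sampling_method": "",
    "flow_shift": 0.0,
}


def resolve_sdcpp_options(overrides: dict) -> dict:
    """Hypothetical helper: apply overrides on top of the defaults, rejecting unknown names."""
    unknown = sorted(set(overrides) - set(SDCPP_DEFAULTS))
    if unknown:
        raise ValueError(f"unknown sd-cpp options: {unknown}")
    return {**SDCPP_DEFAULTS, **overrides}


# Example values only.
resolved = resolve_sdcpp_options({"steps": 4, "width": 1024, "height": 1024})
print(resolved["steps"], resolved["cfg_scale"], resolved["width"])  # 4 7.0 1024
```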

vllm — vLLM ROCm (experimental)

| Option | CLI flag | Type | Default | Description |
| --- | --- | --- | --- | --- |
| ctx_size | --ctx-size | SIZE | -1 | Context size for the model |
| vllm_backend | --vllm | BACKEND | "" | vLLM backend to use |
| vllm_args | --vllm-args | ARGS | "" | Custom arguments to pass to vllm-server |

whispercpp — Whisper.cpp

| Option | CLI flag | Type | Default | Description |
| --- | --- | --- | --- | --- |
| whispercpp_backend | --whispercpp | BACKEND | "" | WhisperCpp backend to use |
| whispercpp_args | --whispercpp-args | ARGS | "" | Custom arguments to pass to whisper-server |

Models

collection.omni — collection.omni (5 models)

| Model | Size (GB) | Labels |
| --- | --- | --- |
| LMX-Omni-5.5B-Lite | 9.3 | |
| LMX-Omni-52B-Halo | 44.77 | |
| Lite Collection | | |
| RPG-HaloTales-V1 | 39.77 | |
| Ultra Collection | | |

kokoro — Kokoro (1 model)

| Model | Size (GB) | Labels |
| --- | --- | --- |
| kokoro-v1 | 0.354 | tts |

llamacpp — Llama.cpp GPU (77 models)

| Model | Size (GB) | Labels |
| --- | --- | --- |
| Bonsai-1.7B-gguf | 0.25 | llamacpp |
| Bonsai-4B-gguf | 0.572 | llamacpp |
| Bonsai-8B-gguf | 1.16 | llamacpp |
| Cogito-v2-llama-109B-MoE-GGUF | 65.4 | vision |
| DeepSeek-Qwen3-8B-GGUF | 5.25 | reasoning |
| Devstral-Small-2507-GGUF | 14.3 | coding, tool-calling |
| GLM-4.5-Air-UD-Q4K-XL-GGUF | 67.7 | reasoning |
| GLM-4.7-Flash-GGUF | 17.5 | tool-calling |
| Gemma-3-4b-it-GGUF | 3.34 | vision |
| Gemma-4-12B-it-GGUF | 7.29 | tool-calling, vision, llamacpp |
| Gemma-4-12B-it-MTP-GGUF | 7.75 | tool-calling, llamacpp, vision, mtp |
| Gemma-4-26B-A4B-it-GGUF | 18.1 | hot, tool-calling, vision, llamacpp |
| Gemma-4-26B-A4B-it-MTP-GGUF | 18.5 | hot, tool-calling, vision, llamacpp, mtp |
| Gemma-4-31B-it-GGUF | 19.5 | hot, tool-calling, vision, llamacpp |
| Gemma-4-31B-it-MTP-GGUF | 20.0 | hot, tool-calling, vision, llamacpp, mtp |
| Gemma-4-E2B-it-GGUF | 4.09 | tool-calling, vision, llamacpp |
| Gemma-4-E4B-it-GGUF | 5.97 | tool-calling, vision, llamacpp |
| Jan-nano-128k-GGUF | 2.5 | |
| Jan-v1-4B-GGUF | 2.5 | |
| LFM2-1.2B-GGUF | 0.731 | |
| LFM2-24B-A2B-GGUF | 14.4 | |
| LFM2-8B-A1B-GGUF | 5.04 | |
| LFM2.5-1.2B-Instruct-GGUF | 0.731 | |
| LFM2.5-8B-A1B | 5.16 | |
| Llama-3.2-1B-Instruct-GGUF | 0.834 | |
| Llama-3.2-3B-Instruct-GGUF | 2.06 | |
| Llama-4-Scout-17B-16E-Instruct-GGUF | 63.2 | vision |
| Ministral-3-3B-Instruct-2512-GGUF | 2.99 | vision |
| Nemotron-3-Nano-30B-A3B-GGUF | 22.8 | |
| Phi-4-mini-instruct-GGUF | 2.49 | |
| Playable1-GGUF | 4.68 | coding |
| PromptBridge-0.6b-Alpha-GGUF | 0.397 | |
| Qwen2.5-Coder-32B-Instruct-GGUF | 19.9 | coding |
| Qwen2.5-Omni-3B-GGUF | 4.73 | vision, chat-transcription |
| Qwen2.5-Omni-7B-GGUF | 7.33 | vision, chat-transcription |
| Qwen2.5-VL-3B-Instruct-GGUF | 3.27 | vision |
| Qwen2.5-VL-7B-Instruct-GGUF | 6.04 | vision |
| Qwen3-0.6B-GGUF | 0.38 | reasoning |
| Qwen3-1.7B-GGUF | 1.06 | reasoning |
| Qwen3-14B-GGUF | 8.54 | reasoning |
| Qwen3-30B-A3B-GGUF | 17.4 | reasoning |
| Qwen3-30B-A3B-Instruct-2507-GGUF | 17.4 | tool-calling |
| Qwen3-4B-GGUF | 2.38 | reasoning |
| Qwen3-4B-Instruct-2507-GGUF | 2.5 | tool-calling |
| Qwen3-8B-GGUF | 5.25 | reasoning |
| Qwen3-Coder-30B-A3B-Instruct-GGUF | 18.6 | coding, tool-calling, hot |
| Qwen3-Coder-Next-GGUF | 48.0 | coding, tool-calling, hot |
| Qwen3-Embedding-0.6B-GGUF | 0.64 | embeddings |
| Qwen3-Embedding-4B-GGUF | 4.28 | embeddings |
| Qwen3-Embedding-8B-GGUF | 8.05 | embeddings |
| Qwen3-Next-80B-A3B-Instruct-GGUF | 46.1 | tool-calling |
| Qwen3-VL-4B-Instruct-GGUF | 3.33 | vision |
| Qwen3-VL-8B-Instruct-GGUF | 6.19 | vision |
| Qwen3.5-0.8B-GGUF | 0.764 | vision, tool-calling |
| Qwen3.5-122B-A10B-GGUF | 77.9 | vision, tool-calling |
| Qwen3.5-122B-A10B-MTP-GGUF | 79.6 | vision, tool-calling, mtp |
| Qwen3.5-27B-GGUF | 18.5 | vision, tool-calling |
| Qwen3.5-2B-GGUF | 2.01 | vision, tool-calling |
| Qwen3.5-35B-A3B-GGUF | 23.1 | vision, tool-calling |
| Qwen3.5-4B-GGUF | 3.58 | vision, tool-calling, hot |
| Qwen3.5-4B-MTP-GGUF | 3.66 | vision, tool-calling, mtp |
| Qwen3.5-9B-GGUF | 6.88 | vision, tool-calling |
| Qwen3.6-27B-GGUF | 18.5 | vision, tool-calling |
| Qwen3.6-27B-MTP-GGUF | 18.8 | vision, tool-calling, mtp, hot |
| Qwen3.6-35B-A3B-GGUF | 23.3 | vision, tool-calling, hot |
| Qwen3.6-35B-A3B-MTP-GGUF | 23.8 | vision, tool-calling, mtp |
| SmolLM3-3B-GGUF | 1.94 | |
| Tiny-Test-Model-GGUF | 0.18 | |
| bge-reranker-v2-m3-GGUF | 0.636 | reranking |
| gpt-oss-120b-GGUF | 62.8 | reasoning, tool-calling |
| gpt-oss-120b-mxfp-GGUF | 63.4 | hot, reasoning, tool-calling |
| gpt-oss-20b-GGUF | 11.6 | reasoning, tool-calling |
| gpt-oss-20b-mxfp4-GGUF | 12.1 | hot, reasoning, tool-calling |
| granite-4.0-h-tiny-GGUF | 4.25 | tool-calling |
| jina-reranker-v1-tiny-en-GGUF | 0.0367 | reranking |
| nomic-embed-text-v1-GGUF | 0.0781 | embeddings |
| nomic-embed-text-v2-moe-GGUF | 0.51 | embeddings |
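
With this many entries, the Labels and Size (GB) columns are the practical way to narrow the list. The Python sketch below filters a hand-copied subset of the rows above by label and by a size budget. `MODELS` and `pick` are hypothetical names, and the budget is compared only against the listed size, not against any memory requirement.

```python
# Hand-copied subset of the llamacpp model table: (model, size in GB, labels).
MODELS = [
    ("Qwen3-0.6B-GGUF", 0.38, {"reasoning"}),
    ("Qwen3-8B-GGUF", 5.25, {"reasoning"}),
    ("Qwen3-Coder-30B-A3B-Instruct-GGUF", 18.6, {"coding", "tool-calling", "hot"}),
    ("Devstral-Small-2507-GGUF", 14.3, {"coding", "tool-calling"}),
    ("Playable1-GGUF", 4.68, {"coding"}),
    ("gpt-oss-20b-GGUF", 11.6, {"reasoning", "tool-calling"}),
]


def pick(label: str, max_gb: float) -> list[str]:
    """Hypothetical helper: models carrying `label` whose listed size fits `max_gb`, smallest first."""
    by_size = sorted(MODELS, key=lambda m: m[1])
    return [name for name, size, labels in by_size if label in labels and size <= max_gb]


print(pick("coding", 16.0))    # ['Playable1-GGUF', 'Devstral-Small-2507-GGUF']
print(pick("reasoning", 6.0))  # ['Qwen3-0.6B-GGUF', 'Qwen3-8B-GGUF']
```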

moonshine — Moonshine (3 models)

| Model | Size (GB) | Labels |
| --- | --- | --- |
| Moonshine-Medium-Streaming | 1.08 | transcription, realtime-transcription, hot |
| Moonshine-Small-Streaming | 0.431 | transcription, realtime-transcription |
| Moonshine-Tiny-Streaming | 0.202 | transcription, realtime-transcription |

ryzenai-llm — Ryzen AI LLM (79 models)

| Model | Size (GB) | Labels |
| --- | --- | --- |
| AMD-OLMo-1B-SFT-DPO-Hybrid | 1.48 | |
| CodeLlama-7b-Instruct-hf-Hybrid | 7.24 | coding |
| CodeLlama-7b-Instruct-hf-NPU | 7.54 | coding |
| DeepSeek-R1-Distill-Llama-8B-CPU | 6.2 | reasoning |
| DeepSeek-R1-Distill-Llama-8B-Hybrid | 9.09 | reasoning |
| DeepSeek-R1-Distill-Llama-8B-NPU | 9.3 | reasoning |
| DeepSeek-R1-Distill-Qwen-1.5B-Hybrid | 2.19 | reasoning |
| DeepSeek-R1-Distill-Qwen-1.5B-NPU | 2.3 | reasoning |
| DeepSeek-R1-Distill-Qwen-7B-CPU | 6.2 | reasoning |
| DeepSeek-R1-Distill-Qwen-7B-Hybrid | 8.67 | reasoning |
| DeepSeek-R1-Distill-Qwen-7B-NPU | 8.87 | reasoning |
| Gemma-3-4b-it-mm-NPU | 6.68 | vision |
| Llama-2-7b-chat-hf-Hybrid | 7.31 | |
| Llama-2-7b-chat-hf-NPU | 7.47 | |
| Llama-2-7b-hf-Hybrid | 7.31 | |
| Llama-2-7b-hf-NPU | 7.47 | |
| Llama-3.1-8B-Hybrid | 9.09 | |
| Llama-3.1-8B-NPU | 9.3 | |
| Llama-3.2-1B-Hybrid | 1.89 | |
| Llama-3.2-1B-Instruct-CPU | 1.76 | |
| Llama-3.2-1B-Instruct-Hybrid | 1.89 | |
| Llama-3.2-1B-Instruct-NPU | 1.96 | |
| Llama-3.2-1B-NPU | 1.96 | |
| Llama-3.2-3B-Hybrid | 4.28 | |
| Llama-3.2-3B-Instruct-CPU | 3.38 | |
| Llama-3.2-3B-Instruct-Hybrid | 4.28 | |
| Meta-Llama-3-8B-Hybrid | 9.06 | |
| Meta-Llama-3-8B-NPU | 9.23 | |
| Meta-Llama-3.1-8B-Instruct-Hybrid | 9.09 | |
| Meta-Llama-3.1-8B-Instruct-NPU | 9.3 | |
| Mistral-7B-Instruct-v0.1-Hybrid | 7.84 | |
| Mistral-7B-Instruct-v0.1-NPU | 8.01 | |
| Mistral-7B-Instruct-v0.2-Hybrid | 7.84 | |
| Mistral-7B-Instruct-v0.2-NPU | 8.01 | |
| Mistral-7B-Instruct-v0.3-Hybrid | 7.85 | |
| Mistral-7B-Instruct-v0.3-NPU | 8.09 | |
| Mistral-7B-v0.3-Hybrid | 7.85 | |
| Mistral-7B-v0.3-NPU | 8.09 | |
| Phi-3-Mini-Instruct-CPU | 2.39 | |
| Phi-3-mini-128k-instruct-Hybrid | 4.21 | |
| Phi-3-mini-128k-instruct-NPU | 4.35 | |
| Phi-3-mini-4k-instruct-Hybrid | 4.19 | |
| Phi-3-mini-4k-instruct-NPU | 4.3 | |
| Phi-3.5-mini-instruct-Hybrid | 4.21 | |
| Phi-3.5-mini-instruct-NPU | 4.35 | |
| Phi-4-mini-instruct-Hybrid | 5.47 | |
| Phi-4-mini-instruct-NPU | 5.59 | |
| Phi-4-mini-reasoning-Hybrid | 5.47 | reasoning |
| Qwen-1.5-7B-Chat-CPU | 6.32 | |
| Qwen-2.5-1.5B-Instruct-Hybrid | 2.17 | |
| Qwen-2.5-1.5B-Instruct-NPU | 2.25 | |
| Qwen1.5-7B-Chat-Hybrid | 8.83 | |
| Qwen1.5-7B-Chat-NPU | 9.02 | |
| Qwen2-1.5B-Hybrid | 2.19 | |
| Qwen2-1.5B-NPU | 2.3 | |
| Qwen2-7B-Hybrid | 8.68 | |
| Qwen2-7B-NPU | 8.88 | |
| Qwen2.5-0.5B-Instruct-CPU | 0.834 | |
| Qwen2.5-0.5B-Instruct-Hybrid | 0.828 | |
| Qwen2.5-14B-instruct-Hybrid | 16.5 | |
| Qwen2.5-3B-Instruct-Hybrid | 3.97 | |
| Qwen2.5-3B-Instruct-NPU | 4.1 | |
| Qwen2.5-7B-Instruct-Hybrid | 8.65 | |
| Qwen2.5-7B-Instruct-NPU | 8.83 | |
| Qwen2.5-Coder-0.5B-Instruct-Hybrid | 0.828 | coding |
| Qwen2.5-Coder-1.5B-Instruct-Hybrid | 2.17 | coding |
| Qwen2.5-Coder-1.5B-Instruct-NPU | 2.25 | coding |
| Qwen2.5-Coder-7B-Instruct-Hybrid | 8.65 | coding |
| Qwen2.5-Coder-7B-Instruct-NPU | 8.83 | coding |
| Qwen3-1.7B-Hybrid | 2.55 | reasoning |
| Qwen3-14B-Hybrid | 16.5 | reasoning |
| Qwen3-4B-Hybrid | 5.17 | reasoning |
| Qwen3-8B-Hybrid | 9.42 | reasoning |
| SmolLM-135M-Instruct-Hybrid | 0.232 | |
| SmolLM2-135M-Instruct-Hybrid | 0.233 | |
| chatglm3-6b-Hybrid | 6.9 | |
| chatglm3-6b-NPU | 7.04 | |
| gemma-2-2b-Hybrid | 4.04 | |
| gpt-oss-20b-NPU | 13.4 | |

sd-cpp — StableDiffusion.cpp (12 models)

| Model | Size (GB) | Labels |
| --- | --- | --- |
| Flux-2-Klein-4B | 16.1 | image, edit |
| Flux-2-Klein-9B-GGUF | 19.0 | image, edit |
| Qwen-Image-2512-GGUF | 19.4 | image |
| Qwen-Image-GGUF | 18.2 | image |
| RealESRGAN-x4plus | 0.064 | upscaling, image |
| RealESRGAN-x4plus-anime | 0.017 | upscaling, image |
| SD-1.5 | 7.7 | image |
| SD-Turbo | 5.21 | image |
| SD-Turbo-GGUF | 2.02 | image |
| SDXL-Base-1.0 | 6.94 | image |
| SDXL-Turbo | 6.94 | image |
| Z-Image-Turbo | 20.7 | image |

vllm — vLLM ROCm (experimental) (7 models)

| Model | Size (GB) | Labels |
| --- | --- | --- |
| GLM-4.7-Flash-FP16-vLLM | 62.47 | reasoning, tool-calling |
| Qwen3.5-0.8B-FP16-vLLM | 1.77 | reasoning |
| Qwen3.5-2B-FP16-vLLM | 4.57 | reasoning, tool-calling |
| Qwen3.5-4B-FP16-vLLM | 9.34 | reasoning, hot, tool-calling |
| Qwen3.5-9B-FP16-vLLM | 19.3 | reasoning, tool-calling |
| Qwen3.6-27B-FP16-vLLM | 55.59 | reasoning, tool-calling, vision |
| Qwen3.6-35B-A3B-FP16-vLLM | 71.93 | reasoning, tool-calling, vision |

whispercpp — Whisper.cpp (6 models)

| Model | Size (GB) | Labels |
| --- | --- | --- |
| Whisper-Base | 0.148 | transcription, realtime-transcription |
| Whisper-Large-v3 | 3.1 | transcription, realtime-transcription |
| Whisper-Large-v3-Turbo | 1.62 | transcription, realtime-transcription, hot |
| Whisper-Medium | 1.53 | transcription, realtime-transcription |
| Whisper-Small | 0.488 | transcription, realtime-transcription |
| Whisper-Tiny | 0.075 | transcription, realtime-transcription |