Backend reference

Backends

Recipe	Name	Selectable backend	Uses ctx_size	Backends
`flm`	FastFlowLM NPU	no	yes	npu
`kokoro`	Kokoro	no	no	cpu, metal
`llamacpp`	Llama.cpp GPU	yes	yes	cpu, cuda, metal, rocm, system, vulkan
`moonshine`	Moonshine	no	no	cpu
`ryzenai-llm`	Ryzen AI LLM	no	yes	npu
`sd-cpp`	StableDiffusion.cpp	yes	no	cpu, cuda, metal, rocm, vulkan
`vllm`	vLLM ROCm (experimental)	yes	yes	rocm
`whispercpp`	Whisper.cpp	yes	no	cpu, metal, npu, rocm, vulkan
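
To make the table above concrete, here is a minimal Python sketch that encodes it as data and checks a load request against it. It assumes two readings of the columns: "Selectable backend: no" means a backend cannot be chosen explicitly for that recipe, and "Uses ctx_size: no" means a `ctx_size` value has no effect. `Recipe`, `RECIPES`, and `check_request` are hypothetical names, not part of any documented API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Recipe:
    name: str
    selectable_backend: bool   # "Selectable backend" column
    uses_ctx_size: bool        # "Uses ctx_size" column
    backends: Tuple[str, ...]  # "Backends" column


# Rows transcribed from the Backends table.
RECIPES = {
    r.name: r
    for r in [
        Recipe("flm", False, True, ("npu",)),
        Recipe("kokoro", False, False, ("cpu", "metal")),
        Recipe("llamacpp", True, True, ("cpu", "cuda", "metal", "rocm", "system", "vulkan")),
        Recipe("moonshine", False, False, ("cpu",)),
        Recipe("ryzenai-llm", False, True, ("npu",)),
        Recipe("sd-cpp", True, False, ("cpu", "cuda", "metal", "rocm", "vulkan")),
        Recipe("vllm", True, True, ("rocm",)),
        Recipe("whispercpp", True, False, ("cpu", "metal", "npu", "rocm", "vulkan")),
    ]
}


def check_request(recipe: str, backend: Optional[str] = None,
                  ctx_size: Optional[int] = None) -> List[str]:
    """List the problems with a hypothetical load request (empty list = looks valid)."""
    r = RECIPES[recipe]
    problems = []
    if backend is not None:
        if not r.selectable_backend:
            problems.append(f"{recipe} does not support backend selection")
        elif backend not in r.backends:
            problems.append(f"{recipe} has no '{backend}' backend")
    if ctx_size is not None and not r.uses_ctx_size:
        problems.append(f"{recipe} does not use ctx_size")
    return problems
```

For example, `check_request("vllm", backend="cuda")` returns `["vllm has no 'cuda' backend"]`, while `check_request("llamacpp", backend="vulkan", ctx_size=4096)` returns an empty list.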

Support matrix

Recipe	Backend	OS	Device families
`flm`	npu	linux, windows	amd_npu (XDNA2)
`kokoro`	cpu	linux, windows	cpu (x86_64)
`kokoro`	metal	macos	metal
`llamacpp`	system	linux	cpu (arm64, x86_64)
`llamacpp`	metal	macos	metal
`llamacpp`	cuda	linux, windows	nvidia_gpu (sm_100, sm_120, sm_121, sm_75, sm_80, sm_86, sm_89, sm_90)
`llamacpp`	vulkan	linux, windows	amd_gpu; cpu (arm64, x86_64)
`llamacpp`	rocm	linux, windows	amd_gpu (gfx103X, gfx110X, gfx1150, gfx1151, gfx1152, gfx120X)
`llamacpp`	cpu	linux, windows	cpu (arm64, x86_64)
`moonshine`	cpu	windows	cpu (x86_64)
`moonshine`	cpu	linux	cpu (arm64, x86_64)
`moonshine`	cpu	macos	cpu (arm64)
`ryzenai-llm`	npu	windows	amd_npu (XDNA2)
`sd-cpp`	rocm	linux, windows	amd_gpu (gfx103X, gfx110X, gfx1150, gfx1151, gfx1152, gfx120X)
`sd-cpp`	cuda	linux	nvidia_gpu (sm_100, sm_120, sm_121, sm_75, sm_80, sm_86, sm_89, sm_90)
`sd-cpp`	vulkan	linux, windows	amd_gpu; cpu (x86_64); nvidia_gpu
`sd-cpp`	cpu	linux, windows	cpu (x86_64)
`sd-cpp`	metal	macos	metal
`vllm`	rocm	linux	amd_gpu (gfx110X, gfx1150, gfx1151, gfx120X)
`whispercpp`	npu	windows	amd_npu (XDNA2)
`whispercpp`	rocm	linux, windows	amd_gpu (gfx110X, gfx1150, gfx1151, gfx120X)
`whispercpp`	vulkan	linux, windows	amd_gpu; cpu (x86_64)
`whispercpp`	cpu	linux, windows	cpu (x86_64)
`whispercpp`	metal	macos	metal
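
The ROCm rows list device families such as `gfx110X`. The sketch below checks whether a specific GPU target (for example `gfx1101`) falls under one of those families. It assumes that a trailing `X` stands for any single hexadecimal digit; `ROCM_FAMILIES`, `family_matches`, and `rocm_supported` are hypothetical helpers, and the family lists are copied from the matrix above.

```python
import re

# Device families of the ROCm rows, transcribed from the support matrix.
ROCM_FAMILIES = {
    "llamacpp": ["gfx103X", "gfx110X", "gfx1150", "gfx1151", "gfx1152", "gfx120X"],
    "sd-cpp": ["gfx103X", "gfx110X", "gfx1150", "gfx1151", "gfx1152", "gfx120X"],
    "vllm": ["gfx110X", "gfx1150", "gfx1151", "gfx120X"],
    "whispercpp": ["gfx110X", "gfx1150", "gfx1151", "gfx120X"],
}


def family_matches(family: str, target: str) -> bool:
    """True if a GPU target such as "gfx1101" belongs to a family such as "gfx110X"."""
    if family.endswith("X"):
        # Assumption: a trailing "X" stands for any single hex digit.
        pattern = re.escape(family[:-1]) + "[0-9a-f]"
    else:
        pattern = re.escape(family)  # exact target, e.g. "gfx1151"
    return re.fullmatch(pattern, target.lower()) is not None


def rocm_supported(recipe: str, target: str) -> bool:
    """True if the recipe's ROCm row lists a family covering the target."""
    return any(family_matches(f, target) for f in ROCM_FAMILIES.get(recipe, []))
```

With these lists, `gfx1030` is accepted for `llamacpp` and `sd-cpp` but not for `vllm` or `whispercpp`, since only the first two rows include `gfx103X`.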

Recipe options

`llamacpp` — Llama.cpp GPU

Option	CLI flag	Type	Default	Description
`ctx_size`	`--ctx-size`	SIZE	-1	Context size for the model
`llamacpp_backend`	`--llamacpp`	BACKEND	""	Llama.cpp backend to use
`llamacpp_device`	`--llamacpp-device`	DEVICES	""	Comma-separated list of accelerator devices to use (e.g., `Vulkan0`)
`llamacpp_args`	`--llamacpp-args`	ARGS	""	Custom arguments to pass to llama-server
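
Each option in this table has a matching CLI flag. A minimal sketch of that mapping, assuming every flag takes exactly one value as the next argument; `LLAMACPP_FLAGS` and `to_cli` are hypothetical names. The executable that receives these flags is not named in this table, so the sketch only builds the argument list.

```python
# Option name -> CLI flag, transcribed from the llamacpp options table.
LLAMACPP_FLAGS = {
    "ctx_size": "--ctx-size",
    "llamacpp_backend": "--llamacpp",
    "llamacpp_device": "--llamacpp-device",
    "llamacpp_args": "--llamacpp-args",
}


def to_cli(options: dict) -> list:
    """Turn recipe options into CLI arguments, skipping unset ("" or None) values.

    Assumption: every flag takes exactly one value as a separate argument.
    """
    argv = []
    for name, value in options.items():
        flag = LLAMACPP_FLAGS.get(name)
        if flag is None:
            raise ValueError(f"unknown llamacpp option: {name}")
        if value in ("", None):
            continue  # leave the option at its default
        argv += [flag, str(value)]
    return argv
```

For example, `to_cli({"llamacpp_backend": "vulkan", "llamacpp_device": "Vulkan0", "ctx_size": 8192})` yields `["--llamacpp", "vulkan", "--llamacpp-device", "Vulkan0", "--ctx-size", "8192"]`. Building a list rather than a shell string keeps a multi-word `llamacpp_args` value together as one argument when the list is passed to something like Python's `subprocess.run`.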

`moonshine` — Moonshine

Option	CLI flag	Type	Default	Description
`moonshine_args`	`--moonshine-args`	ARGS	""	Custom arguments to pass to moonshine-server

`sd-cpp` — StableDiffusion.cpp

Option	CLI flag	Type	Default	Description
`sd-cpp_backend`	`--sdcpp`	BACKEND	""	SD.cpp backend to use
`sdcpp_args`	`--sdcpp-args`	ARGS	""	Custom arguments to pass to sd-server (must not conflict with managed args)
`steps`	—	SIZE	20	Number of diffusion steps
`cfg_scale`	—	SIZE	7.0	Classifier-free guidance scale
`width`	—	SIZE	512	Output image width
`height`	—	SIZE	512	Output image height
`sampling_method`	—	ARGS	""	Sampling method
`flow_shift`	—	SIZE	0.0	Flow shift
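
The image-generation options that have no CLI flag are easiest to handle as one record carrying the table's defaults. A sketch, assuming the listed types map to `int` for `steps`, `width`, and `height`, to `float` for `cfg_scale` and `flow_shift`, and to `str` for `sampling_method`; `SdCppImageOptions` and `overrides` are hypothetical names.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class SdCppImageOptions:
    # Defaults transcribed from the sd-cpp options table.
    steps: int = 20
    cfg_scale: float = 7.0
    width: int = 512
    height: int = 512
    sampling_method: str = ""  # empty default, as in the table
    flow_shift: float = 0.0

    def overrides(self) -> dict:
        """Return only the fields whose values differ from the table defaults."""
        defaults = asdict(SdCppImageOptions())
        return {k: v for k, v in asdict(self).items() if v != defaults[k]}
```

`replace(SdCppImageOptions(), width=1024, height=1024).overrides()` returns `{"width": 1024, "height": 1024}`. Keeping the defaults in one frozen dataclass means a caller states only the values it changes, and `dataclasses.replace` produces a new record instead of mutating a shared one.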

`vllm` — vLLM ROCm (experimental)

Option	CLI flag	Type	Default	Description
`ctx_size`	`--ctx-size`	SIZE	-1	Context size for the model
`vllm_backend`	`--vllm`	BACKEND	""	vLLM backend to use
`vllm_args`	`--vllm-args`	ARGS	""	Custom arguments to pass to vllm-server

`whispercpp` — Whisper.cpp

Option	CLI flag	Type	Default	Description
`whispercpp_backend`	`--whispercpp`	BACKEND	""	Whisper.cpp backend to use
`whispercpp_args`	`--whispercpp-args`	ARGS	""	Custom arguments to pass to whisper-server

Models

`collection.omni` — collection.omni (5 models)

Model	Size (GB)	Labels
`LMX-Omni-5.5B-Lite`	9.3	—
`LMX-Omni-52B-Halo`	44.77	—
`Lite Collection`	—	—
`RPG-HaloTales-V1`	39.77	—
`Ultra Collection`	—	—

`kokoro` — Kokoro (1 model)

Model	Size (GB)	Labels
`kokoro-v1`	0.354	tts

`llamacpp` — Llama.cpp GPU (77 models)

Model	Size (GB)	Labels
`Bonsai-1.7B-gguf`	0.25	llamacpp
`Bonsai-4B-gguf`	0.572	llamacpp
`Bonsai-8B-gguf`	1.16	llamacpp
`Cogito-v2-llama-109B-MoE-GGUF`	65.4	vision
`DeepSeek-Qwen3-8B-GGUF`	5.25	reasoning
`Devstral-Small-2507-GGUF`	14.3	coding, tool-calling
`GLM-4.5-Air-UD-Q4K-XL-GGUF`	67.7	reasoning
`GLM-4.7-Flash-GGUF`	17.5	tool-calling
`Gemma-3-4b-it-GGUF`	3.34	vision
`Gemma-4-12B-it-GGUF`	7.29	tool-calling, vision, llamacpp
`Gemma-4-12B-it-MTP-GGUF`	7.75	tool-calling, llamacpp, vision, mtp
`Gemma-4-26B-A4B-it-GGUF`	18.1	hot, tool-calling, vision, llamacpp
`Gemma-4-26B-A4B-it-MTP-GGUF`	18.5	hot, tool-calling, vision, llamacpp, mtp
`Gemma-4-31B-it-GGUF`	19.5	hot, tool-calling, vision, llamacpp
`Gemma-4-31B-it-MTP-GGUF`	20.0	hot, tool-calling, vision, llamacpp, mtp
`Gemma-4-E2B-it-GGUF`	4.09	tool-calling, vision, llamacpp
`Gemma-4-E4B-it-GGUF`	5.97	tool-calling, vision, llamacpp
`Jan-nano-128k-GGUF`	2.5	—
`Jan-v1-4B-GGUF`	2.5	—
`LFM2-1.2B-GGUF`	0.731	—
`LFM2-24B-A2B-GGUF`	14.4	—
`LFM2-8B-A1B-GGUF`	5.04	—
`LFM2.5-1.2B-Instruct-GGUF`	0.731	—
`LFM2.5-8B-A1B`	5.16	—
`Llama-3.2-1B-Instruct-GGUF`	0.834	—
`Llama-3.2-3B-Instruct-GGUF`	2.06	—
`Llama-4-Scout-17B-16E-Instruct-GGUF`	63.2	vision
`Ministral-3-3B-Instruct-2512-GGUF`	2.99	vision
`Nemotron-3-Nano-30B-A3B-GGUF`	22.8	—
`Phi-4-mini-instruct-GGUF`	2.49	—
`Playable1-GGUF`	4.68	coding
`PromptBridge-0.6b-Alpha-GGUF`	0.397	—
`Qwen2.5-Coder-32B-Instruct-GGUF`	19.9	coding
`Qwen2.5-Omni-3B-GGUF`	4.73	vision, chat-transcription
`Qwen2.5-Omni-7B-GGUF`	7.33	vision, chat-transcription
`Qwen2.5-VL-3B-Instruct-GGUF`	3.27	vision
`Qwen2.5-VL-7B-Instruct-GGUF`	6.04	vision
`Qwen3-0.6B-GGUF`	0.38	reasoning
`Qwen3-1.7B-GGUF`	1.06	reasoning
`Qwen3-14B-GGUF`	8.54	reasoning
`Qwen3-30B-A3B-GGUF`	17.4	reasoning
`Qwen3-30B-A3B-Instruct-2507-GGUF`	17.4	tool-calling
`Qwen3-4B-GGUF`	2.38	reasoning
`Qwen3-4B-Instruct-2507-GGUF`	2.5	tool-calling
`Qwen3-8B-GGUF`	5.25	reasoning
`Qwen3-Coder-30B-A3B-Instruct-GGUF`	18.6	coding, tool-calling, hot
`Qwen3-Coder-Next-GGUF`	48.0	coding, tool-calling, hot
`Qwen3-Embedding-0.6B-GGUF`	0.64	embeddings
`Qwen3-Embedding-4B-GGUF`	4.28	embeddings
`Qwen3-Embedding-8B-GGUF`	8.05	embeddings
`Qwen3-Next-80B-A3B-Instruct-GGUF`	46.1	tool-calling
`Qwen3-VL-4B-Instruct-GGUF`	3.33	vision
`Qwen3-VL-8B-Instruct-GGUF`	6.19	vision
`Qwen3.5-0.8B-GGUF`	0.764	vision, tool-calling
`Qwen3.5-122B-A10B-GGUF`	77.9	vision, tool-calling
`Qwen3.5-122B-A10B-MTP-GGUF`	79.6	vision, tool-calling, mtp
`Qwen3.5-27B-GGUF`	18.5	vision, tool-calling
`Qwen3.5-2B-GGUF`	2.01	vision, tool-calling
`Qwen3.5-35B-A3B-GGUF`	23.1	vision, tool-calling
`Qwen3.5-4B-GGUF`	3.58	vision, tool-calling, hot
`Qwen3.5-4B-MTP-GGUF`	3.66	vision, tool-calling, mtp
`Qwen3.5-9B-GGUF`	6.88	vision, tool-calling
`Qwen3.6-27B-GGUF`	18.5	vision, tool-calling
`Qwen3.6-27B-MTP-GGUF`	18.8	vision, tool-calling, mtp, hot
`Qwen3.6-35B-A3B-GGUF`	23.3	vision, tool-calling, hot
`Qwen3.6-35B-A3B-MTP-GGUF`	23.8	vision, tool-calling, mtp
`SmolLM3-3B-GGUF`	1.94	—
`Tiny-Test-Model-GGUF`	0.18	—
`bge-reranker-v2-m3-GGUF`	0.636	reranking
`gpt-oss-120b-GGUF`	62.8	reasoning, tool-calling
`gpt-oss-120b-mxfp-GGUF`	63.4	hot, reasoning, tool-calling
`gpt-oss-20b-GGUF`	11.6	reasoning, tool-calling
`gpt-oss-20b-mxfp4-GGUF`	12.1	hot, reasoning, tool-calling
`granite-4.0-h-tiny-GGUF`	4.25	tool-calling
`jina-reranker-v1-tiny-en-GGUF`	0.0367	reranking
`nomic-embed-text-v1-GGUF`	0.0781	embeddings
`nomic-embed-text-v2-moe-GGUF`	0.51	embeddings
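
To narrow this list to candidates for a given machine, one can filter on labels and on the Size (GB) column. A minimal sketch over a handful of rows copied from the table; `MODELS` and `pick` are hypothetical, and the Size (GB) column is treated only as a rough proxy, since the memory a model needs at runtime also depends on settings such as `ctx_size`.

```python
# A few rows transcribed from the llamacpp model table: (name, size in GB, labels).
MODELS = [
    ("Qwen3.5-4B-GGUF", 3.58, {"vision", "tool-calling", "hot"}),
    ("Qwen3-Coder-30B-A3B-Instruct-GGUF", 18.6, {"coding", "tool-calling", "hot"}),
    ("gpt-oss-20b-mxfp4-GGUF", 12.1, {"hot", "reasoning", "tool-calling"}),
    ("Gemma-4-E4B-it-GGUF", 5.97, {"tool-calling", "vision", "llamacpp"}),
    ("Qwen3-Embedding-0.6B-GGUF", 0.64, {"embeddings"}),
    ("bge-reranker-v2-m3-GGUF", 0.636, {"reranking"}),
]


def pick(models, required: set, max_gb: float):
    """Models carrying every required label and at most max_gb in Size (GB), smallest first."""
    hits = [(name, size) for name, size, labels in models
            if required <= labels and size <= max_gb]
    return sorted(hits, key=lambda m: m[1])
```

`pick(MODELS, {"tool-calling"}, 8.0)` returns `[("Qwen3.5-4B-GGUF", 3.58), ("Gemma-4-E4B-it-GGUF", 5.97)]`.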

`moonshine` — Moonshine (3 models)

Model	Size (GB)	Labels
`Moonshine-Medium-Streaming`	1.08	transcription, realtime-transcription, hot
`Moonshine-Small-Streaming`	0.431	transcription, realtime-transcription
`Moonshine-Tiny-Streaming`	0.202	transcription, realtime-transcription

`ryzenai-llm` — Ryzen AI LLM (79 models)

Model	Size (GB)	Labels
`AMD-OLMo-1B-SFT-DPO-Hybrid`	1.48	—
`CodeLlama-7b-Instruct-hf-Hybrid`	7.24	coding
`CodeLlama-7b-Instruct-hf-NPU`	7.54	coding
`DeepSeek-R1-Distill-Llama-8B-CPU`	6.2	reasoning
`DeepSeek-R1-Distill-Llama-8B-Hybrid`	9.09	reasoning
`DeepSeek-R1-Distill-Llama-8B-NPU`	9.3	reasoning
`DeepSeek-R1-Distill-Qwen-1.5B-Hybrid`	2.19	reasoning
`DeepSeek-R1-Distill-Qwen-1.5B-NPU`	2.3	reasoning
`DeepSeek-R1-Distill-Qwen-7B-CPU`	6.2	reasoning
`DeepSeek-R1-Distill-Qwen-7B-Hybrid`	8.67	reasoning
`DeepSeek-R1-Distill-Qwen-7B-NPU`	8.87	reasoning
`Gemma-3-4b-it-mm-NPU`	6.68	vision
`Llama-2-7b-chat-hf-Hybrid`	7.31	—
`Llama-2-7b-chat-hf-NPU`	7.47	—
`Llama-2-7b-hf-Hybrid`	7.31	—
`Llama-2-7b-hf-NPU`	7.47	—
`Llama-3.1-8B-Hybrid`	9.09	—
`Llama-3.1-8B-NPU`	9.3	—
`Llama-3.2-1B-Hybrid`	1.89	—
`Llama-3.2-1B-Instruct-CPU`	1.76	—
`Llama-3.2-1B-Instruct-Hybrid`	1.89	—
`Llama-3.2-1B-Instruct-NPU`	1.96	—
`Llama-3.2-1B-NPU`	1.96	—
`Llama-3.2-3B-Hybrid`	4.28	—
`Llama-3.2-3B-Instruct-CPU`	3.38	—
`Llama-3.2-3B-Instruct-Hybrid`	4.28	—
`Meta-Llama-3-8B-Hybrid`	9.06	—
`Meta-Llama-3-8B-NPU`	9.23	—
`Meta-Llama-3.1-8B-Instruct-Hybrid`	9.09	—
`Meta-Llama-3.1-8B-Instruct-NPU`	9.3	—
`Mistral-7B-Instruct-v0.1-Hybrid`	7.84	—
`Mistral-7B-Instruct-v0.1-NPU`	8.01	—
`Mistral-7B-Instruct-v0.2-Hybrid`	7.84	—
`Mistral-7B-Instruct-v0.2-NPU`	8.01	—
`Mistral-7B-Instruct-v0.3-Hybrid`	7.85	—
`Mistral-7B-Instruct-v0.3-NPU`	8.09	—
`Mistral-7B-v0.3-Hybrid`	7.85	—
`Mistral-7B-v0.3-NPU`	8.09	—
`Phi-3-Mini-Instruct-CPU`	2.39	—
`Phi-3-mini-128k-instruct-Hybrid`	4.21	—
`Phi-3-mini-128k-instruct-NPU`	4.35	—
`Phi-3-mini-4k-instruct-Hybrid`	4.19	—
`Phi-3-mini-4k-instruct-NPU`	4.3	—
`Phi-3.5-mini-instruct-Hybrid`	4.21	—
`Phi-3.5-mini-instruct-NPU`	4.35	—
`Phi-4-mini-instruct-Hybrid`	5.47	—
`Phi-4-mini-instruct-NPU`	5.59	—
`Phi-4-mini-reasoning-Hybrid`	5.47	reasoning
`Qwen-1.5-7B-Chat-CPU`	6.32	—
`Qwen-2.5-1.5B-Instruct-Hybrid`	2.17	—
`Qwen-2.5-1.5B-Instruct-NPU`	2.25	—
`Qwen1.5-7B-Chat-Hybrid`	8.83	—
`Qwen1.5-7B-Chat-NPU`	9.02	—
`Qwen2-1.5B-Hybrid`	2.19	—
`Qwen2-1.5B-NPU`	2.3	—
`Qwen2-7B-Hybrid`	8.68	—
`Qwen2-7B-NPU`	8.88	—
`Qwen2.5-0.5B-Instruct-CPU`	0.834	—
`Qwen2.5-0.5B-Instruct-Hybrid`	0.828	—
`Qwen2.5-14B-instruct-Hybrid`	16.5	—
`Qwen2.5-3B-Instruct-Hybrid`	3.97	—
`Qwen2.5-3B-Instruct-NPU`	4.1	—
`Qwen2.5-7B-Instruct-Hybrid`	8.65	—
`Qwen2.5-7B-Instruct-NPU`	8.83	—
`Qwen2.5-Coder-0.5B-Instruct-Hybrid`	0.828	coding
`Qwen2.5-Coder-1.5B-Instruct-Hybrid`	2.17	coding
`Qwen2.5-Coder-1.5B-Instruct-NPU`	2.25	coding
`Qwen2.5-Coder-7B-Instruct-Hybrid`	8.65	coding
`Qwen2.5-Coder-7B-Instruct-NPU`	8.83	coding
`Qwen3-1.7B-Hybrid`	2.55	reasoning
`Qwen3-14B-Hybrid`	16.5	reasoning
`Qwen3-4B-Hybrid`	5.17	reasoning
`Qwen3-8B-Hybrid`	9.42	reasoning
`SmolLM-135M-Instruct-Hybrid`	0.232	—
`SmolLM2-135M-Instruct-Hybrid`	0.233	—
`chatglm3-6b-Hybrid`	6.9	—
`chatglm3-6b-NPU`	7.04	—
`gemma-2-2b-Hybrid`	4.04	—
`gpt-oss-20b-NPU`	13.4	—
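
Most model names in this table end in `-CPU`, `-Hybrid`, or `-NPU`. Assuming that suffix names the execution variant, the sketch below groups the names by base model, which shows at a glance which models come in several variants. `group_variants` is a hypothetical helper.

```python
from collections import defaultdict

SUFFIXES = ("CPU", "Hybrid", "NPU")


def group_variants(names):
    """Group model names by base name, keyed by their -CPU/-Hybrid/-NPU suffix."""
    groups = defaultdict(dict)
    for name in names:
        base, sep, suffix = name.rpartition("-")
        if sep and suffix in SUFFIXES:
            groups[base][suffix] = name
        else:
            groups[name]["?"] = name  # no recognized suffix; keep the full name
    return dict(groups)
```

Given `Llama-3.2-1B-Instruct-CPU`, `Llama-3.2-1B-Instruct-Hybrid`, `Llama-3.2-1B-Instruct-NPU`, and `Qwen3-8B-Hybrid`, it returns one entry for `Llama-3.2-1B-Instruct` with all three variants and one for `Qwen3-8B` with only `Hybrid`.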

`sd-cpp` — StableDiffusion.cpp (12 models)

Model	Size (GB)	Labels
`Flux-2-Klein-4B`	16.1	image, edit
`Flux-2-Klein-9B-GGUF`	19.0	image, edit
`Qwen-Image-2512-GGUF`	19.4	image
`Qwen-Image-GGUF`	18.2	image
`RealESRGAN-x4plus`	0.064	upscaling, image
`RealESRGAN-x4plus-anime`	0.017	upscaling, image
`SD-1.5`	7.7	image
`SD-Turbo`	5.21	image
`SD-Turbo-GGUF`	2.02	image
`SDXL-Base-1.0`	6.94	image
`SDXL-Turbo`	6.94	image
`Z-Image-Turbo`	20.7	image

`vllm` — vLLM ROCm (experimental) (7 models)

Model	Size (GB)	Labels
`GLM-4.7-Flash-FP16-vLLM`	62.47	reasoning, tool-calling
`Qwen3.5-0.8B-FP16-vLLM`	1.77	reasoning
`Qwen3.5-2B-FP16-vLLM`	4.57	reasoning, tool-calling
`Qwen3.5-4B-FP16-vLLM`	9.34	reasoning, hot, tool-calling
`Qwen3.5-9B-FP16-vLLM`	19.3	reasoning, tool-calling
`Qwen3.6-27B-FP16-vLLM`	55.59	reasoning, tool-calling, vision
`Qwen3.6-35B-A3B-FP16-vLLM`	71.93	reasoning, tool-calling, vision

`whispercpp` — Whisper.cpp (6 models)

Model	Size (GB)	Labels
`Whisper-Base`	0.148	transcription, realtime-transcription
`Whisper-Large-v3`	3.1	transcription, realtime-transcription
`Whisper-Large-v3-Turbo`	1.62	transcription, realtime-transcription, hot
`Whisper-Medium`	1.53	transcription, realtime-transcription
`Whisper-Small`	0.488	transcription, realtime-transcription
`Whisper-Tiny`	0.075	transcription, realtime-transcription
