
🍋 Lemonade API: Model Compatibility and Recipes

Lemonade API (lemonade.api) provides a simple, high-level interface for loading and running LLMs locally. This guide explains which models work with which recipes, what compatibility to expect, and how to choose the right setup for your hardware.

🧠 What Is a Recipe?

A recipe defines how a model is run, including the backend (e.g., PyTorch, ONNX Runtime), quantization strategy, and device support. The from_pretrained() function in lemonade.api uses the recipe to configure everything automatically. For the list of recipes, see the Recipe and Checkpoint Compatibility table. The following example uses the Lemonade API from_pretrained() function:

from lemonade.api import from_pretrained

model, tokenizer = from_pretrained("Qwen/Qwen2.5-0.5B-Instruct", recipe="hf-cpu")
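Because the recipe is passed as a plain string, a typo only surfaces once loading starts. The sketch below shows one way to catch that earlier. It is a hypothetical wrapper, not part of lemonade.api: the `load_model` name and the `KNOWN_RECIPES` set are assumptions, and the set covers only the recipes named in this guide, so extend it if your installation supports others.

```python
# Recipe names taken from the compatibility table in this guide (assumption:
# this set is illustrative and may not list every recipe lemonade supports).
KNOWN_RECIPES = {"hf-cpu", "hf-dgpu", "oga-cpu", "oga-hybrid", "oga-npu"}


def load_model(checkpoint: str, recipe: str):
    """Hypothetical helper: validate the recipe name, then defer to lemonade."""
    if recipe not in KNOWN_RECIPES:
        raise ValueError(
            f"Unknown recipe {recipe!r}; expected one of {sorted(KNOWN_RECIPES)}"
        )
    # Imported lazily so the validation above also works where lemonade
    # is not installed.
    from lemonade.api import from_pretrained

    return from_pretrained(checkpoint, recipe=recipe)
```

Calling `load_model("Qwen/Qwen2.5-0.5B-Instruct", recipe="hf-cpu")` would then behave like the direct call above, while a misspelled recipe fails immediately with a ValueError.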

Function Arguments:

📜 Supported Model Formats

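Lemonade API currently supports the checkpoint formats listed in the compatibility table: safetensors checkpoints (Hugging Face) and OGA ONNX models.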

🍴 Recipe and Checkpoint Compatibility

The following table explains what checkpoints work with each recipe, the hardware and OS requirements, and additional notes:

| Recipe | Checkpoint Format | Hardware Needed | Operating System | Notes |
|---|---|---|---|---|
| hf-cpu | safetensors (Hugging Face) | Any x86 CPU | Windows, Linux | Compatible with x86 CPUs, offering broad accessibility. |
| hf-dgpu | safetensors (Hugging Face) | Compatible Discrete GPU | Windows, Linux | Requires PyTorch and a compatible GPU.[1] |
| oga-cpu | safetensors (Hugging Face) | Any x86 CPU | Windows | Converted from safetensors via `model_builder`. Accuracy loss due to RTN quantization. |
| | OGA ONNX | Any x86 CPU | Windows | Use models from the CPU Collection. |
| | OGA ONNX | AMD Ryzen AI PC | Windows | Use models from the GPU Collection. |
| oga-hybrid | Pre-quantized OGA ONNX | AMD Ryzen AI 300 series PC | Windows | Use models from the Hybrid Collection. Optimized with AWQ to INT4. |
| oga-npu | Pre-quantized OGA ONNX | AMD Ryzen AI 300 series PC | Windows | Use models from the NPU Collection. Optimized with AWQ to INT4. |

[1] Compatible GPUs are those that support PyTorch's .to("cuda") function. Ensure you have the appropriate version of PyTorch installed (e.g., CUDA or ROCm) for your specific GPU. Note: Lemonade does not install PyTorch with CUDA or ROCm for you. For installation instructions, see PyTorch's Get Started Guide.
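The table can also be read as a lookup: given an operating system and the hardware present, which recipes are candidates? The following is a minimal sketch of that lookup, not something lemonade provides. The hardware tags (`x86-cpu`, `discrete-gpu`, `ryzen-ai-300`) and the function names are hypothetical labels chosen for illustration, and only rows whose recipe name appears in the table are transcribed. The `torch_sees_cuda` helper relies on PyTorch's `torch.cuda.is_available()` as a rough check of the footnote's GPU requirement.

```python
from typing import NamedTuple


class RecipeRow(NamedTuple):
    recipe: str
    hardware: str             # hypothetical tag for the "Hardware Needed" column
    operating_systems: tuple  # lowercase OS names from the table


# Rows transcribed from the compatibility table (named recipes only).
TABLE = [
    RecipeRow("hf-cpu", "x86-cpu", ("windows", "linux")),
    RecipeRow("hf-dgpu", "discrete-gpu", ("windows", "linux")),
    RecipeRow("oga-cpu", "x86-cpu", ("windows",)),
    RecipeRow("oga-hybrid", "ryzen-ai-300", ("windows",)),
    RecipeRow("oga-npu", "ryzen-ai-300", ("windows",)),
]


def candidate_recipes(os_name: str, hardware: set) -> list:
    """Return recipes whose OS and hardware requirements are both met."""
    os_name = os_name.lower()
    return [
        row.recipe
        for row in TABLE
        if os_name in row.operating_systems and row.hardware in hardware
    ]


def torch_sees_cuda() -> bool:
    """True if PyTorch is installed and reports a usable CUDA/ROCm device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()
```

For example, `candidate_recipes("Linux", {"x86-cpu"})` yields only `hf-cpu`, which matches the table: every OGA recipe listed there is Windows-only.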

🔄 Converting Models to OGA

If you pass a safetensors checkpoint, Lemonade API converts it to OGA for you using OGA's model_builder.
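The compatibility table attributes an accuracy loss to RTN (round-to-nearest) quantization for checkpoints converted this way. To illustrate where that loss comes from, here is a generic sketch of symmetric RTN quantization to 4-bit integers. It is not model_builder's implementation: the single shared scale, the symmetric range, and the sample weights are all assumptions made for the example.

```python
def rtn_quantize_int4(weights):
    """Symmetric round-to-nearest quantization of one weight group to int4."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to 7, the largest positive int4 value.
    scale = max_abs / 7 if max_abs else 1.0
    # Python's round() resolves exact ties to the nearest even integer.
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate weights from the integer codes."""
    return [c * scale for c in codes]


weights = [0.70, -0.33, 0.12, 0.03]
codes, scale = rtn_quantize_int4(weights)
restored = dequantize(codes, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]

print(codes)     # integer codes in the range [-8, 7]
print(restored)  # approximate weights after the round trip
print(errors)    # per-weight rounding error, at most scale / 2
```

Small weights suffer most in relative terms: with these sample values, 0.03 rounds to code 0 and is lost entirely. RTN picks the scale from the weights alone, which is the limitation that calibration-based methods such as AWQ aim to reduce.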

📦 Pre-Converted OGA Models

You can skip the conversion step by using pre-quantized models from AMD's Hugging Face collections. These models are optimized using Activation-aware Weight Quantization (AWQ), which provides higher-accuracy INT4 quantization than round-to-nearest (RTN).

| Recipe | Collection |
|---|---|
| oga-hybrid | Hybrid Collection |
| oga-npu | NPU Collection |
| oga-cpu | CPU Collection |

📚 Additional Resources