# 🍋 Lemonade API: Model Compatibility and Recipes
The Lemonade API (`lemonade.api`) provides a simple, high-level interface for loading and running LLMs locally. This guide explains which models work with which recipes, what to expect in terms of compatibility, and how to choose the right setup for your hardware.
## 🧠 What Is a Recipe?
A recipe defines how a model is run, including the backend (e.g., PyTorch, ONNX Runtime), quantization strategy, and device support. The `from_pretrained()` function in `lemonade.api` uses the recipe to configure everything automatically. For the list of recipes, see the Recipe and Checkpoint Compatibility table below. The following is an example of using the Lemonade API `from_pretrained()` function:

```python
from lemonade.api import from_pretrained

model, tokenizer = from_pretrained("Qwen/Qwen2.5-0.5B-Instruct", recipe="hf-cpu")
```

Function arguments:

- `checkpoint`: The Hugging Face or OGA checkpoint that defines the LLM.
- `recipe`: Defines the implementation and hardware used for the LLM. Default is `"hf-cpu"`.
## 📜 Supported Model Formats
Lemonade API currently supports:
- Hugging Face hosted safetensors checkpoints
- AMD OGA (ONNXRuntime-GenAI) ONNX checkpoints
## 🍴 Recipe and Checkpoint Compatibility
The following table explains what checkpoints work with each recipe, the hardware and OS requirements, and additional notes:
| Recipe | Checkpoint Format | Hardware Needed | Operating System | Notes |
|---|---|---|---|---|
| hf-cpu | safetensors (Hugging Face) | Any x86 CPU | Windows, Linux | Compatible with x86 CPUs, offering broad accessibility. |
| hf-dgpu | safetensors (Hugging Face) | Compatible discrete GPU | Windows, Linux | Requires PyTorch and a compatible GPU.[1] |
| oga-cpu | safetensors (Hugging Face) | Any x86 CPU | Windows | Converted from safetensors via `model_builder`. Accuracy loss due to RTN quantization. |
| oga-cpu | OGA ONNX | Any x86 CPU | Windows | Use models from the CPU Collection. |
| oga-igpu | OGA ONNX | AMD Ryzen AI PC | Windows | Use models from the GPU Collection. |
| oga-hybrid | Pre-quantized OGA ONNX | AMD Ryzen AI 300 series PC | Windows | Use models from the Hybrid Collection. Optimized with AWQ to INT4. |
| oga-npu | Pre-quantized OGA ONNX | AMD Ryzen AI 300 series PC | Windows | Use models from the NPU Collection. Optimized with AWQ to INT4. |
[1] Compatible GPUs are those that support PyTorch's `.to("cuda")` function. Ensure you have the appropriate version of PyTorch installed (e.g., CUDA or ROCm) for your specific GPU. Note: Lemonade does not install PyTorch with CUDA or ROCm for you. For installation instructions, see PyTorch's Get Started Guide.
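Since `hf-dgpu` only works when PyTorch can reach a GPU, one practical pattern is to pick the recipe at runtime. The sketch below is an illustration only; `pick_hf_recipe` is a hypothetical helper, not part of `lemonade.api`:

```python
def pick_hf_recipe(cuda_available: bool) -> str:
    # hf-dgpu requires a GPU that PyTorch can target with .to("cuda");
    # otherwise fall back to the universally compatible hf-cpu recipe.
    return "hf-dgpu" if cuda_available else "hf-cpu"

# In practice, query PyTorch directly (assumes torch is installed):
#   import torch
#   recipe = pick_hf_recipe(torch.cuda.is_available())
```

The selected string can then be passed as the `recipe` argument to `from_pretrained()`.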
## 🔄 Converting Models to OGA
The Lemonade API will do the conversion for you using OGA's `model_builder` if you pass a safetensors checkpoint.
- Takes ~1–5 minutes per model.
- Uses RTN quantization (int4).
- For better quality, use pre-quantized models (see below).
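The conversion rule above can be sketched as follows (`needs_oga_conversion` is a hypothetical helper for illustration, not part of `lemonade.api`):

```python
def needs_oga_conversion(checkpoint_format: str, recipe: str) -> bool:
    # oga-* recipes execute ONNX models, so a safetensors checkpoint must
    # first be converted by OGA's model_builder (with RTN int4 quantization);
    # checkpoints that are already OGA ONNX load directly.
    return recipe.startswith("oga-") and checkpoint_format == "safetensors"
```

For example, `needs_oga_conversion("safetensors", "oga-cpu")` is `True`, while a pre-quantized OGA ONNX checkpoint loads without any conversion step.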
## 📦 Pre-Converted OGA Models
You can skip the conversion step by using pre-quantized models from AMD’s Hugging Face collections. These models are optimized using Activation-aware Weight Quantization (AWQ), which provides higher-accuracy int4 quantization than RTN.
| Recipe | Collection |
|---|---|
| oga-hybrid | Hybrid Collection |
| oga-npu | NPU Collection |
| oga-cpu | CPU Collection |