Lemonade API (`lemonade.api`) provides a simple, high-level interface for loading and running LLMs locally. This guide explains which models work with which recipes, what compatibility to expect, and how to choose the right setup for your hardware.
A recipe defines how a model is run, including the backend (e.g., PyTorch, ONNX Runtime), quantization strategy, and device support. The `from_pretrained()` function in `lemonade.api` uses the recipe to configure everything automatically. For the list of recipes, see Recipe Compatibility Table. The following is an example of using the Lemonade API `from_pretrained()` function:
from lemonade.api import from_pretrained
model, tokenizer = from_pretrained("Qwen/Qwen2.5-0.5B-Instruct", recipe="hf-cpu")
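Once loaded, the returned objects can be used to generate text. The following is a minimal sketch, assuming that `model` and `tokenizer` follow Hugging Face-style conventions (a callable tokenizer that returns `input_ids`, a `model.generate()` method, and `tokenizer.decode()`). This guide does not document those interfaces, so treat them as assumptions. `generate_text` and its `loader` parameter are hypothetical names introduced for illustration.

```python
def generate_text(prompt, checkpoint="Qwen/Qwen2.5-0.5B-Instruct",
                  recipe="hf-cpu", max_new_tokens=30, loader=None):
    """Load a model with the given recipe and return a generated completion.

    `loader` is a hypothetical hook: any callable shaped like
    from_pretrained(checkpoint, recipe=...) -> (model, tokenizer).
    """
    if loader is None:
        # Imported lazily so this helper can be defined without Lemonade installed.
        from lemonade.api import from_pretrained
        loader = from_pretrained

    model, tokenizer = loader(checkpoint, recipe=recipe)

    # Assumption: Hugging Face-style tokenizer and model interfaces.
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0])
```

Calling `generate_text("What is a recipe?")` would load the checkpoint with the `hf-cpu` recipe and return the decoded text. Passing a different `recipe` changes only the loading step.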
Function Arguments:
Lemonade API currently supports:
The following table explains what checkpoints work with each recipe, the hardware and OS requirements, and additional notes:
Recipe | Checkpoint Format | Hardware Needed | Operating System | Notes |
---|---|---|---|---|
hf-cpu | safetensors (Hugging Face) | Any x86 CPU | Windows, Linux | Compatible with x86 CPUs, offering broad accessibility. |
hf-dgpu | safetensors (Hugging Face) | Compatible Discrete GPU | Windows, Linux | Requires PyTorch and a compatible GPU.[1] |
oga-cpu | safetensors (Hugging Face) | Any x86 CPU | Windows | Converted from safetensors via `model_builder`. Accuracy loss due to RTN quantization. |
oga-cpu | OGA ONNX | Any x86 CPU | Windows | Use models from the CPU Collection. |
 | OGA ONNX | AMD Ryzen AI PC | Windows | Use models from the GPU Collection. |
oga-hybrid | Pre-quantized OGA ONNX | AMD Ryzen AI 300 series PC | Windows | Use models from the Hybrid Collection. Optimized with AWQ to INT4. |
oga-npu | Pre-quantized OGA ONNX | AMD Ryzen AI 300 series PC | Windows | Use models from the NPU Collection. Optimized with AWQ to INT4. |
[1] Compatible GPUs are those that support PyTorch's `.to("cuda")` function. Ensure you have the appropriate version of PyTorch installed (e.g., CUDA or ROCm) for your specific GPU. Note: Lemonade does not install PyTorch with CUDA or ROCm for you. For installation instructions, see PyTorch's Get Started Guide.
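The compatibility table above can also be treated as data when picking a recipe programmatically. The sketch below transcribes only the rows with a named recipe and filters them by operating system and checkpoint format. `RECIPES`, `recipes_for`, and `torch_gpu_available` are hypothetical helper names, not part of `lemonade.api`. Hardware requirements are not checked here, apart from a guarded PyTorch query that may help decide whether `hf-dgpu` is worth trying.

```python
# Rows transcribed from the compatibility table (named recipes only).
RECIPES = {
    "hf-cpu": {"formats": {"safetensors"}, "os": {"Windows", "Linux"}},
    "hf-dgpu": {"formats": {"safetensors"}, "os": {"Windows", "Linux"}},
    "oga-cpu": {"formats": {"safetensors", "OGA ONNX"}, "os": {"Windows"}},
    "oga-hybrid": {"formats": {"Pre-quantized OGA ONNX"}, "os": {"Windows"}},
    "oga-npu": {"formats": {"Pre-quantized OGA ONNX"}, "os": {"Windows"}},
}


def recipes_for(os_name, checkpoint_format=None):
    """Return recipe names whose table row matches the OS and, optionally, the format."""
    matches = []
    for name, row in RECIPES.items():
        if os_name not in row["os"]:
            continue
        if checkpoint_format is not None and checkpoint_format not in row["formats"]:
            continue
        matches.append(name)
    return matches


def torch_gpu_available():
    """Return True if PyTorch is installed and reports a usable GPU."""
    try:
        import torch  # Lemonade does not install a CUDA or ROCm build for you.
    except ImportError:
        return False
    return bool(torch.cuda.is_available())
```

For example, `recipes_for("Linux")` leaves only the two `hf-*` recipes, since every OGA row in the table lists Windows only.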
Lemonade API will do the conversion for you using OGA's `model_builder` if you pass a safetensors checkpoint.
You can skip the conversion step by using pre-quantized models from AMD's Hugging Face collection. These models are optimized using Activation-aware Weight Quantization (AWQ), which provides higher-accuracy INT4 quantization compared to RTN.
Recipe | Collection |
---|---|
oga-hybrid | Hybrid Collection |
oga-npu | NPU Collection |
oga-cpu | CPU Collection |
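For intuition about the accuracy loss attributed to RTN above: round-to-nearest quantization snaps each weight to the closest point on a uniform grid without looking at activations, whereas AWQ uses activation statistics when choosing how to scale weights. The sketch below is a generic, illustrative symmetric INT4 RTN in plain Python. It is not the code that `model_builder` runs, and the function names and the single per-tensor scale are assumptions made for illustration.

```python
def rtn_quantize_int4(weights):
    """Symmetric round-to-nearest quantization to the INT4 range [-8, 7]."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; guard against an all-zero input.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Python's round() resolves exact ties to the nearest even integer.
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


weights = [0.7, -0.35, 0.1, 0.02]
codes, scale = rtn_quantize_int4(weights)
restored = dequantize(codes, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]

for w, c, r, e in zip(weights, codes, restored, errors):
    print(f"weight={w:+.3f} code={c:+d} restored={r:+.3f} error={e:.3f}")
```

In this toy input, the smallest weight (0.02) is rounded to code 0 and disappears entirely, which is the kind of error that pre-quantized AWQ checkpoints are intended to reduce.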