Lemonade is an open-source local LLM solution that:

- Gets you started in minutes with one-click installers.
- Auto-configures optimized inference engines for your PC.
- Provides a convenient app to get set up and test out LLMs.
- Provides LLMs through the OpenAI API standard, enabling apps on your PC to access them.
Visit https://lemonade-server.ai/install_options.html and click the options that apply to you.
For more information on AMD Ryzen AI NPU Support, see the section Hybrid/NPU.
Yes, Linux is supported!
Visit the Supported Configurations section to see the support matrix for CPU, GPU, and NPU.
To uninstall Lemonade Server, use the Windows Add/Remove Programs menu.
Optional: Remove cached files in %USERPROFILE%\.cache:

- lemonade folder, if it exists
- huggingface folder

Lemonade uses three model locations:
Primary: Hugging Face Cache
Models downloaded through Lemonade are stored using the Hugging Face Hub specification. By default, models are located at ~/.cache/huggingface/hub/, where ~ is your home directory.
For example, Qwen/Qwen2.5-0.5B is stored at ~/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B.
You can change this location by setting the HF_HOME env var, which will store your models in $HF_HOME/hub (e.g., $HF_HOME/hub/models--Qwen--Qwen2.5-0.5B). Alternatively, you can set HF_HUB_CACHE and your models will be in $HF_HUB_CACHE (e.g., $HF_HUB_CACHE/models--Qwen--Qwen2.5-0.5B).
You can use the official Hugging Face Hub utility (pip install huggingface-hub) to manage models outside of Lemonade, e.g., hf cache ls will print all models and their sizes.
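For example, a minimal sketch of moving the cache to another location and verifying the result (the paths below are placeholders, not defaults):

```bash
# Placeholder path; models will land in /data/hf/hub
export HF_HOME=/data/hf
# Windows (PowerShell) equivalent: $env:HF_HOME = "D:\hf"
hf cache ls   # list the models in the active cache and their sizes
```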
Secondary: Extra Models Directory (GGUF)
Lemonade Server can discover GGUF models from a secondary directory using the --extra-models-dir option, enabling compatibility with llama.cpp and LM Studio model caches. Suggested paths:
- C:\Users\You\.lmstudio\models
- %LOCALAPPDATA%\llama.cpp (e.g., C:\Users\You\AppData\Local\llama.cpp)
- ~/.cache/llama.cpp

Example: lemonade-server serve --extra-models-dir "%LOCALAPPDATA%\llama.cpp"
Any .gguf files found in this directory (including subdirectories) will automatically appear in Lemonade’s model list in the custom category.
FastFlowLM
FastFlowLM (FLM) has its own model management system. When you first install FLM, the install wizard asks for a model directory, which is saved in the FLM_MODEL_PATH environment variable on your system. Models are stored in that directory. If you change the variable's value, newly downloaded models will be stored at the new path, but your prior models will remain at the old path.
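For example, a minimal sketch of pointing FLM at a new directory (the path is a placeholder; as noted above, previously downloaded models are not moved automatically):

```powershell
# Windows: "D:\flm-models" is an example path, not an FLM default
# setx persists the variable for new terminal sessions, not the current one
setx FLM_MODEL_PATH "D:\flm-models"
```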
Lemonade supports a wide range of LLMs including LLaMA, DeepSeek, Qwen, Gemma, Phi, gpt-oss, LFM, and many more. Most GGUF models can also be added to Lemonade Server using the Model Manager interface in the app or the pull command on the CLI.
👉 Supported Models List
👉 pull command
Model compatibility depends on your system’s RAM, VRAM, and NPU availability. The actual file size varies significantly between models due to different quantization techniques and architectures.
To check if a model will work, look for it in the Supported Models List (e.g., amd/Qwen2.5-7B-Chat-awq-g128-int4-asym-fp16-onnx-hybrid).

If a model isn't listed, it may not be compatible with your PC due to device or RAM limitations, or we just haven't added it to the server_models.json file yet.
You can request that the model be added to the server_models.json file.

If you are sure that a model should be listed but you aren't seeing it, you can set the LEMONADE_DISABLE_MODEL_FILTERING environment variable to show all models supported by Lemonade on any PC configuration. Note that this can show models that definitely won't work on your system.
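A minimal sketch of launching the server with filtering disabled (assumption: any non-empty value such as 1 activates the override; the docs only name the variable):

```bash
# Assumption: a value of 1 is sufficient; only the variable name is documented
export LEMONADE_DISABLE_MODEL_FILTERING=1
lemonade-server serve
```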
Yes, there’s a guide on preparing your models for Ryzen AI NPU:
Yes! Lemonade Server exposes a /stats endpoint that returns performance metrics from the most recent completion request:
curl http://localhost:8000/api/v1/stats
Alternatively, you can launch lemonade-server with the --log-level debug option, which will also print the stats.
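For example:

```bash
lemonade-server serve --log-level debug
```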
Lemonade supports llama.cpp as a backend, so performance is similar when using the same model and quantization.
File a detailed issue in the TheRock repository for support: https://github.com/ROCm/TheRock
Strix Halo PCs can have up to 128 GB of unified RAM and Windows allows the user to allocate a portion of this to dedicated GPU RAM.
We suggest setting dedicated GPU RAM to 64/64 (auto).
Note: On Windows, the GPU can access both unified RAM and dedicated GPU RAM, but the CPU cannot access dedicated GPU RAM. For this reason, allocating too much dedicated GPU RAM can interfere with model loading, which requires the CPU to access a substantial amount of unified RAM.
Yes. Today, NPU and hybrid inference are supported only on Windows.
To request NPU support on Linux, file an issue with either:

- Ryzen AI SW: https://github.com/amd/ryzenai-sw
- FastFlowLM: https://github.com/FastFlowLM/FastFlowLM
Yes. In hybrid mode, the NPU and iGPU are used together: the NPU handles prompt processing (prefill) while the iGPU handles token generation (decode).
Yes! Lemonade supports multiple execution modes across CPU, GPU, and NPU (see the Supported Configurations section for the full matrix).
While you won’t get NPU acceleration on non-Ryzen AI 300 systems, you can still benefit from GPU acceleration and the OpenAI-compatible API.
No inference engine providers have plans to support NPUs prior to the Ryzen AI 300 series, but you can still request this by filing an issue on their respective GitHubs:

- Ryzen AI SW: https://github.com/amd/ryzenai-sw
- FastFlowLM: https://github.com/FastFlowLM/FastFlowLM
AMD publishes pre-quantized and optimized models in their Hugging Face collections:
To find the architecture of a specific model, click on any model in these collections and look for the “Base model” field, which will show you the underlying architecture (e.g., Llama, Qwen, Phi).
Make sure that you’ve put the NPU in “Turbo” mode to get the best results. This is done by opening a terminal window and running the following commands:
cd C:\Windows\System32\AMD
.\xrt-smi configure --pmode turbo
Lemonade supports running the server on one machine while using the app from another machine on the same network.
Quick setup:

1. On the server machine: lemonade-server serve --host 0.0.0.0 --port 8000
2. On the client machine: lemonade-app --base-url http://SERVER_IP:8000
For detailed instructions and security considerations, see Remote Server Connection.
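As a quick connectivity check from the client machine, you can also hit the OpenAI-compatible endpoint with curl (a sketch: SERVER_IP and the model name are placeholders, and the /api/v1 base is assumed to match the /stats endpoint shown earlier):

```bash
# SERVER_IP and the model name are placeholders; use a model you have installed
curl http://SERVER_IP:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen2.5-0.5B-Instruct-CPU", "messages": [{"role": "user", "content": "Hello"}]}'
```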
Check the Lemonade Server logs via the App (all supported OSes) or tray icon (Windows only). Common issues include model compatibility or outdated versions.
Open a feature request on GitHub. We’re actively shaping the roadmap based on user feedback.
Yes! Check out the project README:
👉 Lemonade Roadmap