Refreshingly simple
local chat.
The omni-modal alternative to cloud AI. Automatically optimized for your GPU and NPU. Open source, community driven, and private.
import asyncio

async def stream_gpu_metrics(ws):
    # Poll GPU stats (gpu is an application-provided poller)
    # and push them to the client twice per second.
    while True:
        stats = await gpu.poll()
        await ws.send_json(stats)
        await asyncio.sleep(0.5)
Quickstart
Built by the community. Optimized by AMD.
Lemonade is a local AI runtime with every capability you need to build great experiences.
Agile
Automatically deploys the latest models and engines. Further optimized for Ryzen AI, Radeon, and Strix Halo PCs.
Explore Models
Portable
Integrate once, deploy the <10 MB binary on any computer running Windows, Linux, or macOS.
Embed in Your App
Omni-Modal
Standard endpoints for chat, vision, image gen, editing, speech gen, and ASR.
Read Endpoints Spec
Free & Private
Open source. No strings attached. No telemetry. Customize and redistribute to your heart's content.
Visit the GitHub
Works with great apps.
Lemonade is integrated in many apps and works out-of-box with hundreds more thanks to the OpenAI API standard.
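Because Lemonade follows the OpenAI API standard, an app targets it simply by pointing requests at the local server instead of the cloud. A minimal sketch of the request an OpenAI-compatible client would send (the localhost address, `/api/v1` prefix, and model id here are assumptions; no network call is made, the payload is just built and inspected):

```python
# Sketch: build the OpenAI-style chat request an app would send
# to a local Lemonade server. Base URL and model id are assumed
# values -- check your Lemonade install for the real ones.
import json

BASE_URL = "http://localhost:8000/api/v1"  # assumed local endpoint

def chat_request(model: str, user_message: str) -> tuple[str, str]:
    """Return the (url, json_body) pair for an OpenAI-compatible chat call."""
    url = f"{BASE_URL}/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    })
    return url, body

url, body = chat_request("example-local-model", "Hello from my GPU!")
print(url)  # the same path a cloud OpenAI client would use
```

Any client library that lets you override its base URL can consume this endpoint unchanged, which is why apps built against the OpenAI standard work out of the box.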
Specs that enable AI workflows.
Everything from install to runtime is optimized for fast setup, broad compatibility, and local-first execution.
Native C++ backend
One Minute Install
OpenAI API compatible
Auto-detects your hardware
Multi-engine support
Run many models at once
Cross-platform support
Built-in control panel app
One local service for every modality.
Point your app at Lemonade and get chat, vision, image gen, transcription, speech gen, and more with standard APIs.
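The modalities above map onto the standard OpenAI endpoint paths, so one local base URL covers them all. A sketch of that mapping (the paths follow the OpenAI API spec; serving them all under a single `/api/v1` prefix on localhost is an assumption — consult the endpoints spec for the authoritative list):

```python
# Sketch: one local service, one endpoint path per modality.
# Paths are the OpenAI-standard ones; the base URL is assumed.
ENDPOINTS = {
    "chat & vision": "/chat/completions",      # vision via image content parts
    "image generation": "/images/generations",
    "speech generation": "/audio/speech",
    "transcription (ASR)": "/audio/transcriptions",
}

for modality, path in ENDPOINTS.items():
    print(f"{modality:20s} -> http://localhost:8000/api/v1{path}")
```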
Always improving.
Track the newest features and highlights from the Lemonade release stream.