What is Lemonade?
Lemonade is a local LLM serving platform that focuses on maximizing performance by using the best hardware acceleration available on your machine, from neural processing units (NPUs) to GPUs. With Lemonade, you can run large language models entirely on your PC while maintaining full privacy and control over your data.
Whether you're looking for efficient local inference or want to experiment with OpenAI's latest open-weight models, Lemonade makes it easy to get started with gpt-oss right out of the box.
CPU, GPU, and NPU acceleration
Drop-in replacement for OpenAI's API (see the example after this list)
Built-in model library with one-command installation
Windows and Linux support
Everything runs locally on your machine
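Because the server speaks the OpenAI API, existing OpenAI client code only needs its base URL changed to talk to Lemonade. Below is a minimal sketch using the official openai Python package; the base_url is an assumption about a typical local setup, so adjust the host, port, and path to whatever your Lemonade server reports when it starts.

from openai import OpenAI

# Point the standard OpenAI client at the local Lemonade server.
# The base_url is an assumed default; change it if your server listens elsewhere.
# A local server needs no real API key, so any placeholder string works.
client = OpenAI(
    base_url="http://localhost:8000/api/v1",
    api_key="not-needed-locally",
)

response = client.chat.completions.create(
    model="gpt-oss-20b-GGUF",  # a model you have run or pulled with lemonade-server
    messages=[{"role": "user", "content": "Summarize attention sinks in one sentence."}],
)
print(response.choices[0].message.content)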
Available gpt-oss Models
Lemonade supports both gpt-oss model sizes, each optimized for different use cases:
gpt-oss-20b: Optimized for lower latency and local use cases. Perfect for everyday tasks and quick responses.
gpt-oss-120b: Production-ready model for high-reasoning tasks. Ideal for complex reasoning and advanced applications.
✨ Advanced Features
Both models use sliding window attention (alternating with dense attention layers) and attention sink mechanisms, allowing them to handle very long conversations and contexts efficiently while maintaining response quality.
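To build some intuition for that pattern, here is a purely illustrative Python sketch of a causal attention mask that combines a sliding window with a few always-visible sink tokens. It is a toy mask for intuition only, not the models' actual implementation, and the function name and parameters are invented for this sketch.

import numpy as np

def sliding_window_sink_mask(seq_len, window, num_sinks):
    # mask[i, j] is True when token i is allowed to attend to token j.
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        # Causal sliding window: only the most recent `window` tokens are visible.
        mask[i, max(0, i - window + 1) : i + 1] = True
        # Attention sinks: the first few tokens stay visible, without breaking causality.
        mask[i, : min(num_sinks, i + 1)] = True
    return mask

# 8 tokens, a window of 3, and 1 sink token at position 0.
print(sliding_window_sink_mask(8, 3, 1).astype(int))

Each row of this mask keeps at most window + sinks positions, which is what keeps memory and compute bounded as a conversation grows.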
Getting Started
Installation
Install Lemonade with pip (creating a fresh conda environment first is recommended):
conda create -n lemon python=3.10
conda activate lemon
pip install lemonade-sdk
Or download our GUI installer for Windows.
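If you went the pip route, you can confirm the package is visible to your Python environment by querying its installed version with the standard library (this checks only the installation; it does not start the server):

from importlib.metadata import version

print(version("lemonade-sdk"))  # prints the installed lemonade-sdk version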
Running gpt-oss Models
To run the smaller gpt-oss model:
lemonade-server run gpt-oss-20b-GGUF
For the larger model:
lemonade-server run gpt-oss-120b-GGUF
You can also install models ahead of time:
lemonade-server pull gpt-oss-20b-GGUF
lemonade-server pull gpt-oss-120b-GGUF
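Once the server is running, you can confirm which models it exposes through its OpenAI-compatible models endpoint. The snippet below reuses the same assumed base_url as the earlier example; adjust it to match your server.

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/api/v1",  # assumed default; adjust to your server
    api_key="not-needed-locally",
)

# List every model the local server currently advertises.
for model in client.models.list():
    print(model.id)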
System Requirements
💾 gpt-oss-20b-GGUF
~13GB RAM recommended for optimal performance
🚀 gpt-oss-120b-GGUF
Roughly 60GB or more of memory required for optimal performance; best suited for high-end workstations with large system RAM or GPU/unified memory