What is Lemonade?
Lemonade is a local LLM serving platform that focuses on maximizing performance by using the best hardware acceleration available on your machine, from neural processing units (NPUs) to GPUs. With Lemonade, you can run large language models entirely on your PC while maintaining full privacy and control over your data.
Whether you're looking for efficient local inference or want to experiment with OpenAI's latest open-weight models, Lemonade makes it easy to get started with gpt-oss right out of the box.
CPU, GPU, and NPU acceleration
Drop-in replacement for OpenAI's API (see the example after this list)
Built-in model library with one-command installation
Windows and Linux support
Everything runs locally on your machine
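Because the server speaks the OpenAI API, existing OpenAI client code only needs its base URL changed to talk to Lemonade. Below is a minimal sketch using the official openai Python package; the base_url is an assumption about a typical local setup, so adjust the host, port, and path to whatever your Lemonade server reports when it starts.

from openai import OpenAI

# Point the standard OpenAI client at the local Lemonade server.
# The base_url is an assumed default; change it if your server listens elsewhere.
# A local server needs no real API key, so any placeholder string works.
client = OpenAI(
    base_url="http://localhost:8000/api/v1",
    api_key="not-needed-locally",
)

response = client.chat.completions.create(
    model="gpt-oss-20b-GGUF",  # a model you have run or pulled with lemonade-server
    messages=[{"role": "user", "content": "Summarize attention sinks in one sentence."}],
)
print(response.choices[0].message.content)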
Available gpt-oss Models
Lemonade supports both gpt-oss model sizes, each optimized for different use cases:
gpt-oss-20b: Optimized for lower latency and local use cases. Perfect for everyday tasks and quick responses.
gpt-oss-120b: Production-ready model for high-reasoning tasks. Ideal for complex reasoning and advanced applications.
✨ Advanced Features
Both models use sliding window attention (alternating with dense attention layers) and attention sink mechanisms, allowing them to handle very long conversations and contexts efficiently while maintaining response quality.
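To build some intuition for that pattern, here is a purely illustrative Python sketch of a causal attention mask that combines a sliding window with a few always-visible sink tokens. It is a toy mask for intuition only, not the models' actual implementation, and the function name and parameters are invented for this sketch.

import numpy as np

def sliding_window_sink_mask(seq_len, window, num_sinks):
    # mask[i, j] is True when token i is allowed to attend to token j.
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        # Causal sliding window: only the most recent `window` tokens are visible.
        mask[i, max(0, i - window + 1) : i + 1] = True
        # Attention sinks: the first few tokens stay visible, without breaking causality.
        mask[i, : min(num_sinks, i + 1)] = True
    return mask

# 8 tokens, a window of 3, and 1 sink token at position 0.
print(sliding_window_sink_mask(8, 3, 1).astype(int))

Each row of this mask keeps at most window + sinks positions, which is what keeps memory and compute bounded as a conversation grows.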
Getting Started
Installation
Install Lemonade with pip (creating a fresh conda environment first is recommended):
conda create -n lemon python=3.10
conda activate lemon
pip install lemonade-sdk
Or download our GUI installer for Windows.
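If you went the pip route, you can confirm the package is visible to your Python environment by querying its installed version with the standard library (this checks only the installation; it does not start the server):

from importlib.metadata import version

print(version("lemonade-sdk"))  # prints the installed lemonade-sdk version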
Running gpt-oss Models
To run the smaller gpt-oss model:
lemonade-server run gpt-oss-20b-GGUF
For the larger model:
lemonade-server run gpt-oss-120b-GGUF
You can also install models ahead of time:
lemonade-server pull gpt-oss-20b-GGUF
lemonade-server pull gpt-oss-120b-GGUF
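Once the server is running, you can confirm which models it exposes through its OpenAI-compatible models endpoint. The snippet below reuses the same assumed base_url as the earlier example; adjust it to match your server.

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/api/v1",  # assumed default; adjust to your server
    api_key="not-needed-locally",
)

# List every model the local server currently advertises.
for model in client.models.list():
    print(model.id)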
System Requirements
💾 gpt-oss-20b-GGUF
~13GB RAM recommended for optimal performance
🚀 gpt-oss-120b-GGUF
Roughly 60GB or more of memory required for optimal performance; best suited for high-end workstations with large system RAM or GPU/unified memory