New Models

Run OpenAI's gpt-oss locally with Lemonade

We're excited to announce that Lemonade now supports OpenAI's gpt-oss models, bringing you the power to run these cutting-edge models locally on your own hardware! 🎉
Date: August 12, 2025
Authors: Daniel Holanda, Jeremy Fowers, Krishna Sivakumar, Victoria Godsoe
Overview

What is Lemonade?

Lemonade is a local AI runtime that makes it easy to run models like gpt-oss on your own hardware with privacy by default. It is optimized for fast setup, OpenAI API compatibility, and practical performance across common local acceleration stacks.

One Minute Install

Simple setup flow that gets the local stack running quickly.

OpenAI API

Works with many apps out of the box and integrates in minutes.

Hardware Auto-Setup

Configures dependencies for your GPU and NPU acceleration stack.

Multi-Engine Support

Works with llama.cpp, Ryzen AI SW, FastFlowLM, and more.

Multi-Model Runtime

Run more than one model at the same time on a single machine.

Cross-platform

A consistent experience across Windows, Linux, and macOS.

Model lineup

Choose your gpt-oss model

Use 20B for faster local responsiveness, or 120B for deeper reasoning quality.

gpt-oss-20b Optimized

Optimized for lower latency and local use cases.

Total Parameters: 21B
Active Parameters: 3.6B

Perfect for everyday tasks and quick responses.

gpt-oss-120b Production

Production-ready model for demanding reasoning tasks.

Total Parameters: 117B
Active Parameters: 5.1B

Ideal for complex reasoning and advanced applications.

Advanced features

Both models feature sliding window attention and attention sink mechanisms, allowing them to handle long conversations and contexts efficiently while maintaining response quality.
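As a rough illustration of how these two mechanisms shape attention, the sketch below builds a causal sliding-window mask with a few always-visible "sink" tokens. The window size and sink count here are illustrative only, not the models' actual configuration.

```python
def sliding_window_mask(seq_len: int, window: int, num_sinks: int) -> list[list[bool]]:
    """mask[q][k] is True when query position q may attend to key position k."""
    mask = [[False] * seq_len for _ in range(seq_len)]
    for q in range(seq_len):
        # Causal sliding window: only the last `window` positions up to q.
        for k in range(max(0, q - window + 1), q + 1):
            mask[q][k] = True
        # Attention sinks: the first few tokens stay visible to every query,
        # which stabilizes attention once early context slides out of the window.
        for k in range(min(num_sinks, q + 1)):
            mask[q][k] = True
    return mask

mask = sliding_window_mask(seq_len=8, window=4, num_sinks=1)
```

With a window of 4 and one sink, the last query position can still attend to token 0 (the sink) even though the rest of the early context has slid out of view, which keeps memory use bounded as conversations grow.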

Quickstart

Install and run gpt-oss

Set up Lemonade, download your model, and start chatting locally in minutes.

Install Lemonade

Use these quick download links to get started:

Operating System Downloads
Windows: lemonade.msi
Ubuntu: lemonade-server_latest_amd64.deb
macOS (beta): Lemonade-latest-Darwin.pkg

Other platforms? See Installation Options for Docker, Snap, Arch, Fedora, and Debian.

Run gpt-oss models

Pull and run the 20B model:

Run gpt-oss-20b
lemonade-server pull gpt-oss-20b-GGUF
lemonade-server run gpt-oss-20b-GGUF

For higher reasoning quality, pull and run 120B:

Run gpt-oss-120b
lemonade-server pull gpt-oss-120b-GGUF
lemonade-server run gpt-oss-120b-GGUF
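Once the server is running, any OpenAI-compatible client can talk to it. Below is a minimal sketch using only the Python standard library; the base URL assumes Lemonade's default OpenAI-compatible endpoint (`http://localhost:8000/api/v1`), so adjust it if your install serves on a different port or path.

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for a local server."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("http://localhost:8000/api/v1", "gpt-oss-20b-GGUF", "Hello!")
# With the server running, send it and print the reply:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Because the request shape is plain OpenAI chat completions, the official `openai` client works too: point its `base_url` at the local server and pass any placeholder API key.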

Tip: keep models pre-downloaded to avoid startup delays:

Pre-download both
lemonade-server pull gpt-oss-20b-GGUF
lemonade-server pull gpt-oss-120b-GGUF
System

System requirements

Recommended memory guidance for each gpt-oss model.

gpt-oss-20b-GGUF

About 13GB RAM is recommended for optimal performance.

gpt-oss-120b-GGUF

Requires significantly more memory for optimal performance.
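A back-of-the-envelope way to see where these numbers come from: gpt-oss weights ship in MXFP4, roughly 4.25 bits per parameter once block scales are counted. This is an approximation of the weights alone; real GGUF files add metadata and some higher-precision tensors, and the KV cache and runtime add overhead on top.

```python
def weight_gb(params_billion: float, bits_per_param: float = 4.25) -> float:
    """Approximate weight footprint in decimal gigabytes."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

gb_20b = weight_gb(21)    # roughly 11 GB of weights; runtime overhead brings it near 13GB
gb_120b = weight_gb(117)  # roughly 62 GB of weights before any overhead
```

This is why the 20B model fits comfortably on a typical 16GB+ machine while the 120B model calls for workstation-class memory.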

Ready to get started?

Install Lemonade and run gpt-oss locally with full privacy in just a few commands.