🚀 New Models

Run OpenAI's gpt-oss locally with Lemonade

📅 August 12, 2025
✍️ Daniel Holanda, Jeremy Fowers, Krishna Sivakumar, Victoria Godsoe
📖 Blog Post

What is Lemonade?

Lemonade is a local LLM serving platform focused on maximizing performance with the best hardware acceleration available on your machine, from neural processing units (NPUs) to GPUs. With Lemonade, you can run large language models entirely on your PC while keeping full privacy and control over your data.

Whether you're looking for efficient local inference or want to experiment with OpenAI's latest open-weight models, Lemonade makes it incredibly easy to get started with gpt-oss right out of the box.

🧠 Multi-hardware support: CPU, GPU, and NPU acceleration

🔌 OpenAI-compatible API: drop-in replacement for OpenAI's API

🎯 Easy model management: built-in model library with one-command installation

🖥️ Cross-platform: Windows and Linux support

🔒 Privacy-first: everything runs locally on your machine

Available gpt-oss Models

Lemonade supports both gpt-oss model sizes, each optimized for different use cases:

gpt-oss-20b

Optimized for lower latency and local use cases

Total Parameters: 21B
Active Parameters: 3.6B

Perfect for everyday tasks and quick responses

gpt-oss-120b

Production-ready model for demanding, high-reasoning use cases

Total Parameters: 117B
Active Parameters: 5.1B

Ideal for complex reasoning and advanced applications

Advanced Features

Both models use sliding window attention and attention sink mechanisms, allowing them to handle very long conversations and contexts efficiently while maintaining response quality.
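
To make the idea concrete, here is a minimal sketch (not Lemonade or gpt-oss source code; the window size and sink count below are illustrative assumptions) of how a causal attention mask combining a sliding window with a few always-visible "sink" tokens can be built:

Python (illustrative only)
import numpy as np

def sliding_window_sink_mask(seq_len: int, window: int, num_sinks: int) -> np.ndarray:
    """Boolean mask where mask[i, j] is True if query token i may attend to key token j."""
    i = np.arange(seq_len)[:, None]   # query positions
    j = np.arange(seq_len)[None, :]   # key positions
    causal = j <= i                   # never attend to future tokens
    in_window = (i - j) < window      # only the most recent `window` tokens
    is_sink = j < num_sinks           # the first few tokens stay visible forever
    return causal & (in_window | is_sink)

print(sliding_window_sink_mask(seq_len=8, window=3, num_sinks=1).astype(int))

Keeping the sink tokens attendable is what lets the window slide over long contexts without the quality collapse that occurs when the earliest tokens are evicted entirely.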

Getting Started

Installation

Install Lemonade with pip (we recommend a fresh conda environment):

Terminal / Command Prompt
conda create -n lemon python=3.10
conda activate lemon
pip install lemonade-sdk

Or download our GUI installer for Windows.

Running gpt-oss Models

To run the smaller gpt-oss model:

Run gpt-oss-20b
lemonade-server run gpt-oss-20b-GGUF

For the larger model:

Run gpt-oss-120b
lemonade-server run gpt-oss-120b-GGUF

You can also install models ahead of time:

Pre-install Models
lemonade-server pull gpt-oss-20b-GGUF
lemonade-server pull gpt-oss-120b-GGUF
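
Once a model is running, any OpenAI-compatible client can talk to it. The sketch below uses the official openai Python package; the base URL assumes Lemonade's default local endpoint, so adjust the host and port if your server is configured differently:

Python (chat with the local server)
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/api/v1",  # assumed default Lemonade endpoint
    api_key="lemonade",                       # placeholder; no real key is needed locally
)

response = client.chat.completions.create(
    model="gpt-oss-20b-GGUF",
    messages=[{"role": "user", "content": "Explain attention sinks in one sentence."}],
)
print(response.choices[0].message.content)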

System Requirements

💾 gpt-oss-20b-GGUF

~13GB RAM recommended for optimal performance

🚀 gpt-oss-120b-GGUF

Significantly more memory required for optimal performance
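
As a rough sanity check (an estimate, not an official figure), the ~13GB recommendation is consistent with the 20b model's ~21B total parameters stored at roughly 4-bit precision:

Python (back-of-the-envelope weight size)
total_params = 21e9        # gpt-oss-20b total parameters
bits_per_weight = 4.25     # assumed ~4-bit quantization plus scale overhead
weight_gb = total_params * bits_per_weight / 8 / 1e9
print(f"~{weight_gb:.1f} GB of weights")  # ~11 GB, before KV cache and runtime overhead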

Ready to Get Started?

Join thousands of developers running OpenAI's gpt-oss models locally with Lemonade!

🍋 Experience the power of OpenAI's latest models running entirely on your local hardware!
