🍋 Lemonade Server is a server interface that uses the standard OpenAI API, allowing applications to integrate with local LLMs. This means that you can easily replace cloud-based LLMs with private, free LLMs that run locally on your own PC’s NPU and GPU.
Lemonade Server is available as a standalone tool with a one-click Windows GUI installer.
Once you’ve installed it, we recommend checking out these resources:
Documentation | Description |
---|---|
Supported Applications | Explore applications that work out-of-the-box with Lemonade Server. |
Lemonade Server Concepts | Background knowledge about local LLM servers and the OpenAI standard. |
lemonade-server CLI Guide | Learn how to manage the server process and install new models using the command-line interface. |
Models List | Browse a curated set of LLMs available for serving. |
Server Spec | Review all supported OpenAI-compatible and Lemonade-specific API endpoints. |
Integration Guide | Step-by-step instructions for integrating Lemonade Server into your own applications. |
Note: if you want to develop Lemonade Server itself, you can install from source.
Since Lemonade Server implements the standard OpenAI API specification, you can use any OpenAI-compatible client library by configuring it to use http://localhost:8000/api/v1 as the base URL. The table below lists official and popular OpenAI clients in different languages; pick whichever language you prefer.
Python | C++ | Java | C# | Node.js | Go | Ruby | Rust | PHP |
---|---|---|---|---|---|---|---|---|
openai-python | openai-cpp | openai-java | openai-dotnet | openai-node | go-openai | ruby-openai | async-openai | openai-php |
```python
from openai import OpenAI

# Initialize the client to use Lemonade Server
client = OpenAI(
    base_url="http://localhost:8000/api/v1",
    api_key="lemonade"  # required but unused
)

# Create a chat completion
completion = client.chat.completions.create(
    model="Llama-3.2-1B-Instruct-Hybrid",  # or any other available model
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

# Print the response
print(completion.choices[0].message.content)
```
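If you would rather not add a client library, the same call can be sketched with only the Python standard library. This is a minimal illustration, not the project's recommended integration path. It assumes the server exposes the standard OpenAI chat-completions route at `/chat/completions` under the base URL, and that the response follows the usual OpenAI shape. The helper names `build_chat_request` and `ask` are hypothetical, and the `Authorization` header simply mirrors the placeholder key from the client example.

```python
import json
import urllib.request

# Base URL taken from the text above. The "/chat/completions" suffix is the
# standard OpenAI route that OpenAI-compatible clients append to the base URL.
BASE_URL = "http://localhost:8000/api/v1"


def build_chat_request(prompt, model="Llama-3.2-1B-Instruct-Hybrid"):
    """Build (but do not send) a POST request for a chat completion.

    Hypothetical helper for illustration; not part of Lemonade Server.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Placeholder key, mirroring the client example (required by
            # client libraries but unused by the server).
            "Authorization": "Bearer lemonade",
        },
        method="POST",
    )


def ask(prompt):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response body and a running local server.
    """
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running and the model installed, `print(ask("What is the capital of France?"))` should print the model's reply. Keeping request construction separate from sending makes it easy to inspect the payload without a live server.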
For more detailed integration instructions, see the Integration Guide.