# Getting Started with Lemonade Server
🍋 Lemonade Server is a server interface that implements the standard OpenAI API, allowing applications to integrate with local LLMs. This means you can easily replace cloud-based LLMs with private, free LLMs that run locally on your own PC's NPU and GPU.
Lemonade Server is available as a standalone tool with a one-click Windows GUI installer.
Once you've installed Lemonade Server, we recommend checking out these resources:
| Documentation | Description |
|---|---|
| Supported Applications | Explore applications that work out-of-the-box with Lemonade Server. |
| Lemonade Server Concepts | Background knowledge about local LLM servers and the OpenAI standard. |
| `lemonade-server` CLI Guide | Learn how to manage the server process and install new models using the command-line interface. |
| Models List | Browse a curated set of LLMs available for serving. |
| Server Spec | Review all supported OpenAI-compatible and Lemonade-specific API endpoints. |
| Integration Guide | Step-by-step instructions for integrating Lemonade Server into your own applications. |
Note: If you want to develop Lemonade Server itself, you can install it from source.
## Integrate Lemonade Server with Your Application
Since Lemonade Server implements the standard OpenAI API specification, you can use any OpenAI-compatible client library by configuring it to use `http://localhost:8000/api/v1` as the base URL. The table below lists official and popular OpenAI clients in different languages; feel free to pick whichever fits your stack.
| Python | C++ | Java | C# | Node.js | Go | Ruby | Rust | PHP |
|---|---|---|---|---|---|---|---|---|
| openai-python | openai-cpp | openai-java | openai-dotnet | openai-node | go-openai | ruby-openai | async-openai | openai-php |
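Before making chat requests, you can confirm the server is reachable by listing the models it currently exposes. Here is a minimal sketch using the Python client; the model IDs returned depend on which models you have installed:

```python
from openai import OpenAI

# Point the client at the local Lemonade Server endpoint
client = OpenAI(
    base_url="http://localhost:8000/api/v1",
    api_key="lemonade",  # required by the client library but unused by the server
)

# Print the ID of each model the server currently serves
for model in client.models.list():
    print(model.id)
```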
### Python Client Example
```python
from openai import OpenAI

# Initialize the client to use Lemonade Server
client = OpenAI(
    base_url="http://localhost:8000/api/v1",
    api_key="lemonade"  # required but unused
)

# Create a chat completion
completion = client.chat.completions.create(
    model="Llama-3.2-1B-Instruct-Hybrid",  # or any other available model
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

# Print the response
print(completion.choices[0].message.content)
```
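If your application renders responses incrementally, the same client can stream tokens as they are generated. The sketch below assumes Lemonade Server supports the OpenAI API's standard `stream=True` option (see the Server Spec for the endpoints it implements):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/api/v1",
    api_key="lemonade",  # required but unused
)

# Request a streamed completion instead of waiting for the full reply
stream = client.chat.completions.create(
    model="Llama-3.2-1B-Instruct-Hybrid",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries a delta; content may be None on role/stop chunks
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```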
For more detailed integration instructions, see the Integration Guide.