Running Lemonade in Docker
Quick Start with Docker
Depending on your environment, you may need additional configuration; the variants below cover the most common cases.
Docker Run with Default Configuration
docker run -d \
--name lemonade-server \
-p 13305:13305 \
-v lemonade-cache:/root/.cache/huggingface \
-v lemonade-llama:/opt/lemonade/llama \
-v lemonade-recipe:/root/.cache/lemonade \
ghcr.io/lemonade-sdk/lemonade-server:latest
Docker Run with a Specific Port and Version
docker run -d \
--name lemonade-server \
-p 4000:5000 \
-v lemonade-cache:/root/.cache/huggingface \
-v lemonade-llama:/opt/lemonade/llama \
-v lemonade-recipe:/root/.cache/lemonade \
-e LEMONADE_LLAMACPP=cpu \
ghcr.io/lemonade-sdk/lemonade-server:v9.1.3 \
./lemond --host 0.0.0.0 --port 5000
This will run the server on port 5000 inside the container, mapped to port 4000 on your host.
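With this mapping in place, requests from the host go to port 4000. As a quick sketch (the models endpoint matches the one used later in this guide):

```shell
# The container listens on 5000; the host reaches it on 4000.
curl http://localhost:4000/api/v1/models
```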
Docker Run with CPU backend
docker run -d \
--name lemonade-server \
-p 13305:13305 \
-v lemonade-cache:/root/.cache/huggingface \
-v lemonade-llama:/opt/lemonade/llama \
-v lemonade-recipe:/root/.cache/lemonade \
-e LEMONADE_LLAMACPP=cpu \
ghcr.io/lemonade-sdk/lemonade-server:latest
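To confirm the backend selection took effect, one option is to inspect the environment inside the running container:

```shell
# Should print: LEMONADE_LLAMACPP=cpu
docker exec lemonade-server env | grep LEMONADE_LLAMACPP
```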
Docker Run with AMD GPU Passthrough using ROCm
docker run -d \
--name lemonade-server \
-p 13305:13305 \
-v lemonade-cache:/root/.cache/huggingface \
-v lemonade-llama:/opt/lemonade/llama \
-v lemonade-recipe:/root/.cache/lemonade \
-e LEMONADE_LLAMACPP=rocm \
--device=/dev/kfd \
--device=/dev/dri \
ghcr.io/lemonade-sdk/lemonade-server:latest
This will run the server using the ROCm backend as the default for llama.cpp.
Other Docker Methods
Docker Compose Setup
Docker Compose makes it easier to manage multi-container applications.
1. Make sure you have Docker Compose installed.
2. Create a docker-compose.yml file like this:
services:
lemonade:
image: ghcr.io/lemonade-sdk/lemonade-server:latest
container_name: lemonade-server
ports:
- "13305:13305"
volumes:
# Persist downloaded models
- lemonade-cache:/root/.cache/huggingface
# Persist llama binaries
- lemonade-llama:/opt/lemonade/llama
# Persist model options and other backend binaries
- lemonade-recipe:/root/.cache/lemonade
environment:
- LEMONADE_LLAMACPP=cpu
restart: unless-stopped
volumes:
lemonade-cache:
lemonade-llama:
lemonade-recipe:
You can add more services as needed, or add the host devices /dev/kfd and /dev/dri for the ROCm backend.
3. Run the following command in the directory containing your docker-compose.yml:
docker-compose up -d
This will pull the latest image (or the version you specified) from the Lemonade container registry and start the server with your mapped ports.
Once the container is running, verify it’s working:
curl http://localhost:13305/api/v1/models
You should receive a response listing available models.
Build Your Own Docker Image
The documentation below covers container-based workflows and how to build your own images if needed.
Container-based workflows
This repository supports two container-related workflows with different goals:
Development (Dev Containers)
The .devcontainer (dev container) configuration is intended for contributors and developers.
It provides a full development environment (tooling, debuggers, source mounted)
and is primarily used with VS Code Dev Containers or GitHub Codespaces.
Running Lemonade in a container
The Dockerfile and docker-compose.yml provided here are intended for running
Lemonade as an application in a containerized environment. This uses a
multi-stage build to produce a minimal runtime image, similar in spirit to the
MSI-based distribution, but containerized.
These workflows are complementary and serve different use cases.
Lemonade C++ Docker Setup
This guide explains how to build and run Lemonade C++ in a Docker container using Docker Compose. The setup includes persistent caching for HuggingFace models.
If you want to pull or use a specific Lemonade Docker image instead of building your own, check out the instructions in
README.md
Prerequisites
- Docker >= 24.x
- Docker Compose >= 2.x
- At least 8 GB RAM and 4 CPU cores recommended for small models
- Internet access to download model files from HuggingFace
1. Dockerfile
The Dockerfile below uses a multi-stage build to compile Lemonade C++ components and produce a clean, lightweight runtime image.
Place the Dockerfile in the parent directory of the repository root when building.
Build context note
This guide assumes the Dockerfile and docker-compose.yml live outside the Lemonade repository directory, as shown below. If you place them inside the repository, update the Dockerfile to use COPY . /app instead.
.
├── docker-compose.yml
├── Dockerfile
└── lemonade/
    ├── src
    ├── docs
    ├── .devcontainer
    └── ...
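One way to assemble that layout is to clone the repository into a subdirectory next to your Dockerfile. The commands below are a sketch; the clone URL is an assumption, so adjust it to wherever you obtained the source:

```shell
# Create a working directory matching the layout shown above
mkdir lemonade-docker && cd lemonade-docker
# Clone the repository into a subdirectory named "lemonade"
# (URL is an assumption; substitute your actual source location)
git clone https://github.com/lemonade-sdk/lemonade.git lemonade
# Place the Dockerfile and docker-compose.yml from this guide here,
# next to the lemonade/ directory, before building.
```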
This configuration has been tested with the Vulkan, ROCm, and CPU backends; you can modify or extend it to suit your specific deployment needs.
# ============================================================
# 1. Build stage — compile Lemonade C++ binaries
# ============================================================
FROM ubuntu:24.04 AS builder
# Avoid interactive prompts during build
ENV DEBIAN_FRONTEND=noninteractive
# Install build dependencies
RUN apt-get update && apt-get install -y \
build-essential \
cmake \
libssl-dev \
pkg-config \
git \
&& rm -rf /var/lib/apt/lists/*
# Copy source code
COPY lemonade /app
WORKDIR /app/
# Build the project
RUN rm -rf build && \
mkdir -p build && \
cd build && \
cmake .. && \
cmake --build . --config Release -j"$(nproc)"
# Debug: Check build outputs
RUN echo "=== Build directory contents ===" && \
ls -la build/ && \
echo "=== Checking for resources ===" && \
find build/ -name "*.json" -o -name "resources" -type d
# ============================================================
# 2. Runtime stage — small, clean image
# ============================================================
FROM ubuntu:24.04
# Install runtime dependencies only
RUN apt-get update && apt-get install -y \
libcurl4 \
curl \
libssl3 \
zlib1g \
vulkan-tools \
libvulkan1 \
unzip \
libgomp1 \
libatomic1 \
&& rm -rf /var/lib/apt/lists/*
# Create application directory
WORKDIR /opt/lemonade
# Copy built executables and resources from builder
COPY --from=builder /app/build/lemond ./lemond
COPY --from=builder /app/build/lemonade-server ./lemonade-server
COPY --from=builder /app/build/resources ./resources
# Ensure the binaries are executable
RUN chmod +x ./lemond ./lemonade-server
# Create necessary directories
RUN mkdir -p /opt/lemonade/llama/cpu \
/opt/lemonade/llama/vulkan \
/root/.cache/huggingface
# Expose default port
EXPOSE 13305
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD curl -f http://localhost:13305/live || exit 1
# Default command: start server in headless mode
CMD ["./lemond", "--host", "0.0.0.0"]
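If you prefer not to use Docker Compose, the same image can be built and run with plain docker commands from the directory containing the Dockerfile. The tag name lemonade-server:local below is arbitrary:

```shell
# Build the multi-stage image; the final stage becomes the runtime image
docker build -t lemonade-server:local .
# Run it with the same port and model cache volume used elsewhere in this guide
docker run -d --name lemonade-server \
  -p 13305:13305 \
  -v lemonade-cache:/root/.cache/huggingface \
  lemonade-server:local
```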
2. Build the Docker Image
Create the following docker-compose.yml file in the parent directory of the repository root (alongside the Dockerfile):
services:
lemonade:
build:
context: .
dockerfile: Dockerfile
container_name: lemonade-server
ports:
- "13305:13305"
volumes:
# Persist downloaded models
- lemonade-cache:/root/.cache/huggingface
# Persist llama binaries
- lemonade-llama:/opt/lemonade/llama
# Persist model options and other backend binaries
- lemonade-recipe:/root/.cache/lemonade
environment:
- LEMONADE_LLAMACPP=cpu
restart: unless-stopped
volumes:
lemonade-cache:
lemonade-llama:
lemonade-recipe:
Now run the following command in the same directory:
docker-compose build
This will:
- Compile Lemonade C++ (lemonade-server and lemond)
- Prepare a runtime image with all dependencies
3. Run the Container
Start the container with Docker Compose:
docker-compose up -d
- The API will be exposed on port 13305
- HuggingFace models will be cached in the lemonade-cache volume
- llama binaries are persisted in the lemonade-llama volume
Check that the server is running:
docker logs -f lemonade-server
You should see:
lemonade-server | Lemonade Server vx.x.x started on port 13305
lemonade-server | Chat and manage models: http://localhost:13305
4. Access the API
Test the API:
curl http://localhost:13305/api/v1/models
You should get a response with available models.
5. Load a Model
You can use the web UI at http://localhost:13305 or the command below to load a model (e.g., Qwen3 0.6B):
curl -X POST http://localhost:13305/api/v1/load \
-H "Content-Type: application/json" \
-d '{"model_name": "Qwen3-0.6B-GGUF"}'
The server will:
- Auto-download the GGUF model from HuggingFace
- Install the backend
- Make the model ready for inference
6. Make a Chat Request
Once the model is loaded:
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:13305/api/v1",
api_key="lemonade" # required but unused
)
completion = client.chat.completions.create(
model="Qwen3-0.6B-GGUF",
messages=[{"role": "user", "content": "Hello, Lemonade!"}]
)
print(completion.choices[0].message.content)
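If you'd rather not use the Python client, the same request can be made with curl. This assumes the server follows the OpenAI convention of a chat/completions route under the base URL used above:

```shell
# Chat completion request against the OpenAI-compatible endpoint
# (route is an assumption based on the base_url used by the Python client)
curl -X POST http://localhost:13305/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3-0.6B-GGUF",
    "messages": [{"role": "user", "content": "Hello, Lemonade!"}]
  }'
```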
7. Stopping the Server
docker-compose down
- Keeps cached models and binaries in Docker volumes
- You can restart anytime with docker-compose up -d
8. Troubleshooting
- Server not starting: Check logs with:
docker logs lemonade-server
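Because the image defines a HEALTHCHECK that polls the /live endpoint, you can also query the container's health status directly:

```shell
# Reports "healthy", "unhealthy", or "starting"
docker inspect --format '{{.State.Health.Status}}' lemonade-server
# Or hit the liveness endpoint yourself; -f makes curl fail on a non-2xx response
curl -f http://localhost:13305/live
```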
If you want to view the logs in the web UI, you also need to expose the WebSocket port:
docker run -d \
--name lemonade-server \
-p 13305:13305 \
-p 9000:9000 \
-v lemonade-cache:/root/.cache/huggingface \
-v lemonade-llama:/opt/lemonade/llama \
ghcr.io/lemonade-sdk/lemonade-server:latest
- Model download fails: Ensure /root/.cache/huggingface volume is writable
- Vulkan errors on a CPU-only machine: The server will fall back to the CPU backend automatically