A new Linux GPU backend
Lemonade has added vLLM as a backend for AMD ROCm GPUs on Linux. The release is experimental but usable today, with known rough edges tracked publicly in the vLLM milestone.
vLLM brings two capabilities that complement Lemonade's existing backend lineup:
- Improved day-0 model support. New transformer architectures often become usable directly from Hugging Face checkpoints without a separate porting cycle.
- Concurrency and multi-GPU scaling. vLLM's paged-attention KV cache, continuous batching, chunked prefill, tensor parallelism, and pipeline parallelism help scale throughput across busy local serving workloads (see the sketch below).
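To see what continuous batching buys you in practice, you can send several requests to the server at once and let vLLM schedule them together. The sketch below is illustrative, not part of the release: it assumes a Lemonade server already running locally (install steps follow below), listening on the default port 8000 with the OpenAI-compatible API rooted at /api/v1, and serving the starter model used later in this post. Adjust the endpoint and model name to match your setup.

```bash
# Fire four chat requests concurrently; vLLM's continuous batching lets
# the server interleave them instead of handling one request at a time.
# Endpoint, port, and model name are assumptions; match your install.
for i in 1 2 3 4; do
  curl -s http://localhost:8000/api/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"Qwen3.5-0.8B-vLLM\",
         \"messages\": [{\"role\": \"user\",
                         \"content\": \"Write haiku number $i about GPUs.\"}]}" &
done
wait  # block until all four responses have arrived
```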
What's included
The Lemonade vLLM backend is distributed as a self-contained ROCm bundle from lemonade-sdk/vllm-rocm.
- **Self-contained bundle.** Includes a relocatable Python interpreter, PyTorch ROCm, ROCm user-space libraries, Triton, and vLLM.
- **No host Python stack.** No system Python, PyTorch, or ROCm install is required on the host; the bundle ships everything it needs.
- **Per-GPU target builds.** Lemonade selects the release matching the detected ROCm architecture at install time.
Install and run vLLM ROCm
Install Lemonade Server, add the vLLM ROCm backend, and run the starter Qwen model.
The commands below are for Ubuntu; Lemonade also supports additional Linux install paths, including Snap, Debian, Fedora, Arch, and Docker. See the install guide for the full set of options.
```bash
# Install Lemonade Server on Ubuntu
sudo add-apt-repository ppa:lemonade-team/stable
sudo apt install lemonade-server

# Install and run the vLLM ROCm backend
lemonade backends install vllm:rocm
lemonade run Qwen3.5-0.8B-vLLM
```
The vLLM backend download is large and can take a while, and the first model download also takes a few minutes. Before installing the backend, first-time users should confirm the Linux kernel prerequisites.
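Once the model is running, any OpenAI-compatible client can talk to it. Here is a minimal smoke test with curl; the port (8000) and API base path (/api/v1) are assumed defaults, so adjust them if your install differs.

```bash
# Minimal chat-completion request against the local Lemonade server.
# Port and base path are assumed defaults; adjust to your install.
curl http://localhost:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen3.5-0.8B-vLLM",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'
```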
Validated AMD GPU targets
vLLM ROCm support currently focuses on Linux systems with AMD ROCm-capable GPUs.
| GPU target | Status |
|---|---|
| gfx1151 (Strix Halo) | Validated end-to-end |
| gfx1150 (Strix Point) | Validated end-to-end |
| gfx110X (RDNA3) | Prebuilt wheels available; end-to-end validation pending |
| gfx120X (RDNA4) | Prebuilt wheels available; end-to-end validation pending |
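Not sure which gfx target your GPU reports? If the host happens to have ROCm's rocminfo tool installed (the self-contained bundle does not require it), you can check directly; this is just a convenience check, not an install prerequisite.

```bash
# Print the gfx target reported by the first ROCm-visible GPU.
# Requires rocminfo on the host, which the Lemonade bundle itself
# does not need; lspci can identify the card if rocminfo is absent.
rocminfo | grep -oE 'gfx[0-9a-f]+' | head -n 1
```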
Share what works
This experiment is meant to gather community feedback and help us decide whether vLLM ROCm should become a productized Lemonade backend. Try it, compare notes with other users, and tell us where the path feels promising or rough.
Ready to try vLLM ROCm?
Install the experimental backend and run a vLLM recipe through Lemonade's local OpenAI-compatible API.