A new Linux GPU backend
Lemonade has added vLLM as a backend for AMD ROCm GPUs on Linux. The release is experimental but usable today, with known rough edges tracked publicly in the vLLM milestone.
vLLM brings two capabilities that complement Lemonade's existing backend lineup:
- Improved day-0 model support. New transformer architectures often become usable directly from Hugging Face checkpoints without a separate porting cycle.
- Concurrency and multi-GPU scaling. vLLM's paged-attention KV cache, continuous batching, chunked prefill, tensor parallelism, and pipeline parallelism help scale throughput across busy local serving workloads (see the sketch below).
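To see what continuous batching buys you in practice, you can send several requests to the server at once and let vLLM schedule them together. The sketch below is illustrative, not part of the release: it assumes a Lemonade server already running locally (install steps follow below), listening on the default port 8000 with the OpenAI-compatible API rooted at /api/v1, and serving the starter model used later in this post. Adjust the endpoint and model name to match your setup.

```bash
# Fire four chat requests concurrently; vLLM's continuous batching lets
# the server interleave them instead of handling one request at a time.
# Endpoint, port, and model name are assumptions; match your install.
for i in 1 2 3 4; do
  curl -s http://localhost:8000/api/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"Qwen3.5-0.8B-vLLM\",
         \"messages\": [{\"role\": \"user\",
                         \"content\": \"Write haiku number $i about GPUs.\"}]}" &
done
wait  # block until all four responses have arrived
```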
What's included
The Lemonade vLLM backend is distributed as a self-contained ROCm bundle from lemonade-sdk/vllm-rocm.
- **Self-contained bundle.** Includes a relocatable Python interpreter, PyTorch ROCm, ROCm user-space libraries, Triton, and vLLM.
- **No host Python stack.** No system Python, PyTorch, or ROCm install is required on the host; the bundle ships everything it needs.
- **Per-GPU target builds.** Lemonade selects the release matching the detected ROCm architecture at install time.
Install and run vLLM ROCm
Install Lemonade Server, add the vLLM ROCm backend, and run the starter Qwen model.
The commands below are for Ubuntu; Lemonade also supports additional Linux install paths, including Snap, Debian, Fedora, Arch, and Docker. See the install guide for the full set of options.
```bash
# Install Lemonade Server on Ubuntu
sudo add-apt-repository ppa:lemonade-team/stable
sudo apt install lemonade-server

# Install and run the vLLM ROCm backend
lemonade backends install vllm:rocm
lemonade run Qwen3.5-0.8B-vLLM
```
The vLLM backend download is large and can take a while, and the first model download also takes a few minutes. Before installing the backend, first-time users should confirm the Linux kernel prerequisites.
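Once the model is running, any OpenAI-compatible client can talk to it. Here is a minimal smoke test with curl; the port (8000) and API base path (/api/v1) are assumed defaults, so adjust them if your install differs.

```bash
# Minimal chat-completion request against the local Lemonade server.
# Port and base path are assumed defaults; adjust to your install.
curl http://localhost:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen3.5-0.8B-vLLM",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'
```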
Validated AMD GPU targets
vLLM ROCm support currently focuses on Linux systems with AMD ROCm-capable GPUs.
| GPU target | Status |
|---|---|
| gfx1151 (Strix Halo) | Validated end-to-end |
| gfx1150 (Strix Point) | Validated end-to-end |
| gfx110X (RDNA3) | Prebuilt wheels available; end-to-end validation pending |
| gfx120X (RDNA4) | Prebuilt wheels available; end-to-end validation pending |
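Not sure which gfx target your GPU reports? If the host happens to have ROCm's rocminfo tool installed (the self-contained bundle does not require it), you can check directly; this is just a convenience check, not an install prerequisite.

```bash
# Print the gfx target reported by the first ROCm-visible GPU.
# Requires rocminfo on the host, which the Lemonade bundle itself
# does not need; lspci can identify the card if rocminfo is absent.
rocminfo | grep -oE 'gfx[0-9a-f]+' | head -n 1
```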
Share what works
This experiment is meant to gather community feedback and help us decide whether vLLM ROCm should become a productized Lemonade backend. Try it, compare notes with other users, and tell us where the path feels promising or rough.
Ready to try vLLM ROCm?
Install the experimental backend and run a vLLM recipe through Lemonade's local OpenAI-compatible API.