This article will teach you how to run LLMs on your AMD XDNA 2 NPU on Linux using FastFlowLM. Get set up and then show us what you build!
Date: March 11, 2026
Authors: Lemonade and FastFlowLM contributors
Lemonade detected that some additional setup might be needed to get FastFlowLM fully operational on this system. Please review this guide and follow the steps provided.
Announcement
Updates for Linux Support
Today, it is possible to run LLMs and Whisper on the AMD XDNA 2 NPU. This solution is made up of:
Upstream NPU driver in the Linux 7.0+ kernel (with backports for 6.xx kernels).
AMD IRON compiler for XDNA NPUs.
FastFlowLM, a lightweight LLM runtime for AMD NPUs, which today has added Linux support.
Lemonade, which ties everything together with a streamlined user experience.
Getting Started
FastFlowLM
FastFlowLM is a lightweight LLM runtime optimized for AMD NPUs. Today, FastFlowLM is adding support for Ubuntu, Arch, and other distros to enable fast, low-power LLMs on Ryzen™ AI PCs that run Linux.
This article will help you:
Understand Linux NPU support status and required platform versions
Install the FLM + driver stack for your distribution
Validate your setup with flm validate
Fix common firmware, driver, and memlock issues
HW Requirements
Supported processors
FastFlowLM on Linux requires an AMD XDNA 2 NPU.
| Ryzen AI family | Codename | Status |
|---|---|---|
| Max 300-series | Strix Halo | Supported |
| 300-series | Kraken Point, Strix Point | Supported |
| 400-series | Gorgon Point | Supported |
| Z2 Extreme | Handheld devices | Supported |
Note: Ryzen AI 7000/8000/200-series chips have XDNA 1, which is not supported.
SW Requirements
Runtime stack
This solution requires specific firmware, kernel version, driver, and runtime software to function. The quickstart guide below will help you install these requirements.
| Item | Requirement |
|---|---|
| NPU firmware | Version 1.1.0.0 or later |
| Kernel + driver | Kernel 7.0+ with amdxdna, or amdxdna-dkms |
| Runtime | FastFlowLM installed |
| Memlock limit | Must be high enough for NPU execution |
Quickstart
Setup by distribution
Select your Linux distribution and follow the exact install path.
Troubleshooting
If setup does not work, use these checks to isolate the failing layer quickly.
The first thing you should do is run flm validate to check your system's compatibility and identify any issues. A successful validation will look like this:
1. Outdated NPU firmware
If flm validate reports firmware issues, check that the NPU firmware is version 1.1.0.0 or later:
cat /sys/bus/pci/drivers/amdxdna/*/fw_version
If older, update through your distribution (typically by updating the linux-firmware package).
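The firmware check above can also be scripted. The following is a minimal sketch and not part of FastFlowLM: it reads the same sysfs path and compares each reported version against 1.1.0.0 using GNU `sort -V`. The helper name `fw_at_least` is hypothetical, and the sketch assumes the fw_version file contains a plain dotted version string.

```shell
# Hypothetical helper: succeeds if version $1 is >= minimum version $2.
# Relies on GNU `sort -V` (version sort) from coreutils.
fw_at_least() {
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

MIN_FW="1.1.0.0"
found=0

# Path taken from the troubleshooting step above; the glob matches nothing
# when the amdxdna driver is not bound to a device.
for f in /sys/bus/pci/drivers/amdxdna/*/fw_version; do
    [ -r "$f" ] || continue
    found=1
    ver="$(cat "$f")"
    if fw_at_least "$ver" "$MIN_FW"; then
        echo "OK: firmware $ver (>= $MIN_FW)"
    else
        echo "Too old: firmware $ver (< $MIN_FW); update linux-firmware"
    fi
done

[ "$found" -eq 1 ] || echo "No fw_version file found; is amdxdna loaded?"
```

Version sort is used rather than a plain string comparison so that a release such as 1.10.0.0 is treated as newer than 1.1.0.0.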
2. Missing or outdated amdxdna driver
This solution requires amdxdna, provided in Linux kernel 7.0+ or via amdxdna-dkms.
lsmod | grep amdxdna
If not listed, upgrade to a kernel with in-tree support or install the DKMS package.
uname -r
The kernel version matters mainly for in-tree driver support. With DKMS, the amdxdna-dkms package version, not the kernel version, determines which driver you get.
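The two driver checks above can be combined into one script. This is a rough sketch under stated assumptions: the helper name `kernel_has_intree_amdxdna` is hypothetical, it only looks at the major version printed by `uname -r`, and it cannot tell whether a 6.xx kernel carries a backport or whether a distribution built the driver into the kernel instead of shipping it as a module.

```shell
# Hypothetical helper: succeeds if the kernel release string $1 (as printed
# by `uname -r`) has a major version of 7 or higher.
kernel_has_intree_amdxdna() {
    major="${1%%.*}"
    case "$major" in
        ''|*[!0-9]*) return 1 ;;   # not a plain number: treat as unknown
    esac
    [ "$major" -ge 7 ]
}

release="$(uname -r)"

# lsmod lists loaded modules, one per line, starting with the module name.
if lsmod 2>/dev/null | grep -q '^amdxdna '; then
    echo "amdxdna is loaded (kernel $release)"
elif kernel_has_intree_amdxdna "$release"; then
    echo "Kernel $release is 7.0+, but the amdxdna module is not loaded"
else
    echo "Kernel $release predates 7.0: install amdxdna-dkms or upgrade the kernel"
fi
```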
3. Memlock limit too low
NPU workloads need to lock memory. If you see the error "Memlock limit is too low (8MB)", raise the memlock limit.
Most users do not need manual changes when running Lemonade as a service:
Lemonade's systemd unit sets LimitMEMLOCK=infinity.
If you run FLM outside the service, edit limits manually:
sudo nano /etc/security/limits.conf
* soft memlock unlimited
* hard memlock unlimited
Log out and log back in for the change to apply.
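After logging back in, you can confirm the new limit from the shell that will launch FLM. The snippet below is a small sketch, not an official tool: `describe_memlock` is a hypothetical helper that formats the value printed by the shell builtin `ulimit -l`, which reports the soft locked-memory limit in KiB or the word "unlimited".

```shell
# Hypothetical helper: turn a `ulimit -l` value into a readable message.
describe_memlock() {
    case "$1" in
        unlimited)   echo "memlock: unlimited" ;;
        ''|*[!0-9]*) echo "memlock: could not read limit" ;;
        *)           echo "memlock: $1 KiB ($(( $1 / 1024 )) MiB)" ;;
    esac
}

# Query the limit of the current shell; child processes inherit it.
limit="$(ulimit -l 2>/dev/null || echo unknown)"
describe_memlock "$limit"
```

A result of 8192 KiB corresponds to the 8MB value in the error message above; with the limits.conf lines in place, a fresh login session should report unlimited.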
4. AMD IOMMU disabled
If AMD IOMMU is disabled on your system, your NPU will not be detected:
$ flm validate
[Linux] Kernel: 7.0.0-rc2
[ERROR] No NPU device found.
[Linux] Memlock Limit: set to infinity
Make sure amd_iommu=off is not part of your kernel command line.
This setting is commonly changed on Strix Halo machines because it slightly increases GPU performance in llama.cpp, but unfortunately it also disables the NPU.
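To see whether the running kernel was booted with this option, you can inspect /proc/cmdline. The following is a minimal sketch; the helper name `cmdline_disables_amd_iommu` is hypothetical, and it only looks for the exact token amd_iommu=off.

```shell
# Hypothetical helper: succeeds if the kernel command line in $1 contains
# the exact token amd_iommu=off (padding with spaces avoids partial matches).
cmdline_disables_amd_iommu() {
    case " $1 " in
        *" amd_iommu=off "*) return 0 ;;
        *) return 1 ;;
    esac
}

# /proc/cmdline holds the parameters the running kernel was booted with.
cmdline="$(cat /proc/cmdline 2>/dev/null || true)"

if cmdline_disables_amd_iommu "$cmdline"; then
    echo "amd_iommu=off is set: remove it from the kernel command line and reboot"
else
    echo "amd_iommu=off not found on the kernel command line"
fi
```

How you remove the option depends on your bootloader configuration, and the change only takes effect after a reboot.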