Axon NPU Architecture

Nordic's in-house ultra-low-power neural processing unit — 128 MHz, 3–8 GOPS, up to 15× faster than CPU TensorFlow Lite execution.

The Axon NPU is a dedicated hardware accelerator for neural-network inference, designed specifically for the always-on, battery-powered constraints of the nRF54L Series. It is built into the silicon alongside the Cortex-M33 application core rather than shipped as a separate companion chip, so it shares SRAM with the CPU and is woken and fed by the application core through low-overhead drivers.

Key numbers

Parameter                               Value
Origin                                  Atlazo (San Diego; acquired by Nordic, 2023)
Clock                                   128 MHz
Throughput                              3–8 GOPS, workload-dependent
Speedup vs Cortex-M33                   up to 15× on TensorFlow Lite / LiteRT models
Performance/watt vs competing edge AI   up to 7× higher performance, 8× better energy efficiency
First host SoC                          nRF54LM20B
Future hosts                            nRF92 Series (cellular IoT), additional wireless SoCs

"Up to 15×" and "up to 7×/8×" are Nordic-reported figures and are workload-dependent — they assume a model that fits the natively accelerated op set (see below) and uses INT8 quantisation, and the competitive comparison depends on the reference part Nordic chose. Models that fall back to CPU execution will see a smaller speedup. Always benchmark on your own model with the Axon Compiler's per-layer report before sizing a power budget around these numbers.

Natively accelerated operations

The Axon NPU accelerates the operations that dominate inference time in practical edge models:

  • 1D and 2D convolutions
  • Depthwise convolutions
  • Fully connected (dense) layers
  • Pooling layers
  • Activation functions (the common ones used in quantised models, e.g. ReLU, ReLU6, sigmoid, tanh)

Operations outside this set fall back to the Cortex-M33. The Axon Compiler reports exactly which layers can be hardware-accelerated and which will run on the CPU, including a per-layer inference-time estimate.
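
For a quick desktop-side preview before running the Axon Compiler, you can list the operators in a converted model and compare them against the accelerated set. The sketch below is illustrative only: AXON_NATIVE_OPS is an assumption mapped onto TFLite builtin op names, not the authoritative list (that comes from the compiler report), and Interpreter._get_ops_details() is a private TensorFlow API that may change between releases.

```python
import tensorflow as tf

# Illustrative mapping of the Axon-accelerated op classes onto TFLite
# builtin op names. This set is an assumption for the sketch -- the
# authoritative per-layer verdict comes from the Axon Compiler report.
AXON_NATIVE_OPS = {
    "CONV_2D",             # 1D convs are lowered to CONV_2D by the converter
    "DEPTHWISE_CONV_2D",
    "FULLY_CONNECTED",
    "AVERAGE_POOL_2D",
    "MAX_POOL_2D",
    "RELU", "RELU6", "LOGISTIC", "SOFTMAX", "TANH",
}

interpreter = tf.lite.Interpreter(model_path="kws_int8.tflite")
# _get_ops_details() is a private TensorFlow API used here as a quick
# sanity check; tf.lite.experimental.Analyzer.analyze() is the public
# alternative (it prints a per-op summary).
for op in interpreter._get_ops_details():
    target = "NPU" if op["op_name"] in AXON_NATIVE_OPS else "CPU fallback"
    print(f"{op['index']:3d}  {op['op_name']:<20} -> {target}")
```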

Quantisation

The NPU is designed for INT8 quantised models. This is the format the TensorFlow Lite / LiteRT converter produces when you enable full-integer quantisation, and it's what you'll get out of Edge Impulse and the Nordic Edge AI Lab. Float models can be compiled but will not see the full speedup or energy benefit.
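
For models you train yourself, the conversion step is the standard LiteRT full-integer flow. A minimal sketch, assuming a hypothetical keyword-spotting SavedModel at kws_saved_model with 49×40 MFCC inputs:

```python
import numpy as np
import tensorflow as tf

# Full-integer (INT8) quantisation with the standard TFLite/LiteRT
# converter. "kws_saved_model" and the input shape are placeholders
# for your own model.
def representative_dataset():
    # Use a few hundred real input windows from your training data here;
    # the converter uses them to calibrate activation ranges.
    for _ in range(200):
        yield [np.random.rand(1, 49, 40, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("kws_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Fail loudly if any op cannot be expressed in pure INT8.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("kws_int8.tflite", "wb") as f:
    f.write(converter.convert())
```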

The Axon Compiler reports quantisation loss as part of its metrics so you can decide whether the accuracy hit is acceptable for your application.
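
The compiler's figure is the one to trust, but you can also measure the accuracy drop yourself before touching hardware by running the float and INT8 models over the same validation set. A sketch, assuming hypothetical x_val/y_val arrays and the model files from the example above:

```python
import numpy as np
import tensorflow as tf

def tflite_accuracy(model_path, x_val, y_val):
    """Top-1 accuracy of a .tflite classifier on a validation set."""
    interp = tf.lite.Interpreter(model_path=model_path)
    interp.allocate_tensors()
    inp = interp.get_input_details()[0]
    out = interp.get_output_details()[0]
    correct = 0
    for x, y in zip(x_val, y_val):
        sample = x[np.newaxis, ...].astype(np.float32)
        if inp["dtype"] == np.int8:
            # Quantise the float input using the model's own parameters.
            scale, zero_point = inp["quantization"]
            sample = np.clip(np.round(sample / scale + zero_point),
                             -128, 127).astype(np.int8)
        interp.set_tensor(inp["index"], sample)
        interp.invoke()
        correct += int(np.argmax(interp.get_tensor(out["index"])) == y)
    return correct / len(y_val)

# x_val: float32 validation windows, y_val: integer labels (hypothetical).
# drop = (tflite_accuracy("kws_float.tflite", x_val, y_val)
#         - tflite_accuracy("kws_int8.tflite", x_val, y_val))
```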

What it's good for

The Axon NPU was designed for the kind of always-on workloads that have historically been the bottleneck for battery-powered IoT:

  • Keyword spotting (KWS) and wake-word detection — running a small audio model continuously without draining a coin cell
  • Audio classification — door knocks, glass break, cough, alarm sounds
  • Anomaly detection on sensor streams — vibration, current, pressure
  • Gesture recognition — accelerometer / IMU based
  • Health-signal classification — PPG, ECG (with appropriate clinical validation; see Medical guidance)

For simpler, very-low-rate sensor analytics, Nordic also supports Neuton custom models — ultra-efficient CPU-run models that don't need the NPU at all.

What it isn't

The Axon NPU is not a general-purpose GPU and will not run modern transformer / LLM workloads. The energy and area budget targets "always-on, sub-1 mW inference" — measured in microjoules per inference, not "frames per second of ResNet-50". Pick the right tool for the job:

Workload                                                Right target
Wake-word, KWS, IMU classification, anomaly detection   Axon NPU on nRF54LM20B
Vibration / accelerometer analytics with tiny models    Neuton custom models on any nRF54L (CPU)
Vision, transformer inference, on-device LLMs           A different class of part — not Nordic's nRF Series
