Axon NPU Architecture

Nordic's in-house ultra-low-power neural processing unit — 128 MHz, 3–8 GOPS, up to 15× faster than CPU TensorFlow Lite execution.

The Axon NPU is a dedicated hardware accelerator for neural-network inference, designed specifically for the always-on, battery-powered constraints of the nRF54L Series. It is built into the silicon alongside the Cortex-M33 application core rather than shipped as a separate companion chip, so it shares SRAM with the CPU and is woken and fed by the application core through low-overhead drivers.

Key numbers

Parameter                               Value
Origin                                  Atlazo (San Diego; acquired by Nordic, 2023)
Clock                                   128 MHz
Throughput                              3–8 GOPS, workload-dependent
Speedup vs Cortex-M33                   up to 15× on TensorFlow Lite / LiteRT models
Performance/watt vs competing edge AI   up to 7× higher performance, 8× better energy efficiency
First host SoC                          nRF54LM20B
Future hosts                            nRF92 Series (cellular IoT), additional wireless SoCs

"Up to 15×" and "up to 7×/8×" are Nordic-reported figures and are workload-dependent — they assume a model that fits the natively accelerated op set (see below) and uses INT8 quantisation, and the competitive comparison depends on the reference part Nordic chose. Models that fall back to CPU execution will see a smaller speedup. Always benchmark on your own model with the Axon Compiler's per-layer report before sizing a power budget around these numbers.

Natively accelerated operations

The Axon NPU accelerates the operations that dominate inference time in practical edge models:

  • 1D and 2D convolutions
  • Depthwise convolutions
  • Fully connected (dense) layers
  • Pooling layers
  • Activation functions (the common ones used in quantised models, e.g. ReLU, ReLU6, sigmoid, tanh)

Operations outside this set fall back to the Cortex-M33. The Axon Compiler reports exactly which layers can be hardware-accelerated and which will run on the CPU, including a per-layer inference-time estimate.
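
For a quick desktop-side preview before running the Axon Compiler, you can list the operators in a converted model and compare them against the accelerated set. The sketch below is illustrative only: AXON_NATIVE_OPS is an assumption mapped onto TFLite builtin op names, not the authoritative list (that comes from the compiler report), and Interpreter._get_ops_details() is a private TensorFlow API that may change between releases.

```python
import tensorflow as tf

# Illustrative mapping of the Axon-accelerated op classes onto TFLite
# builtin op names. This set is an assumption for the sketch -- the
# authoritative per-layer verdict comes from the Axon Compiler report.
AXON_NATIVE_OPS = {
    "CONV_2D",             # 1D convs are lowered to CONV_2D by the converter
    "DEPTHWISE_CONV_2D",
    "FULLY_CONNECTED",
    "AVERAGE_POOL_2D",
    "MAX_POOL_2D",
    "RELU", "RELU6", "LOGISTIC", "SOFTMAX", "TANH",
}

interpreter = tf.lite.Interpreter(model_path="kws_int8.tflite")
# _get_ops_details() is a private TensorFlow API used here as a quick
# sanity check; tf.lite.experimental.Analyzer.analyze() is the public
# alternative (it prints a per-op summary).
for op in interpreter._get_ops_details():
    target = "NPU" if op["op_name"] in AXON_NATIVE_OPS else "CPU fallback"
    print(f"{op['index']:3d}  {op['op_name']:<20} -> {target}")
```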

Quantisation

The NPU is designed for INT8 quantised models. This is the format the TensorFlow Lite / LiteRT converter produces when you enable full-integer quantisation, and it's what you'll get out of Edge Impulse and the Nordic Edge AI Lab. Float models can be compiled but will not see the full speedup or energy benefit.
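
For models you train yourself, the conversion step is the standard LiteRT full-integer flow. A minimal sketch, assuming a hypothetical keyword-spotting SavedModel at kws_saved_model with 49×40 MFCC inputs:

```python
import numpy as np
import tensorflow as tf

# Full-integer (INT8) quantisation with the standard TFLite/LiteRT
# converter. "kws_saved_model" and the input shape are placeholders
# for your own model.
def representative_dataset():
    # Use a few hundred real input windows from your training data here;
    # the converter uses them to calibrate activation ranges.
    for _ in range(200):
        yield [np.random.rand(1, 49, 40, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("kws_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Fail loudly if any op cannot be expressed in pure INT8.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("kws_int8.tflite", "wb") as f:
    f.write(converter.convert())
```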

The Axon Compiler reports quantisation loss as part of its metrics so you can decide whether the accuracy hit is acceptable for your application.
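
The compiler's figure is the one to trust, but you can also measure the accuracy drop yourself before touching hardware by running the float and INT8 models over the same validation set. A sketch, assuming hypothetical x_val/y_val arrays and the model files from the example above:

```python
import numpy as np
import tensorflow as tf

def tflite_accuracy(model_path, x_val, y_val):
    """Top-1 accuracy of a .tflite classifier on a validation set."""
    interp = tf.lite.Interpreter(model_path=model_path)
    interp.allocate_tensors()
    inp = interp.get_input_details()[0]
    out = interp.get_output_details()[0]
    correct = 0
    for x, y in zip(x_val, y_val):
        sample = x[np.newaxis, ...].astype(np.float32)
        if inp["dtype"] == np.int8:
            # Quantise the float input using the model's own parameters.
            scale, zero_point = inp["quantization"]
            sample = np.clip(np.round(sample / scale + zero_point),
                             -128, 127).astype(np.int8)
        interp.set_tensor(inp["index"], sample)
        interp.invoke()
        correct += int(np.argmax(interp.get_tensor(out["index"])) == y)
    return correct / len(y_val)

# x_val: float32 validation windows, y_val: integer labels (hypothetical).
# drop = (tflite_accuracy("kws_float.tflite", x_val, y_val)
#         - tflite_accuracy("kws_int8.tflite", x_val, y_val))
```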

What it's good for

The Axon NPU was designed for the kind of always-on workloads that have historically been the bottleneck for battery-powered IoT:

  • Keyword spotting (KWS) and wake-word detection — running a small audio model continuously without draining a coin cell
  • Audio classification — door knocks, glass break, cough, alarm sounds
  • Anomaly detection on sensor streams — vibration, current, pressure
  • Gesture recognition — accelerometer / IMU based
  • Health-signal classification — PPG, ECG (with appropriate clinical validation; see Medical guidance)

For simpler, very-low-rate sensor analytics, Nordic also supports Neuton custom models — ultra-efficient CPU-run models that don't need the NPU at all.

What it isn't

The Axon NPU is not a general-purpose GPU and will not run modern transformer / LLM workloads. The energy and area budget targets "always-on, sub-1 mW inference" — measured in microjoules per inference, not "frames per second of ResNet-50". Pick the right tool for the job:

Workload                                                Right target
Wake-word, KWS, IMU classification, anomaly detection   Axon NPU on nRF54LM20B
Vibration / accelerometer analytics with tiny models    Neuton custom models on any nRF54L (CPU)
Vision, transformer inference, on-device LLMs           A different class of part — not Nordic's nRF Series
