IBM zDNN (Z Deep Neural Network) is a hardware acceleration library designed specifically to leverage the IBM NNPA (Neural Network Processor Assist) accelerator located within IBM Telum I and II processors. It provides significant performance improvements for neural network inference operations.
The llama.cpp zDNN backend is designed to enable llama.cpp on IBM z17 and later systems via the IBM zDNN hardware acceleration library.
| Hardware Level | Status | Verified |
|---|---|---|
| IBM z17 / LinuxONE 5 | Supported | RHEL 9.6, IBM z17, 40 IFLs |
| IBM z16 / LinuxONE 4 | Not Supported |
| Data Type | Status |
|---|---|
| F32 | Supported |
| F16 | Supported |
| BF16 | Supported |
The IBM zDNN backend has the following CMake options that control the behaviour of the backend.
| CMake Option | Default Value | Description |
|---|---|---|
GGML_ZDNN |
OFF |
Compile llama.cpp with zDNN support |
ZDNN_ROOT |
"" |
Override zDNN library lookup |
Note: Using the zDNN library provided via apt or yum may not work correctly as reported in #15772. It is preferred that you compile from source.
git clone --recurse-submodules https://github.com/IBM/zDNN
cd zDNN
autoreconf .
./configure --prefix=/opt/zdnn-libs
make build
sudo make install
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -S . -G Ninja -B build \
-DCMAKE_BUILD_TYPE=Release \
-DGGML_ZDNN=ON \
-DZDNN_ROOT=/opt/zdnn-libs
cmake --build build --config Release -j$(nproc)