@@ -528,13 +528,28 @@ Building the program with BLAS support may lead to some performance improvements
 ```
 - Using `CMake` for Linux (assuming a gfx1030-compatible AMD GPU):
 ```bash
- CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ \
- cmake -B build -DLLAMA_HIPBLAS=ON -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release \
+ HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
+ cmake -S . -B build -DLLAMA_HIPBLAS=ON -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release \
 && cmake --build build --config Release -- -j 16
 ```
 On Linux it is also possible to use unified memory architecture (UMA) to share main memory between the CPU and integrated GPU by setting `-DLLAMA_HIP_UMA=ON`.
 However, this hurts performance for non-integrated GPUs (but enables working with integrated GPUs).
 
+ Note that if you get the following error:
+ ```
+ clang: error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
+ ```
+ Try searching for a directory under `HIP_PATH` that contains the file
+ `oclc_abi_version_400.bc`. Then, add the following to the start of the
+ command: `HIP_DEVICE_LIB_PATH=<directory-you-just-found>`, so something
+ like:
+ ```bash
+ HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
+ HIP_DEVICE_LIB_PATH=<directory-you-just-found> \
+ cmake -S . -B build -DLLAMA_HIPBLAS=ON -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release \
+ && cmake --build build --config Release -- -j 16
+ ```
+
 - Using `make` (example for target gfx1030, build with 16 CPU threads):
 ```bash
 make -j16 LLAMA_HIPBLAS=1 LLAMA_HIP_UMA=1 AMDGPU_TARGETS=gfx1030
@@ -543,10 +558,8 @@ Building the program with BLAS support may lead to some performance improvements
 - Using `CMake` for Windows (using x64 Native Tools Command Prompt for VS, and assuming a gfx1100-compatible AMD GPU):
 ```bash
 set PATH=%HIP_PATH%\bin;%PATH%
- mkdir build
- cd build
- cmake -G Ninja -DAMDGPU_TARGETS=gfx1100 -DLLAMA_HIPBLAS=ON -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Release ..
- cmake --build .
+ cmake -S . -B build -G Ninja -DAMDGPU_TARGETS=gfx1100 -DLLAMA_HIPBLAS=ON -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Release
+ cmake --build build
 ```
 Make sure that `AMDGPU_TARGETS` is set to the GPU arch you want to compile for. The above example uses `gfx1100` that corresponds to Radeon RX 7900XTX/XT/GRE. You can find a list of targets [here](https://llvm.org/docs/AMDGPUUsage.html#processors)
 Find your gpu version string by matching the most significant version information from `rocminfo | grep gfx | head -1 | awk '{print $2}'` with the list of processors, e.g. `gfx1035` maps to `gfx1030`.
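
The device-library workaround in the note above comes down to a single `find` over the ROCm tree. Below is a minimal bash sketch of that search. `find_device_lib_dir` is a hypothetical helper and is not part of llama.cpp. The sketch assumes GNU `find`, which provides `-quit`. It also assumes the `.bc` files live under the root reported by `hipconfig -R`.

```bash
# Hypothetical helper (not part of llama.cpp): print the first directory under
# ROOT that contains oclc_abi_version_400.bc, or return 1 if none is found.
find_device_lib_dir() {
    local root="$1" bc_file
    # -print -quit (GNU find) stops at the first match; errors such as a
    # missing or unreadable directory are discarded instead of aborting.
    bc_file="$(find "$root" -name oclc_abi_version_400.bc -print -quit 2>/dev/null || true)"
    if [ -z "$bc_file" ]; then
        echo "oclc_abi_version_400.bc not found under $root" >&2
        return 1
    fi
    # Print only the directory, which is what HIP_DEVICE_LIB_PATH expects.
    dirname "$bc_file"
}

# Example use (assumes hipconfig is on PATH):
#   HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
#   HIP_DEVICE_LIB_PATH="$(find_device_lib_dir "$(hipconfig -R)")" \
#       cmake -S . -B build -DLLAMA_HIPBLAS=ON -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release \
#       && cmake --build build --config Release -- -j 16
```

Taking the first match is a simplification. If several ROCm versions are installed side by side, check the printed directory before passing it as `HIP_DEVICE_LIB_PATH`.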