@@ -432,14 +432,15 @@ Building the program with BLAS support may lead to some performance improvements
```bash
make LLAMA_HIPBLAS=1
```
- - Using `CMake` for Linux:
+ - Using `CMake` for Linux (assuming a gfx1030-compatible AMD GPU):
```bash
- mkdir build
- cd build
- CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ cmake .. -DLLAMA_HIPBLAS=ON
- cmake --build .
+ CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ \
+ cmake -H. -Bbuild -DLLAMA_HIPBLAS=ON -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release \
+ && cmake --build build -- -j 16
```
- - Using `CMake` for Windows (using x64 Native Tools Command Prompt for VS):
+ On Linux, it is also possible to use the unified memory architecture (UMA) to share main memory between the CPU and an integrated GPU by setting `-DLLAMA_HIP_UMA=ON`.
+ However, this hurts performance for non-integrated GPUs.
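As an illustration of the UMA option, the Linux configure command from this change could be extended as follows. This is a sketch only: it assumes an integrated GPU that builds for the `gfx1030` target and a ROCm install under `/opt/rocm`, and simply appends the `-DLLAMA_HIP_UMA=ON` flag named above.

```bash
# Sketch: same configure/build steps as the Linux example, plus UMA.
# Assumes ROCm's clang lives under /opt/rocm and the GPU maps to gfx1030.
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ \
    cmake -H. -Bbuild -DLLAMA_HIPBLAS=ON -DLLAMA_HIP_UMA=ON \
          -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release \
    && cmake --build build -- -j 16
```

Leave the flag off for discrete GPUs, since the text above notes that it hurts their performance.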
+ - Using `CMake` for Windows (using x64 Native Tools Command Prompt for VS, and assuming a gfx1100-compatible AMD GPU):
```bash
set PATH=%HIP_PATH%\bin;%PATH%
mkdir build
@@ -448,10 +449,11 @@ Building the program with BLAS support may lead to some performance improvements
cmake --build .
```
Make sure that `AMDGPU_TARGETS` is set to the GPU arch you want to compile for. The above example uses `gfx1100`, which corresponds to Radeon RX 7900XTX/XT/GRE. You can find a list of targets [here](https://llvm.org/docs/AMDGPUUsage.html#processors).
+ Find your GPU version string by matching the most significant version information from `rocminfo | grep gfx | head -1 | awk '{print $2}'` with the list of processors, e.g. `gfx1035` maps to `gfx1030`.
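The matching step can be sketched as a small shell helper. This is a minimal sketch, not part of the project: `gfx_to_target` is a hypothetical name, and the only mapping it encodes is the `gfx103x` to `gfx1030` example given above. Any other name is passed through unchanged and should be checked against the processor list by hand.

```bash
# Hypothetical helper (not part of llama.cpp): map a detected gfx name
# to the value to pass as AMDGPU_TARGETS.
gfx_to_target() {
  case "$1" in
    gfx103?) echo "gfx1030" ;;  # e.g. gfx1031, gfx1035 -> gfx1030
    *)       echo "$1" ;;       # unknown here: verify against the list
  esac
}

# Only query the GPU when rocminfo is actually installed.
if command -v rocminfo >/dev/null 2>&1; then
  detected="$(rocminfo | grep gfx | head -1 | awk '{print $2}')"
  echo "AMDGPU_TARGETS=$(gfx_to_target "$detected")"
fi
```

The printed value can then be passed to CMake as `-DAMDGPU_TARGETS=...`.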
The environment variable [`HIP_VISIBLE_DEVICES`](https://rocm.docs.amd.com/en/latest/understand/gpu_isolation.html#hip-visible-devices) can be used to specify which GPU(s) will be used.
- If your GPU is not officially supported you can use the environment variable [`HSA_OVERRIDE_GFX_VERSION`] set to a similar GPU, for example 10.3.0 on RDNA2 or 11.0.0 on RDNA3.
+ If your GPU is not officially supported you can use the environment variable [`HSA_OVERRIDE_GFX_VERSION`] set to a similar GPU, for example 10.3.0 on RDNA2 (e.g. gfx1030, gfx1031, or gfx1035) or 11.0.0 on RDNA3.
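A rough sketch of how these two environment variables might be set before launching the program is shown below. The helper name `hsa_override_for` is hypothetical, the RDNA2 values come from the examples above, and treating every `gfx110x` name as RDNA3 is an assumption of this sketch.

```bash
# Hypothetical helper (not part of llama.cpp): pick an override version
# for a gfx name, following the RDNA2/RDNA3 examples in the text.
hsa_override_for() {
  case "$1" in
    gfx103?) echo "10.3.0" ;;  # RDNA2, e.g. gfx1030, gfx1031, gfx1035
    gfx110?) echo "11.0.0" ;;  # RDNA3 (assumption: gfx110x names)
    *)       return 1 ;;       # no suggestion for other architectures
  esac
}

# Example: an unsupported gfx1031 card, using only the first GPU.
if override="$(hsa_override_for gfx1031)"; then
  export HSA_OVERRIDE_GFX_VERSION="$override"
fi
export HIP_VISIBLE_DEVICES=0
```

With both variables exported, the program started from the same shell will see only GPU 0 and treat it as the overridden architecture.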
The following compilation options are also available to tweak performance (yes, they refer to CUDA, not HIP, because it uses the same code as the cuBLAS version above):

| Option | Legal values | Default | Description |