@@ -308,6 +308,8 @@ In order to build llama.cpp you have three different options.
       make
       ```

+    **Note**: for `Debug` builds, run `make LLAMA_DEBUG=1`
+
   - On Windows:

     1. Download the latest fortran version of [w64devkit](https://github.com/skeeto/w64devkit/releases).
@@ -322,12 +324,26 @@ In order to build llama.cpp you have three different options.
 - Using `CMake`:

   ```bash
-  mkdir build
-  cd build
-  cmake ..
-  cmake --build . --config Release
+  cmake -B build
+  cmake --build build --config Release
   ```

+  **Note**: for `Debug` builds, there are two cases:
+
+  - Single-config generators (e.g. default = `Unix Makefiles`; note that they just ignore the `--config` flag):
+
+    ```bash
+    cmake -B build -DCMAKE_BUILD_TYPE=Debug
+    cmake --build build
+    ```
+
+  - Multi-config generators (`-G` param set to Visual Studio, XCode...):
+
+    ```bash
+    cmake -B build -G "Xcode"
+    cmake --build build --config Debug
+    ```
+
 - Using `Zig` (version 0.11 or later):

 Building for optimization levels and CPU features can be accomplished using standard build arguments, for example AVX2, FMA, F16C,
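To make the CPU-feature sentence in the context above concrete, here is a minimal sketch of such a build, assuming the `LLAMA_AVX2`, `LLAMA_FMA` and `LLAMA_F16C` options exposed by the project's CMake configuration at this point in history:

```bash
# Illustrative only: explicitly enable the AVX2, FMA and F16C code paths
cmake -B build -DLLAMA_AVX2=ON -DLLAMA_FMA=ON -DLLAMA_F16C=ON
cmake --build build --config Release
```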
@@ -439,10 +455,8 @@ Building the program with BLAS support may lead to some performance improvements
   - Using `CMake` on Linux:

     ```bash
-    mkdir build
-    cd build
-    cmake .. -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS
-    cmake --build . --config Release
+    cmake -B build -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS
+    cmake --build build --config Release
     ```

 - #### BLIS
@@ -462,11 +476,9 @@ Building the program with BLAS support may lead to some performance improvements
   - Using manual oneAPI installation:
     By default, `LLAMA_BLAS_VENDOR` is set to `Generic`, so if you already sourced intel environment script and assign `-DLLAMA_BLAS=ON` in cmake, the mkl version of Blas will automatically been selected. Otherwise please install oneAPI and follow the below steps:
     ```bash
-    mkdir build
-    cd build
     source /opt/intel/oneapi/setvars.sh # You can skip this step if in oneapi-basekit docker image, only required for manual installation
-    cmake .. -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=Intel10_64lp -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_NATIVE=ON
-    cmake --build . --config Release
+    cmake -B build -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=Intel10_64lp -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_NATIVE=ON
+    cmake --build build --config Release
     ```

   - Using oneAPI docker image:
@@ -487,10 +499,8 @@ Building the program with BLAS support may lead to some performance improvements
 - Using `CMake`:

   ```bash
-  mkdir build
-  cd build
-  cmake .. -DLLAMA_CUDA=ON
-  cmake --build . --config Release
+  cmake -B build -DLLAMA_CUDA=ON
+  cmake --build build --config Release
   ```

 The environment variable [`CUDA_VISIBLE_DEVICES`](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars) can be used to specify which GPU(s) will be used. The following compilation options are also available to tweak performance:
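As a usage sketch for the `CUDA_VISIBLE_DEVICES` remark above: the variable is read at run time, so it is set when launching the binary rather than when configuring the build. The model path and prompt below are placeholders, and the invocation mirrors the `./bin/main` example used elsewhere in this README.

```bash
# Expose only the first GPU to the built binary
CUDA_VISIBLE_DEVICES=0 ./bin/main -m "PATH_TO_MODEL" -p "Hello" -n 50 -ngl 33
```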
@@ -517,8 +527,8 @@ Building the program with BLAS support may lead to some performance improvements
 - Using `CMake` for Linux (assuming a gfx1030-compatible AMD GPU):
   ```bash
   CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ \
-      cmake -H. -Bbuild -DLLAMA_HIPBLAS=ON -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release \
-      && cmake --build build -- -j 16
+      cmake -B build -DLLAMA_HIPBLAS=ON -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release \
+      && cmake --build build --config Release -- -j 16
   ```
 On Linux it is also possible to use unified memory architecture (UMA) to share main memory between the CPU and integrated GPU by setting `-DLLAMA_HIP_UMA=ON"`.
 However, this hurts performance for non-integrated GPUs (but enables working with integrated GPUs).
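To make the UMA note concrete, a sketch of the same HIP build with only `-DLLAMA_HIP_UMA=ON` added; every other flag is taken verbatim from the command in the hunk above.

```bash
# Same gfx1030 HIP build as above, with unified memory enabled for an integrated GPU
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ \
    cmake -B build -DLLAMA_HIPBLAS=ON -DLLAMA_HIP_UMA=ON -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release \
    && cmake --build build --config Release -- -j 16
```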
@@ -564,15 +574,14 @@ Building the program with BLAS support may lead to some performance improvements

     ```sh
     git clone --recurse-submodules https://github.com/KhronosGroup/OpenCL-SDK.git
-    mkdir OpenCL-SDK/build
-    cd OpenCL-SDK/build
-    cmake .. -DBUILD_DOCS=OFF \
+    cd OpenCL-SDK
+    cmake -B build -DBUILD_DOCS=OFF \
       -DBUILD_EXAMPLES=OFF \
       -DBUILD_TESTING=OFF \
       -DOPENCL_SDK_BUILD_SAMPLES=OFF \
       -DOPENCL_SDK_TEST_SAMPLES=OFF
-    cmake --build . --config Release
-    cmake --install . --prefix /some/path
+    cmake --build build
+    cmake --install build --prefix /some/path
     ```
   </details>

@@ -594,23 +603,23 @@ Building the program with BLAS support may lead to some performance improvements
     ```cmd
     set OPENCL_SDK_ROOT="C:/OpenCL-SDK-v2023.04.17-Win-x64"
     git clone https://github.com/CNugteren/CLBlast.git
-    mkdir CLBlast\build
-    cd CLBlast\build
-    cmake .. -DBUILD_SHARED_LIBS=OFF -DOVERRIDE_MSVC_FLAGS_TO_MT=OFF -DTUNERS=OFF -DOPENCL_ROOT=%OPENCL_SDK_ROOT% -G "Visual Studio 17 2022" -A x64
-    cmake --build . --config Release
-    cmake --install . --prefix C:/CLBlast
+    cd CLBlast
+    cmake -B build -DBUILD_SHARED_LIBS=OFF -DOVERRIDE_MSVC_FLAGS_TO_MT=OFF -DTUNERS=OFF -DOPENCL_ROOT=%OPENCL_SDK_ROOT% -G "Visual Studio 17 2022" -A x64
+    cmake --build build --config Release
+    cmake --install build --prefix C:/CLBlast
     ```

+    (note: `--config Release` at build time is the default and only relevant for Visual Studio builds - or multi-config Ninja builds)
+
 - <details>
   <summary>Unix:</summary>

     ```sh
     git clone https://github.com/CNugteren/CLBlast.git
-    mkdir CLBlast/build
-    cd CLBlast/build
-    cmake .. -DBUILD_SHARED_LIBS=OFF -DTUNERS=OFF
-    cmake --build . --config Release
-    cmake --install . --prefix /some/path
+    cd CLBlast
+    cmake -B build -DBUILD_SHARED_LIBS=OFF -DTUNERS=OFF
+    cmake --build build --config Release
+    cmake --install build --prefix /some/path
     ```

   Where `/some/path` is where the built library will be installed (default is `/usr/local`).
@@ -624,21 +633,17 @@ Building the program with BLAS support may lead to some performance improvements
   ```
 - CMake (Unix):
   ```sh
-  mkdir build
-  cd build
-  cmake .. -DLLAMA_CLBLAST=ON -DCLBlast_DIR=/some/path
-  cmake --build . --config Release
+  cmake -B build -DLLAMA_CLBLAST=ON -DCLBlast_DIR=/some/path
+  cmake --build build --config Release
   ```
 - CMake (Windows):
   ```cmd
   set CL_BLAST_CMAKE_PKG="C:/CLBlast/lib/cmake/CLBlast"
   git clone https://github.com/ggerganov/llama.cpp
   cd llama.cpp
-  mkdir build
-  cd build
-  cmake .. -DBUILD_SHARED_LIBS=OFF -DLLAMA_CLBLAST=ON -DCMAKE_PREFIX_PATH=%CL_BLAST_CMAKE_PKG% -G "Visual Studio 17 2022" -A x64
-  cmake --build . --config Release
-  cmake --install . --prefix C:/LlamaCPP
+  cmake -B build -DBUILD_SHARED_LIBS=OFF -DLLAMA_CLBLAST=ON -DCMAKE_PREFIX_PATH=%CL_BLAST_CMAKE_PKG% -G "Visual Studio 17 2022" -A x64
+  cmake --build build --config Release
+  cmake --install build --prefix C:/LlamaCPP
   ```

 ##### Running Llama with CLBlast
@@ -694,10 +699,8 @@ Building the program with BLAS support may lead to some performance improvements
 Then, build llama.cpp using the cmake command below:

 ```bash
-mkdir -p build
-cd build
-cmake .. -DLLAMA_VULKAN=1
-cmake --build . --config Release
+cmake -B build -DLLAMA_VULKAN=1
+cmake --build build --config Release

 # Test the output binary (with "-ngl 33" to offload all layers to GPU)
 ./bin/main -m "PATH_TO_MODEL" -p "Hi you how are you" -n 50 -e -ngl 33 -t 4