
[SYCL] update guide of SYCL backend (#5254)

* update guide for make installation, memory, gguf model link, rm todo for windows build

* add vs install requirement

* update for gpu device check

* update help of llama-bench

* fix grammar issues
Neo Zhang Jianyu 1 year ago
commit af3ba5d946

+ 55 - 9
README-sycl.md

@@ -42,6 +42,8 @@ For Intel CPU, recommend to use llama.cpp for X86 (Intel MKL building).
 
 ## Intel GPU
 
+### Verified
+
 |Intel GPU| Status | Verified Model|
 |-|-|-|
 |Intel Data Center Max Series| Support| Max 1550|
@@ -50,6 +52,17 @@ For Intel CPU, recommend to use llama.cpp for X86 (Intel MKL building).
 |Intel built-in Arc GPU| Support| built-in Arc GPU in Meteor Lake|
 |Intel iGPU| Support| iGPU in i5-1250P, i7-1165G7|
 
+Note: If the iGPU has fewer than 80 EUs (Execution Units), the inference speed will be too slow to be usable.
+
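+A rough way to check the EU count on Linux is with `clinfo` (assuming it is installed); on Intel GPUs its "Max compute units" field reports the number of EUs:
+
+```
+clinfo | grep -i "max compute units"
+```
+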
+### Memory
+
+Memory is a key limitation when running LLMs on GPUs.
+
+When llama.cpp runs, it prints a log line showing the memory allocated on the GPU, so you can see how much memory your case needs. For example: `llm_load_tensors:            buffer size =  3577.56 MiB`.
+
+For an iGPU, please make sure enough host memory can be shared with the GPU. For llama-2-7b.Q4_0, 8GB+ of host memory is recommended.
+
+For a dGPU, please make sure the device memory is sufficient. For llama-2-7b.Q4_0, 4GB+ of device memory is recommended.
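+
+For example, a quick way to pull the reported GPU buffer size out of a run (a sketch; the binary path and flags depend on your build):
+
+```
+./build/bin/main -m models/llama-2-7b.Q4_0.gguf -n 32 -ngl 33 2>&1 | grep "buffer size"
+```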
 
 ## Linux
 
@@ -105,7 +118,7 @@ source /opt/intel/oneapi/setvars.sh
 sycl-ls
 ```
 
-There should be one or more level-zero devices. Like **[ext_oneapi_level_zero:gpu:0]**.
+There should be one or more level-zero devices. Please confirm that at least one GPU is present, like **[ext_oneapi_level_zero:gpu:0]**.
 
 Output (example):
 ```
@@ -152,6 +165,8 @@ Note:
 
 1. Put model file to folder **models**
 
+You could download [llama-2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf) as an example.
+
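+For example, with `wget` (note the `resolve/` form of the URL for a direct download):
+
+```
+wget -P models https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q4_0.gguf
+```
+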
 2. Enable oneAPI running environment
 
 ```
@@ -223,7 +238,13 @@ Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device
 
 Please install Intel GPU driver by official guide: [Install GPU Drivers](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/arc/software/drivers.html).
 
-2. Install Intel® oneAPI Base toolkit.
+Note: **The driver is mandatory for compute functionality**.
+
+2. Install Visual Studio.
+
+Please install [Visual Studio](https://visualstudio.microsoft.com/), which is required to enable the oneAPI environment on Windows.
+
+3. Install Intel® oneAPI Base toolkit.
 
 a. Please follow the procedure in [Get the Intel® oneAPI Base Toolkit ](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html).
 
@@ -252,7 +273,7 @@ In oneAPI command line:
 sycl-ls
 ```
 
-There should be one or more level-zero devices. Like **[ext_oneapi_level_zero:gpu:0]**.
+There should be one or more level-zero devices. Please confirm that at least one GPU is present, like **[ext_oneapi_level_zero:gpu:0]**.
 
 Output (example):
 ```
@@ -260,15 +281,21 @@ Output (example):
 [opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
 [opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Iris(R) Xe Graphics OpenCL 3.0 NEO  [31.0.101.5186]
 [ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Iris(R) Xe Graphics 1.3 [1.3.28044]
-
 ```
 
-3. Install cmake & make
+4. Install cmake & make
+
+a. Download & install cmake for Windows: https://cmake.org/download/
 
-a. Download & install cmake for windows: https://cmake.org/download/
+b. Download & install make for Windows provided by mingw-w64
 
-b. Download & install make for windows provided by mingw-w64: https://www.mingw-w64.org/downloads/
+- Download the binary package for Windows from https://github.com/niXman/mingw-builds-binaries/releases.
 
+  For example, [x86_64-13.2.0-release-win32-seh-msvcrt-rt_v11-rev1.7z](https://github.com/niXman/mingw-builds-binaries/releases/download/13.2.0-rt_v11-rev1/x86_64-13.2.0-release-win32-seh-msvcrt-rt_v11-rev1.7z).
+
+- Unzip the binary package. In the **bin** sub-folder, rename **xxx-make.exe** to **make.exe**.
+
+- Add the **bin** folder path to the Windows system PATH environment variable.
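+
+To verify the tools are found (a quick check in a new command prompt, assuming the PATH change has taken effect):
+
+```
+cmake --version
+make -v
+```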
 
 ### Build locally:
 
@@ -309,6 +336,8 @@ Note:
 
 1. Put model file to folder **models**
 
+You could download [llama-2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf) as an example.
+
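+For example, with the `curl.exe` that ships with recent Windows (note the `resolve/` form of the URL for a direct download):
+
+```
+curl.exe -L -o models\llama-2-7b.Q4_0.gguf https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q4_0.gguf
+```
+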
 2. Enable oneAPI running environment
 
 - In Search, input 'oneAPI'.
@@ -419,8 +448,25 @@ Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device
 
   Miss to enable oneAPI running environment.
 
-## Todo
+- Meet a compile error.
+
+  Remove the folder **build** and try again.
+
+- I can **not** see **[ext_oneapi_level_zero:gpu:0]** after installing the GPU driver in Linux.
 
-- Support to build in Windows.
+  Please run **sudo sycl-ls**.
+
+  If you see it in the result, please add the video and render groups to your user:
+
+  ```
+  sudo usermod -aG render username
+  sudo usermod -aG video username
+  ```
+
+  Then **relogin**.
+
+  If you do not see it, please check the GPU installation steps again.
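+
+  After relogin, you can confirm the group change took effect (assuming `username` is your login):
+
+  ```
+  groups username
+  ```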
+
+## Todo
 
 - Support multiple cards.

+ 21 - 13
examples/llama-bench/README.md

@@ -23,19 +23,23 @@ usage: ./llama-bench [options]
 
 options:
   -h, --help
-  -m, --model <filename>            (default: models/7B/ggml-model-q4_0.gguf)
-  -p, --n-prompt <n>                (default: 512)
-  -n, --n-gen <n>                   (default: 128)
-  -b, --batch-size <n>              (default: 512)
-  --memory-f32 <0|1>                (default: 0)
-  -t, --threads <n>                 (default: 16)
-  -ngl N, --n-gpu-layers <n>        (default: 99)
-  -mg i, --main-gpu <i>             (default: 0)
-  -mmq, --mul-mat-q <0|1>           (default: 1)
-  -ts, --tensor_split <ts0/ts1/..>
-  -r, --repetitions <n>             (default: 5)
-  -o, --output <csv|json|md|sql>    (default: md)
-  -v, --verbose                     (default: 0)
+  -m, --model <filename>              (default: models/7B/ggml-model-q4_0.gguf)
+  -p, --n-prompt <n>                  (default: 512)
+  -n, --n-gen <n>                     (default: 128)
+  -b, --batch-size <n>                (default: 512)
+  -ctk <t>, --cache-type-k <t>        (default: f16)
+  -ctv <t>, --cache-type-v <t>        (default: f16)
+  -t, --threads <n>                   (default: 112)
+  -ngl, --n-gpu-layers <n>            (default: 99)
+  -sm, --split-mode <none|layer|row>  (default: layer)
+  -mg, --main-gpu <i>                 (default: 0)
+  -nkvo, --no-kv-offload <0|1>        (default: 0)
+  -mmp, --mmap <0|1>                  (default: 1)
+  -mmq, --mul-mat-q <0|1>             (default: 1)
+  -ts, --tensor_split <ts0/ts1/..>    (default: 0)
+  -r, --repetitions <n>               (default: 5)
+  -o, --output <csv|json|md|sql>      (default: md)
+  -v, --verbose                       (default: 0)
 
 Multiple values can be given for each parameter by separating them with ',' or by specifying the parameter multiple times.
 ```
@@ -51,6 +55,10 @@ Each test is repeated the number of times given by `-r`, and the results are averaged.
 
 For a description of the other options, see the [main example](../main/README.md).
 
+Note:
+
+- When using the SYCL backend, a hang can occur in some cases. Please set `-mmp 0`.
+
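+  For example (a sketch; the model path is an assumption):
+
+  ```
+  ./llama-bench -m models/llama-2-7b.Q4_0.gguf -ngl 99 -mmp 0
+  ```
+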
 ## Examples
 
 ### Text generation with different models

+ 1 - 1
examples/sycl/win-run-llama2.bat

@@ -2,7 +2,7 @@
 ::  Copyright (C) 2024 Intel Corporation
 ::  SPDX-License-Identifier: MIT
 
-INPUT2="Building a website can be done in 10 simple steps:\nStep 1:"
+set INPUT2="Building a website can be done in 10 simple steps:\nStep 1:"
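+::  Note: in a batch script a bare NAME=value line is parsed as a command, not
+::  an assignment; "set" is required for %INPUT2% to be defined for later use.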
 @call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64 --force