1 year ago · a21c6fd450
--- a/docs/backend/SYCL.md
+++ b/docs/backend/SYCL.md
@@ -80,7 +80,14 @@ The following release is verified with good quality:
 ### Intel GPU
-**Verified devices**
+The SYCL backend supports the following Intel GPU families:
+
+- Intel Data Center Max Series
+- Intel Flex Series, Arc Series
+- Intel Built-in Arc GPU
+- Intel iGPU in Core CPUs (11th Generation Core CPUs and newer; refer to [oneAPI supported GPU](https://www.intel.com/content/www/us/en/developer/articles/system-requirements/intel-oneapi-base-toolkit-system-requirements.html#inpage-nav-1-1)).
+
+#### Verified devices
 | Intel GPU                     | Status  | Verified Model                        |
 |-------------------------------|---------|---------------------------------------|
@@ -88,7 +95,7 @@ The following release is verified with good quality:
 | Intel Data Center Flex Series | Support | Flex 170                              |
 | Intel Arc Series              | Support | Arc 770, 730M, Arc A750               |
 | Intel built-in Arc GPU        | Support | built-in Arc GPU in Meteor Lake       |
-| Intel iGPU                    | Support | iGPU in i5-1250P, i7-1260P, i7-1165G7 |
+| Intel iGPU                    | Support | iGPU in i7-13700K, i5-1250P, i7-1260P, i7-1165G7 |
 *Notes:*
@@ -237,6 +244,13 @@ Similarly, user targeting Nvidia GPUs should expect at least one SYCL-CUDA devic
 ### II. Build llama.cpp
 #### Intel GPU
+
+```sh
+./examples/sycl/build.sh
+```
+
+Or:
+
 ```sh
 # Export relevant ENV variables
 source /opt/intel/oneapi/setvars.sh
@@ -276,23 +290,26 @@ cmake --build build --config Release -j -v
 ### III. Run the inference
-1. Retrieve and prepare model
+#### Retrieve and prepare model
 You can refer to the general [*Prepare and Quantize*](README.md#prepare-and-quantize) guide for model prepration, or simply download [llama-2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf) model as example.
-2. Enable oneAPI running environment
+##### Check device
+
+1. Enable the oneAPI runtime environment
 ```sh
 source /opt/intel/oneapi/setvars.sh
 ```
-3. List devices information
+2. List device information
 Similar to the native `sycl-ls`, available SYCL devices can be queried as follow:
 ```sh
 ./build/bin/llama-ls-sycl-device
 ```
+
 This command will only display the selected backend that is supported by SYCL. The default backend is level_zero. For example, in a system with 2 *intel GPU* it would look like the following:
 ```
 found 2 SYCL devices:
@@ -304,12 +321,37 @@ found 2 SYCL devices:
 | 1|[level_zero:gpu:1]|                    Intel(R) UHD Graphics 770|       1.3|         32|     512|     32|    53651849216|
 ```
+#### Choose level-zero devices
+
+|Chosen Device ID|Setting|
+|-|-|
+|0|`export ONEAPI_DEVICE_SELECTOR="level_zero:0"` or no action|
+|1|`export ONEAPI_DEVICE_SELECTOR="level_zero:1"`|
+|0 & 1|`export ONEAPI_DEVICE_SELECTOR="level_zero:0;level_zero:1"`|
+
+#### Execute
+
+Choose one of the following methods to run:
+
+1. Script
+
+- Use device 0:
+
+```sh
+./examples/sycl/run_llama2.sh 0
+```
+- Use multiple devices:
+
+```sh
+./examples/sycl/run_llama2.sh
+```
-4. Launch inference
+2. Command line
+Launch inference
 There are two device selection modes:
-- Single device: Use one device target specified by the user.
+- Single device: Use one device assigned by the user. The default device ID is 0.
 - Multiple devices: Automatically choose the devices with the same backend.
 In two device selection modes, the default SYCL backend is level_zero, you can choose other backend supported by SYCL by setting environment variable ONEAPI_DEVICE_SELECTOR.
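The level-zero device selection table in the hunk above comes down to building a `;`-separated list of `level_zero:<id>` terms. Here is a minimal sketch that assumes a POSIX shell. `l0_selector` is a hypothetical helper, not part of llama.cpp or oneAPI.

```sh
#!/bin/sh
# Hypothetical helper (not part of llama.cpp or oneAPI): build an
# ONEAPI_DEVICE_SELECTOR value from one or more level_zero device IDs,
# e.g. "0" -> "level_zero:0", "0 1" -> "level_zero:0;level_zero:1".
l0_selector() {
  selector=""
  for id in "$@"; do
    # Separate terms with ';' as in the table above.
    if [ -n "$selector" ]; then
      selector="${selector};"
    fi
    selector="${selector}level_zero:${id}"
  done
  printf '%s\n' "$selector"
}

# Expose only device 1 to the SYCL runtime (same as the table's "1" row).
export ONEAPI_DEVICE_SELECTOR="$(l0_selector 1)"
echo "ONEAPI_DEVICE_SELECTOR=${ONEAPI_DEVICE_SELECTOR}"
```

To confirm which devices remain visible, run `./build/bin/llama-ls-sycl-device` in the same shell afterwards. Running `unset ONEAPI_DEVICE_SELECTOR` returns to the default behavior described above.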
@@ -326,11 +368,6 @@ Examples:
 ```sh
 ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm none -mg 0
 ```
-or run by script:
-
-```sh
-./examples/sycl/run_llama2.sh 0
-```
 - Use multiple devices:
@@ -338,12 +375,6 @@ or run by script:
 ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm layer
 ```
-Otherwise, you can run the script:
-
-```sh
-./examples/sycl/run_llama2.sh
-```
-
 *Notes:*
 - Upon execution, verify the selected device(s) ID(s) in the output log, which can for instance be displayed as follow:
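In the two `llama-cli` commands above, only the split-mode flags differ between the device selection modes. `-sm none -mg 0` runs on the single device 0, and `-sm layer` splits the model's layers across the selected devices. Here is a minimal sketch of a wrapper that prints those flags. `sycl_mode_args` is a hypothetical helper, and it does not launch anything itself.

```sh
#!/bin/sh
# Hypothetical helper (not part of llama.cpp): print the llama-cli
# split-mode flags used in the commands above.
#   sycl_mode_args <id>  -> single device:    -sm none -mg <id>
#   sycl_mode_args       -> multiple devices: -sm layer
sycl_mode_args() {
  if [ $# -ge 1 ] && [ -n "$1" ]; then
    printf '%s\n' "-sm none -mg $1"
  else
    printf '%s\n' "-sm layer"
  fi
}

printf 'single device: %s\n' "$(sycl_mode_args 0)"
printf 'multiple devices: %s\n' "$(sycl_mode_args)"

# Example use (requires a SYCL build; the unquoted $(...) splits into flags):
#   ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -m models/llama-2-7b.Q4_0.gguf \
#     -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 \
#     $(sycl_mode_args 0)
```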
@@ -390,7 +421,7 @@ c. Verify installation
 In the oneAPI command line, run the following to print the available SYCL devices:
 ```
-sycl-ls
+sycl-ls.exe
 ```
 There should be one or more *level-zero* GPU devices displayed as **[ext_oneapi_level_zero:gpu]**. Below is example of such output detecting an *intel Iris Xe* GPU as a Level-zero SYCL device:
@@ -411,6 +442,18 @@ b. The new Visual Studio will install Ninja as default. (If not, please install
 ### II. Build llama.cpp
+You can download the Windows release package directly; it includes the binary files and the oneAPI DLL files they depend on.
+
+Choose one of the following methods to build from source code:
+
+1. Script
+
+```sh
+.\examples\sycl\win-build-sycl.bat
+```
+
+2. CMake
+
 On the oneAPI command line window, step into the llama.cpp main directory and run the following:
 ```
@@ -425,12 +468,8 @@ cmake -B build -G "Ninja" -DGGML_SYCL=ON -DCMAKE_C_COMPILER=cl -DCMAKE_CXX_COMPI
 cmake --build build --config Release -j
 ```
-Otherwise, run the `win-build-sycl.bat` wrapper which encapsulates the former instructions:
-```sh
-.\examples\sycl\win-build-sycl.bat
-```
-
 Or, use CMake presets to build:
+
 ```sh
 cmake --preset x64-windows-sycl-release
 cmake --build build-x64-windows-sycl-release -j --target llama-cli
@@ -442,7 +481,9 @@ cmake --preset x64-windows-sycl-debug
 cmake --build build-x64-windows-sycl-debug -j --target llama-cli
 ```
-Or, you can use Visual Studio to open llama.cpp folder as a CMake project. Choose the sycl CMake presets (`x64-windows-sycl-release` or `x64-windows-sycl-debug`) before you compile the project.
+3. Visual Studio
+
+You can use Visual Studio to open the llama.cpp folder as a CMake project. Choose one of the SYCL CMake presets (`x64-windows-sycl-release` or `x64-windows-sycl-debug`) before you compile the project.
 *Notes:*
@@ -450,23 +491,25 @@ Or, you can use Visual Studio to open llama.cpp folder as a CMake project. Choos
 ### III. Run the inference
-1. Retrieve and prepare model
+#### Retrieve and prepare model
-You can refer to the general [*Prepare and Quantize*](README#prepare-and-quantize) guide for model prepration, or simply download [llama-2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf) model as example.
+You can refer to the general [*Prepare and Quantize*](README.md#prepare-and-quantize) guide for model preparation, or simply download the [llama-2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf) model as an example.
-2. Enable oneAPI running environment
+##### Check device
+
+1. Enable the oneAPI runtime environment
 On the oneAPI command line window, run the following and step into the llama.cpp directory:
 ```
 "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64
 ```
-3. List devices information
+2. List device information
 Similar to the native `sycl-ls`, available SYCL devices can be queried as follow:
 ```
-build\bin\ls-sycl-device.exe
+build\bin\llama-ls-sycl-device.exe
 ```
 This command will only display the selected backend that is supported by SYCL. The default backend is level_zero. For example, in a system with 2 *intel GPU* it would look like the following:
@@ -478,10 +521,28 @@ found 2 SYCL devices:
 | 0|[level_zero:gpu:0]|               Intel(R) Arc(TM) A770 Graphics|       1.3|        512|    1024|     32|    16225243136|
 | 1|[level_zero:gpu:1]|                    Intel(R) UHD Graphics 770|       1.3|         32|     512|     32|    53651849216|
+```
+#### Choose level-zero devices
+
+|Chosen Device ID|Setting|
+|-|-|
+|0|`set ONEAPI_DEVICE_SELECTOR=level_zero:0` or no action|
+|1|`set ONEAPI_DEVICE_SELECTOR=level_zero:1`|
+|0 & 1|`set ONEAPI_DEVICE_SELECTOR=level_zero:0;level_zero:1`|
+
+#### Execute
+
+Choose one of the following methods to run:
+
+1. Script
+
+```
+examples\sycl\win-run-llama2.bat
 ```
+2. Command line
-4. Launch inference
+Launch inference
 There are two device selection modes:
@@ -508,11 +569,7 @@ build\bin\llama-cli.exe -m models\llama-2-7b.Q4_0.gguf -p "Building a website ca
 ```
 build\bin\llama-cli.exe -m models\llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e -ngl 33 -s 0 -sm layer
 ```
-Otherwise, run the following wrapper script:
-```
-.\examples\sycl\win-run-llama2.bat
-```
 Note:
@@ -526,17 +583,18 @@ Or
 use 1 SYCL GPUs: [0] with Max compute units:512
 ```
+
 ## Environment Variable
 #### Build
 | Name               | Value                             | Function                                    |
 |--------------------|-----------------------------------|---------------------------------------------|
-| GGML_SYCL          | ON (mandatory)                    | Enable build with SYCL code path.           |
+| GGML_SYCL          | ON (mandatory)                    | Enable build with SYCL code path.<br>FP32 path: recommended for better performance than FP16 on quantized models.|
 | GGML_SYCL_TARGET   | INTEL *(default)* \| NVIDIA       | Set the SYCL target device type.            |
 | GGML_SYCL_F16      | OFF *(default)* \|ON *(optional)* | Enable FP16 build with SYCL code path.      |
-| CMAKE_C_COMPILER   | icx                               | Set *icx* compiler for SYCL code path.      |
-| CMAKE_CXX_COMPILER | icpx *(Linux)*, icx *(Windows)*   | Set `icpx/icx` compiler for SYCL code path. |
+| CMAKE_C_COMPILER   | `icx` *(Linux)*, `icx/cl` *(Windows)* | Set `icx/cl` compiler for SYCL code path.   |
+| CMAKE_CXX_COMPILER | `icpx` *(Linux)*, `icx` *(Windows)*   | Set `icpx/icx` compiler for SYCL code path. |
 #### Runtime
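The Build table above maps directly onto CMake cache variables. Here is a minimal sketch for Linux that assembles the configure flags from the table's options, keeping `GGML_SYCL_TARGET` and `GGML_SYCL_F16` at their documented defaults unless they are overridden. `sycl_cmake_flags` is a hypothetical helper, not a llama.cpp script.

```sh
#!/bin/sh
# Hypothetical helper (not part of llama.cpp): print Linux CMake configure
# flags for the SYCL backend following the Build table: GGML_SYCL=ON is
# mandatory, icx/icpx are the Linux compilers, GGML_SYCL_TARGET defaults to
# INTEL and GGML_SYCL_F16 defaults to OFF (the FP32 path).
sycl_cmake_flags() {
  target="${1:-INTEL}"   # INTEL or NVIDIA
  f16="${2:-OFF}"        # OFF (FP32 path) or ON (FP16 path)
  printf '%s\n' "-DGGML_SYCL=ON -DGGML_SYCL_TARGET=${target} -DGGML_SYCL_F16=${f16} -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx"
}

# Example use, after `source /opt/intel/oneapi/setvars.sh`:
#   cmake -B build $(sycl_cmake_flags)
#   cmake --build build --config Release -j
printf '%s\n' "$(sycl_cmake_flags)"
```

The FP32 path remains the default because the `GGML_SYCL` row recommends it over FP16 for quantized models. To try FP16, pass `ON` as the second argument.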
@@ -572,9 +630,18 @@ use 1 SYCL GPUs: [0] with Max compute units:512
   ```
   Otherwise, please double-check the GPU driver installation steps.
+- Can I report an Ollama issue on an Intel GPU to the llama.cpp SYCL backend?
+
+  No. We can't support Ollama issues directly, because we aren't familiar with Ollama.
+
+  We suggest reproducing the issue with llama.cpp and reporting the equivalent issue to llama.cpp. We will support it.
+
+  The same applies to other projects that include the llama.cpp SYCL backend.
+
+
 ### **GitHub contribution**:
 Please add the **[SYCL]** prefix/tag in issues/PRs titles to help the SYCL-team check/address them without delay.
 ## TODO
-- Support row layer split for multiple card runs.
+- NA