SYCL is a higher-level programming model that improves programming productivity on various hardware accelerators such as CPUs, GPUs, and FPGAs. It is a single-source embedded domain-specific language based on pure C++17.

oneAPI is an open, standards-based specification that supports multiple architecture types, including but not limited to GPUs, CPUs, and FPGAs. The spec offers both direct programming and API-based programming paradigms.

Intel uses SYCL as the direct programming language to support its CPUs, GPUs, and FPGAs.

To avoid reinventing the wheel, this code follows the other backend code paths in llama.cpp (such as OpenBLAS, cuBLAS, and CLBlast). We used the open-source tool SYCLomatic (released commercially as the Intel® DPC++ Compatibility Tool) to migrate the code to SYCL.

The llama.cpp SYCL backend supports Intel GPUs. For Intel CPUs, it is recommended to use the x86 build of llama.cpp (built with Intel MKL) instead.
|OS|Status|Verified|
|-|-|-|
|Linux|Support|Ubuntu 22.04|
|Windows|Support|Windows 11|
|Intel GPU|Status|Verified Model|
|-|-|-|
|Intel Data Center Max Series|Support|Max 1550|
|Intel Data Center Flex Series|Support|Flex 170|
|Intel Arc Series|Support|Arc 770, 730M|
|Intel built-in Arc GPU|Support|built-in Arc GPU in Meteor Lake|
|Intel iGPU|Support|iGPU in i5-1250P, i7-1165G7|
a. Install the Intel GPU driver following the official guide: Install GPU Drivers.

Note: for an iGPU, install the client GPU driver.

b. Add your user to the video and render groups:
```shell
sudo usermod -aG render username
sudo usermod -aG video username
```
Note: log out and log back in for the group changes to take effect.
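After logging back in, a quick way to confirm the change is to inspect the current user's group list. The sketch below uses a hypothetical helper `in_group` (not part of this guide); by default it reads the live group list from `id -nG`, and a second argument can supply one explicitly:

```shell
# Sketch: check whether a group appears in a space-separated group list.
# in_group is a hypothetical helper; the second argument defaults to the
# current user's groups as reported by `id -nG`.
in_group() {
  group="$1"
  groups="${2:-$(id -nG)}"
  case " $groups " in
    *" $group "*) echo "yes" ;;
    *)            echo "no"  ;;
  esac
}

in_group render "adm video render users"   # -> yes
in_group video  "adm users"                # -> no
```

Both `render` and `video` should report `yes` before proceeding.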
c. Check the GPU:

```shell
sudo apt install clinfo
sudo clinfo -l
```
Output (example):

```
Platform #0: Intel(R) OpenCL Graphics
 `-- Device #0: Intel(R) Arc(TM) A770 Graphics

Platform #0: Intel(R) OpenCL HD Graphics
 `-- Device #0: Intel(R) Iris(R) Xe Graphics [0x9a49]
```
a. Follow the procedure in Get the Intel® oneAPI Base Toolkit.

It is recommended to install to the default folder: /opt/intel/oneapi. The guide below uses the default folder as an example; if you installed to another folder, adjust the paths accordingly.
b. Check:

```shell
source /opt/intel/oneapi/setvars.sh
sycl-ls
```
There should be one or more Level Zero devices listed, like [ext_oneapi_level_zero:gpu:0].
Output (example):

```
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000]
[opencl:cpu:1] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i7-13700K OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO [23.30.26918.50]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918]
```
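To script this check, the `sycl-ls` output can simply be filtered for Level Zero entries. A minimal sketch, using lines taken from the example output above (in practice, pipe `sycl-ls` itself into the grep):

```shell
# Sketch: keep only the Level Zero devices from sycl-ls output.
# `sample` reproduces two lines of the example output above.
sample='[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO [23.30.26918.50]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918]'

printf '%s\n' "$sample" | grep 'ext_oneapi_level_zero'
# -> [ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918]
```

An empty result from the grep means no Level Zero device is visible, which usually points to a driver or environment problem.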
Build locally:

```shell
mkdir -p build
cd build
source /opt/intel/oneapi/setvars.sh

# For FP16 (faster for long-prompt inference):
# cmake .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_SYCL_F16=ON

# For FP32:
cmake .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx

# Build example/main only:
# cmake --build . --config Release --target main

# Build all binaries:
cmake --build . --config Release -v
cd ..
```

or use the helper script:

```shell
./examples/sycl/build.sh
```
Note: put the model file into the models folder.

Enable the oneAPI running environment:

```shell
source /opt/intel/oneapi/setvars.sh
```
List the device IDs by running either binary without parameters:

```shell
./build/bin/ls-sycl-device
```

or

```shell
./build/bin/main
```

Check the IDs in the startup log, like:
```
found 4 SYCL devices:
  Device 0: Intel(R) Arc(TM) A770 Graphics, compute capability 1.3,
    max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136
  Device 1: Intel(R) FPGA Emulation Device, compute capability 1.2,
    max compute_units 24, max work group size 67108864, max sub group size 64, global mem size 67065057280
  Device 2: 13th Gen Intel(R) Core(TM) i7-13700K, compute capability 3.0,
    max compute_units 24, max work group size 8192, max sub group size 64, global mem size 67065057280
  Device 3: Intel(R) Arc(TM) A770 Graphics, compute capability 3.0,
    max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136
```
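If you only want the ID-to-name mapping, the log can be reduced with a small sed filter. A sketch, using lines copied from the example log above (in practice, pipe the real startup log through the same sed):

```shell
# Sketch: extract "ID name" pairs from a startup log like the one above.
# `log` reproduces part of the example log.
log='found 4 SYCL devices:
  Device 0: Intel(R) Arc(TM) A770 Graphics, compute capability 1.3,
  Device 2: 13th Gen Intel(R) Core(TM) i7-13700K, compute capability 3.0,'

printf '%s\n' "$log" | sed -n 's/.*Device \([0-9]*\): \([^,]*\),.*/\1 \2/p'
# -> 0 Intel(R) Arc(TM) A770 Graphics
# -> 2 13th Gen Intel(R) Core(TM) i7-13700K
```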
|Attribute|Note|
|-|-|
|compute capability 1.3|Level Zero runtime, recommended|
|compute capability 3.0|OpenCL runtime, slower than Level Zero in most cases|
Set device ID = 0 via GGML_SYCL_DEVICE=0:

```shell
GGML_SYCL_DEVICE=0 ./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33
```

or run the script:

```shell
./examples/sycl/run-llama2.sh
```
Note: the startup log reports the selected device, like:

```
Using device 0 (Intel(R) Arc(TM) A770 Graphics) as main device
```
Install the Intel GPU driver following the official guide: Install GPU Drivers.
a. Follow the procedure in Get the Intel® oneAPI Base Toolkit.

It is recommended to install to the default folder: /opt/intel/oneapi. The guide below uses the default folder as an example; if you installed to another folder, adjust the paths accordingly.
b. Enable the oneAPI running environment: search for and open the "Intel oneAPI command prompt for Intel 64 for Visual Studio 2022". In CMD:

```
"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64
```
c. Check the GPU. In the oneAPI command line:

```
sycl-ls
```

There should be one or more Level Zero devices listed, like [ext_oneapi_level_zero:gpu:0].
Output (example):

```
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000]
[opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Iris(R) Xe Graphics OpenCL 3.0 NEO [31.0.101.5186]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Iris(R) Xe Graphics 1.3 [1.3.28044]
```
a. Download and install CMake for Windows: https://cmake.org/download/

b. Download and install make for Windows, provided by mingw-w64: https://www.mingw-w64.org/downloads/
In the oneAPI command line window:

```
mkdir -p build
cd build
@call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64 --force

:: For FP16 (faster for long-prompt inference):
:: cmake -G "MinGW Makefiles" .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icx -DCMAKE_BUILD_TYPE=Release -DLLAMA_SYCL_F16=ON

:: For FP32:
cmake -G "MinGW Makefiles" .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icx -DCMAKE_BUILD_TYPE=Release

:: Build example/main only:
:: make main

:: Build all binaries:
make -j
cd ..
```

or use the helper script:

```
.\examples\sycl\win-build-sycl.bat
```
Note: put the model file into the models folder.

Enable the oneAPI running environment: search for and open the "Intel oneAPI command prompt for Intel 64 for Visual Studio 2022". In CMD:

```
"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64
```
List the device IDs by running either binary without parameters:

```
build\bin\ls-sycl-device.exe
```

or

```
build\bin\main.exe
```

Check the IDs in the startup log, like:
```
found 4 SYCL devices:
  Device 0: Intel(R) Arc(TM) A770 Graphics, compute capability 1.3,
    max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136
  Device 1: Intel(R) FPGA Emulation Device, compute capability 1.2,
    max compute_units 24, max work group size 67108864, max sub group size 64, global mem size 67065057280
  Device 2: 13th Gen Intel(R) Core(TM) i7-13700K, compute capability 3.0,
    max compute_units 24, max work group size 8192, max sub group size 64, global mem size 67065057280
  Device 3: Intel(R) Arc(TM) A770 Graphics, compute capability 3.0,
    max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136
```
|Attribute|Note|
|-|-|
|compute capability 1.3|Level Zero runtime, recommended|
|compute capability 3.0|OpenCL runtime, slower than Level Zero in most cases|
Set device ID = 0 via set GGML_SYCL_DEVICE=0:

```
set GGML_SYCL_DEVICE=0
build\bin\main.exe -m models\llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e -ngl 33 -s 0
```

or run the script:

```
.\examples\sycl\win-run-llama2.bat
```
Note: the startup log reports the selected device, like:

```
Using device 0 (Intel(R) Arc(TM) A770 Graphics) as main device
```
|Name|Value|Function|
|-|-|-|
|LLAMA_SYCL|ON (mandatory)|Enable the build with the SYCL code path. Mandatory for both FP32 and FP16.|
|LLAMA_SYCL_F16|ON (optional)|Enable the FP16 build with the SYCL code path. Faster for long-prompt inference. Leave unset for FP32.|
|CMAKE_C_COMPILER|icx|Use the icx compiler for the SYCL code path|
|CMAKE_CXX_COMPILER|icpx (Linux), icx (Windows)|Use icpx/icx for the SYCL code path|
|Name|Value|Function|
|-|-|-|
|GGML_SYCL_DEVICE|0 (default) or 1|Set the device ID used. Check the device IDs in the default run output|
|GGML_SYCL_DEBUG|0 (default) or 1|Enable the logging guarded by the GGML_SYCL_DEBUG macro|
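These are ordinary per-process environment variables, so they can be set for a single invocation without affecting the rest of the shell session. A sketch of that pattern, where `demo` is a hypothetical stand-in for `./build/bin/main`:

```shell
# Sketch: per-invocation environment variables, as used with GGML_SYCL_DEVICE.
# `demo` is a hypothetical stand-in for ./build/bin/main; it just reports
# the device ID it would use (defaulting to 0 when the variable is unset).
demo() { echo "using device ${GGML_SYCL_DEVICE:-0}"; }

demo                      # -> using device 0
GGML_SYCL_DEVICE=1 demo   # -> using device 1
```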
llama.cpp uses mmap as the default way to read the model file and copy it to the GPU. On some systems the memcpy may misbehave and block.

Solution: add --no-mmap.
error while loading shared libraries: libsycl.so.7: cannot open shared object file: No such file or directory

This means the oneAPI running environment is not enabled. Install the oneAPI Base Toolkit and enable it:

```shell
source /opt/intel/oneapi/setvars.sh
```
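A quick sanity check for this situation is to look for oneAPI paths in LD_LIBRARY_PATH, which setvars.sh populates. The `check_oneapi` helper below is a hypothetical sketch, and the library path in the example is illustrative only:

```shell
# Sketch: warn when the oneAPI environment is probably not enabled.
# check_oneapi is a hypothetical helper; it only inspects LD_LIBRARY_PATH.
check_oneapi() {
  case ":${LD_LIBRARY_PATH:-}:" in
    *oneapi*) echo "oneAPI environment looks enabled" ;;
    *)        echo "run: source /opt/intel/oneapi/setvars.sh" ;;
  esac
}

LD_LIBRARY_PATH="" check_oneapi
# -> run: source /opt/intel/oneapi/setvars.sh
LD_LIBRARY_PATH=/opt/intel/oneapi/compiler/latest/lib check_oneapi
# -> oneAPI environment looks enabled
```

This is only a heuristic; the authoritative fix remains sourcing setvars.sh as shown above.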
- Support building on Windows.
- Support multiple cards.