@@ -61,7 +61,7 @@ range of hardware - locally and in the cloud.
 - Plain C/C++ implementation without any dependencies
 - Apple silicon is a first-class citizen - optimized via ARM NEON, Accelerate and Metal frameworks
 - AVX, AVX2, AVX512 and AMX support for x86 architectures
-- RVV, ZVFH, ZFH and ZICBOP support for RISC-V architectures
+- RVV, ZVFH, ZFH, ZICBOP and ZIHINTPAUSE support for RISC-V architectures
 - 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory use
 - Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD GPUs via HIP and Moore Threads GPUs via MUSA)
 - Vulkan and SYCL backend support