Sigbjørn Skjæret | b2d980fce0 | codeowners : claim responsibility for ci, models, gguf-py and convert (#16124) | 4 months ago
Georgi Gerganov | 5c6106a696 | contrib : update roles (#16113) | 4 months ago
Georgi Gerganov | ec65fb52f0 | ci : remove vulkaninfo calls (#16169) | 4 months ago
Georgi Gerganov | 1d660d2fae | ci : use smaller model (#16168) | 4 months ago
Jeff Bolz | a20d810d79 | vulkan: add RTE variants of exp shader (#16165) | 4 months ago
Georgi Gerganov | 4d0a7cbc61 | ci : adjust params for less runtime (#16167) | 4 months ago
Ruben Ortlam | 9073a73d82 | vulkan: vec dot matrix multiplication fix (#16151) | 4 months ago
lhez | 51f5a45fbe | opencl: fix concat crash on win arm64 with Adreno (#15944) | 4 months ago
lhez | c4510dc937 | opencl: initial `q8_0` mv support (#15732) | 4 months ago
Georgi Gerganov | da30ab5f86 | ci : add label for the RISC-V runner (#16150) | 4 months ago
Georgi Gerganov | 28baac9c9f | ci : migrate ggml ci to self-hosted runners (#16116) | 4 months ago
Giuseppe Scrivano | 1eeb523c3e | vulkan: optimize UMA buffer operations and fix driver hangs (#16059) | 4 months ago
Jeff Bolz | 5bb4a3edec | vulkan: fix validation error about VK_PIPELINE_CREATE_CAPTURE_STATISTICS_BIT_KHR (#16086) | 4 months ago
Georgi Gerganov | 7f766929ca | sync : ggml | 4 months ago
Daniel Bevenius | 405921dcef | ggml : introduce semantic versioning (ggml/1336) | 4 months ago
Gregor Jasny | fa6383ca7e | CUDA : conditionally add cuda architectures (ggml/1341) | 4 months ago
Ruben Ortlam | 803dac2e48 | vulkan: use vec dot for matrix matrix multiplications (#16056) | 4 months ago
Benni | 459c0c2c1a | server: fix SSE and OpenAI compatibility for error messages when streaming (#16109) | 4 months ago
ssweens | be79d9fdd9 | llama-bench: add --devices and --list-devices support (#16039) | 4 months ago
shun095 | f432d8d83e | chat: Fix streaming parser for granite models (#15682) | 4 months ago
Aleksander Grygier | 4067f07fc5 | feat: Improve mobile UI for Settings Dialog (#16084) | 4 months ago
Xuan-Son Nguyen | 4b8560ab56 | chat : fix build on arm64 (#16101) | 4 months ago
Xuan-Son Nguyen | 0dd58b6877 | ggml : refactor forward_dup for cpu backend (#16062) | 4 months ago
Adrien Gallouët | 69ffd89163 | ggml-amx : fix ggml_amx_init() on generic Linux (#16049) | 4 months ago
Adrien Gallouët | 246c0d9c79 | cmake : fix static linking for OpenMP on Unix-like systems (#16031) | 4 months ago
Shawn Gu | 3edd87cd05 | opencl: optimize mxfp4 kernels (#16037) | 4 months ago
Jeff Bolz | c0b45097c3 | rename optimize_graph to graph_optimize (#16082) | 4 months ago
Bowen Han | 38dbdf4c05 | CUDA: Optimize PAD_REFLECT_1D (#15957) | 4 months ago
Johannes Gäßler | 368560a1e3 | CUDA: fix compilation on CC 6.0 (#16091) | 4 months ago
Eric Curtin | 4ca088b036 | Add resumable downloads for llama-server model loading (#15963) | 4 months ago