stduhpf | 06d70147e6 | Vulkan: fix NaN in tanh.comp with AMD proprietary driver on Windows (#10723) | 1 year ago
Diego Devesa | 43ed389a3f | llama : use cmake for swift build (#10525) | 1 year ago
Jeff Bolz | ecc93d0558 | vulkan: compile a test shader in cmake to check for coopmat2 support (#10713) | 1 year ago
Robert Collins | 62e84d9848 | llama : add 128k yarn context for Qwen (#10698) | 1 year ago
Xuan Son Nguyen | 3573fa8e7b | server : (refactor) no more json in server_task input (#10691) | 1 year ago
Georgi Gerganov | d9c3ba2b77 | ggml : disable iq4_nl interleave size 8 (#10709) | 1 year ago
Georgi Gerganov | ce4a7b8493 | server : various fixes (#10704) | 1 year ago
Djip007 | 19d8762ab6 | ggml : refactor online repacking (#10446) | 1 year ago
Georgi Gerganov | c2a16c0bdb | server : fix free of spec context and batch (#10651) | 1 year ago
0cc4m | 3df784b305 | Vulkan: VK_KHR_cooperative_matrix support to speed up prompt processing (#10597) | 1 year ago
Robert Ormandi | 86a1934978 | metal : Extend how Llama.cpp locates metal resources (#10676) | 1 year ago
Sukriti Sharma | 784a14aa49 | convert : add support for Roberta embeddings (#10695) | 1 year ago
Georgi Gerganov | c5ede3849f | convert : add custom attention mapping | 1 year ago
Xuan Son Nguyen | f162d45a21 | common : bring back --no-warmup to server (#10686) | 1 year ago
Xuan Son Nguyen | 6c5bc0625f | server : (refactoring) do not rely on JSON internally (#10643) | 1 year ago
Plamen Minev | 7736837d62 | fix(server) : not show alert when DONE is received (#10674) | 1 year ago
Jeff Bolz | c9c6e01dae | vulkan: Add VK_NV_cooperative_matrix2 support for mul_mat and flash attention (#10206) | 1 year ago
Riccardo Orlando | 6fe6247831 | llama : add Minerva 7B model support (#10673) | 1 year ago
Georgi Gerganov | 0cd182ebcc | sync : ggml | 1 year ago
PAB | a8cbab201d | ggml: add `GGML_SET` Metal kernel + i32 CPU kernel (ggml/1037) | 1 year ago
PAB | c2082d93a8 | ggml : add `GGML_PAD_REFLECT_1D` operation (ggml/1034) | 1 year ago
Daniel Bevenius | d405804be8 | py : update outdated copy-paste instructions [no ci] (#10667) | 1 year ago
aryantandon01 | f112d198cd | Update deprecation-warning.cpp (#10619) | 1 year ago
Georgi Gerganov | 1da7b76569 | server : fix speculative decoding with context shift (#10641) | 1 year ago
Diego Devesa | 59f4db1088 | ggml : add predefined list of CPU backend variants to build (#10626) | 1 year ago
Diego Devesa | 2803540814 | ggml-cpu : fix HWCAP2_I8MM value (#10646) | 1 year ago
ltoniazzi | 253b7fde91 | Fix HF repo commit to clone lora test models (#10649) | 1 year ago
JFLFY2255 | 8d0cfd554a | llama: Support MiniCPM-1B (with & w/o longrope) (#10559) | 1 year ago
Jeff Bolz | 2759916d86 | vulkan: Implement "fast divide" (mul+shift) for unary ops like copy (#10642) | 1 year ago
Nicolò Scipione | 40c6d79fb5 | SYCL : Move to compile time oneMKL interface backend selection for NVIDIA backend (#10584) | 1 year ago