Aidan | eeee367de5 | server: fix correct time_ms calculation in prompt_progress (#17093) | 2 months ago
Aman Gupta | 64fe17fbb8 | Revert "CUDA: add expert reduce kernel (#16857)" (#17100) | 2 months ago
Aman Gupta | c1b187688d | CUDA: skip fusion for repeating adds in bias (#17080) | 2 months ago
SavicStefan | b8a5cfd11a | vulkan: Increase BK to 32; use BK/4 for non-CM mul_mm.comp (#16636) | 2 months ago
Aleksei Nikiforov | 08416ebe7f | ggml: disable vxe for cross-compilation by default (#16966) | 2 months ago
Jeff Bolz | b4e335d8dc | vulkan: fuse rms_norm + mul + rope (+ view + set_rows) (#16977) | 2 months ago
Jeff Bolz | d6fe40fa00 | vulkan: Fix test-thread-safety crashes (#17024) | 2 months ago
Johannes Gäßler | e14e842e87 | CUDA: fix MMQ stream-k fixup ne1 indices (#17089) | 2 months ago
Reese Levine | 647b960bd8 | ggml webgpu: faster matrix multiplication/matrix-vector multiplication (#17031) | 2 months ago
bssrdf | 299f5d782c | CUDA: properly handle nb00=nb02 case for cpy (#17081) | 2 months ago
Acly | ac76d36201 | vulkan : refactor buffer handling in vk_op_f32 (#16840) | 2 months ago
Johannes Gäßler | 6515610506 | CUDA: fix should_use_mmvf for ne11 == 1 (#17085) | 2 months ago
Georgi Gerganov | 7956bb4d7f | bench : cache the llama_context state at computed depth (#16944) | 2 months ago
Sigbjørn Skjæret | 9008027aa3 | hparams : add n_embd_inp() to support extended embed (#16928) | 2 months ago
Georgi Gerganov | 16bcc1259d | kv-cache : pad the cache size to 256 for performance (#17046) | 2 months ago
Adrien Gallouët | 9eb9a1331d | Revert "ggml-cpu: detect correct cpu flags for arm64 (#16229) (#16239)" (#17084) | 2 months ago
iron | 7c23f3f0d4 | ggml-cpu: detect correct cpu flags for arm64 (#16229) (#16239) | 2 months ago
Georgi Gerganov | 8c0d6bb455 | server : print the samplers chain for each request (#17070) | 2 months ago
Xuan-Son Nguyen | 5c9a18e674 | common: move download functions to download.(cpp|h) (#17059) | 2 months ago
xctan | 7f09a680af | ggml-cpu : optimize RVV q2_k and q3_k kernels (#16887) | 2 months ago
Johannes Gäßler | aa374175c3 | CUDA: fix crash on uneven context without FA (#16988) | 2 months ago
Georgi Gerganov | 5b180c3d60 | metal : initial Metal4 tensor API support (#16634) | 2 months ago
Georgi Gerganov | b7f9010d24 | server : disable checkpoints with mtmd (#17045) | 2 months ago
Xuan-Son Nguyen | 4882f0ff78 | clip: implement minicpm-v sinusoidal embd using GGML (#17036) | 2 months ago
YehuditE | 9d7c518d64 | sycl: add CONCAT operator support (#16047) | 2 months ago
Johannes Gäßler | 22c8c3c6ad | docs: explain CUDA 11 compilation [no ci] (#16824) | 2 months ago
l3utterfly | 6db3d1ffe6 | ggml-hexagon: graceful fallback for older socs where rpcmem_alloc2 and FASTRPC_GET_URI is unsupported (#16987) | 2 months ago
bssrdf | 230d1169e5 | improve CUDA cpy memory bandwidth when copying transposed tensor (#16841) | 2 months ago
Jeff Bolz | a44d77126c | vulkan: Fix GGML_VULKAN_CHECK_RESULTS to better handle fusion (#16919) | 2 months ago
Gabe Goodhart | 5886f4f545 | examples(gguf): GGUF example outputs (#17025) | 2 months ago