cturan/llama.cpp

Author	SHA1 Message	Date
Johannes Gäßler	e14e842e87 CUDA: fix MMQ stream-k fixup ne1 indices (#17089)	2 months ago
Reese Levine	647b960bd8 ggml webgpu: faster matrix multiplication/matrix-vector multiplication (#17031)	2 months ago
bssrdf	299f5d782c CUDA: properly handle nb00=nb02 case for cpy (#17081)	2 months ago
Acly	ac76d36201 vulkan : refactor buffer handling in vk_op_f32 (#16840)	2 months ago
Johannes Gäßler	6515610506 CUDA: fix should_use_mmvf for ne11 == 1 (#17085)	2 months ago
Georgi Gerganov	7956bb4d7f bench : cache the llama_context state at computed depth (#16944)	2 months ago
Sigbjørn Skjæret	9008027aa3 hparams : add n_embd_inp() to support extended embed (#16928)	2 months ago
Georgi Gerganov	16bcc1259d kv-cache : pad the cache size to 256 for performance (#17046)	2 months ago
Adrien Gallouët	9eb9a1331d Revert "ggml-cpu: detect correct cpu flags for arm64 (#16229) (#16239)" (#17084)	2 months ago
iron	7c23f3f0d4 ggml-cpu: detect correct cpu flags for arm64 (#16229) (#16239)	2 months ago
Georgi Gerganov	8c0d6bb455 server : print the samplers chain for each request (#17070)	2 months ago
Xuan-Son Nguyen	5c9a18e674 common: move download functions to download.(cpp|h) (#17059)	2 months ago
xctan	7f09a680af ggml-cpu : optimize RVV q2_k and q3_k kernels (#16887)	2 months ago
Johannes Gäßler	aa374175c3 CUDA: fix crash on uneven context without FA (#16988)	2 months ago
Georgi Gerganov	5b180c3d60 metal : initial Metal4 tensor API support (#16634)	2 months ago
Georgi Gerganov	b7f9010d24 server : disable checkpoints with mtmd (#17045)	2 months ago
Xuan-Son Nguyen	4882f0ff78 clip: implement minicpm-v sinusoidal embd using GGML (#17036)	2 months ago
YehuditE	9d7c518d64 sycl: add CONCAT operator support (#16047)	2 months ago
Johannes Gäßler	22c8c3c6ad docs: explain CUDA 11 compilation [no ci] (#16824)	2 months ago
l3utterfly	6db3d1ffe6 ggml-hexagon: graceful fallback for older socs where rpcmem_alloc2 and FASTRPC_GET_URI is unsupported (#16987)	2 months ago
bssrdf	230d1169e5 improve CUDA cpy memory bandwidth when copying transposed tensor (#16841)	2 months ago
Jeff Bolz	a44d77126c vulkan: Fix GGML_VULKAN_CHECK_RESULTS to better handle fusion (#16919)	2 months ago
Gabe Goodhart	5886f4f545 examples(gguf): GGUF example outputs (#17025)	2 months ago
Xuan-Son Nguyen	92bb84f775 mtmd: allow QwenVL to process larger image by default (#17020)	2 months ago
Georgi Gerganov	13b339bcd9 server : do not default to multiple slots with speculative decoding (#17017)	2 months ago
Xuan-Son Nguyen	2f0c2db43e mtmd: improve struct initialization (#16981)	2 months ago
손희준	fd2f84f468 docs: Clarify the endpoint that webui uses (#17001)	2 months ago
Li Pengzhan	9f052478c2 model : add openPangu-Embedded (#16941)	2 months ago
Reese Levine	03ea04175d ggml webgpu: minor set rows optimization (#16810)	2 months ago
Georgi Gerganov	cdabeb2c27 sync : ggml	2 months ago
