Johannes Gäßler | e9fd8dcab4 | llama-fit-params: keep explicit --ctx-size 0 (#19070) | 5 days ago
Johannes Gäßler | 4e5b83b226 | GGUF: check that tensor size is representable (#19072) | 5 days ago
Xuan-Son Nguyen | bb02f74c61 | chat: fix language input for translategemma (#19052) | 5 days ago
Johannes Gäßler | 8f91ca54ec | CUDA: re-use MLA K data for V in MMA FA (#19057) | 5 days ago
Aman Gupta | 81ab64f3c8 | ggml-cuda: enable cuda-graphs for `n-cpu-moe` (#18934) | 6 days ago
nullname | 8af1f5f430 | ggml-hexagon: flash-attn opt (#19025) | 6 days ago
Georgi Gerganov | 557515be1e | graph : utilize `ggml_build_forward_select()` to avoid reallocations (#18898) | 6 days ago
Neo Zhang | cb6caca191 | [SYCL] use malloc to support both iGPU and dGPU in same time (#18992) | 6 days ago
Xuan-Son Nguyen | b5b8fa1c8b | chat : fix translategemma crash on common_chat_format_example (#19019) | 6 days ago
Daniel Bevenius | a14b960bc7 | model-conversion : use BUILD_DIR variable in all scripts (#19015) | 6 days ago
Alberto Cabrera Pérez | 091a46cb8d | ggml-cpu: aarm64: q5_K repack gemm and gemv (and generic) implementations (i8mm) (#18860) | 6 days ago
Aldehir Rojas | a3e812811d | cli : load parser definition (#19031) | 1 week ago
Xuan-Son Nguyen | 51fa458a92 | server : support preserving reasoning_content in assistant message (#18994) | 1 week ago
Georgi Gerganov | a5eaa1d6a3 | mla : make the V tensor a view of K (#18986) | 1 week ago
Johannes Gäßler | e2baf02162 | CUDA: fix alignment check for FA (#19023) | 1 week ago
Aman Gupta | e34d6d03b2 | convert_hf_to_gguf.py: refactor modify_tensors to call super (#18866) | 1 week ago
lhez | 9c96465f99 | opencl: enable the general fp mm for non-cont input and as a fallback for specialized kqv kernel for adreno (#18970) | 1 week ago
Xuan-Son Nguyen | 4e595b250a | server: do not log certain endpoints (avoid log spam) (#19028) | 1 week ago
Georgi Gerganov | 0e4ebeb057 | quant : manual overrides of tensor types take precedence (#18952) | 1 week ago
Aaron Teo | 8b30840703 | release: update github api (#19022) | 1 week ago
Xuan-Son Nguyen | 9eb5bfec1a | mtmd : update docs to use llama_model_n_embd_inp (#18999) | 1 week ago
손희준 | c6926d1d95 | server: Reorder methods in `server-task.cpp` (#19016) | 1 week ago
Aman Gupta | b70d251076 | CUDA: add gqa_ratio 4 for GLM 4.7 flash (#18953) | 1 week ago
shaofeiqi | 5516b9c16a | opencl: add TRI op support (#18979) | 1 week ago
Aleksei Nikiforov | 94242a62c0 | ggml-zdnn : mark zDNN buffers as non-host (#18967) | 1 week ago
Pádraic Slattery | 6b99a223e3 | ci : update GitHub Actions versions [no ci] (#18935) | 1 week ago
Mariusz Woloszyn | 77078e80e5 | convert : add Devstral-2 (Ministral3ForCausalLM) arch (#18972) | 1 week ago
Piotr Wilkin (ilintar) | c301172f66 | jinja: support none|string (#18995) | 1 week ago
Hendrik Erz | 3802d3c78f | fix: Use `tabular-nums` for chat message statistics (#18915) | 1 week ago
Daniel Bevenius | 9da3dcd753 | llama : clarify nemotron-h.cpp comment about RoPE [no ci] (#18997) | 1 week ago