Jeff Bolz
|
d413dca003
tests: large sizes for get_rows (#15687)
|
4 months ago |
Chenguang Li
|
85ca66a746
CANN: Stream sync between devices for acl_graph (#15809)
|
4 months ago |
Jeff Bolz
|
3976dfbe00
vulkan: support im2col_3d (#15795)
|
4 months ago |
Aaron Teo
|
d36e61c580
ggml-cpu: clean up s390x SIMD (#15855)
|
4 months ago |
Jeff Bolz
|
c97b5e5854
vulkan: Support pad_ext (#15794)
|
4 months ago |
Jeff Bolz
|
267e99867f
vulkan: Use larger loads in scalar/coopmat1 matmul (#15729)
|
4 months ago |
Daniel Bevenius
|
3b15924d71
ggml WebGPU: remove userdata from request adapter callback (#15527)
|
4 months ago |
Johannes Gäßler
|
79bc429262
CUDA: faster tile FA (Pascal/AMD), headsize 256 (#15769)
|
4 months ago |
Charles Xu
|
c4df49a42d
kleidiai: generalize compute_forward_kv_cache to compute_forward_fp16 (#15817)
|
4 months ago |
Xuan-Son Nguyen
|
3c3635d2f2
server : speed up tests (#15836)
|
4 months ago |
Xuan-Son Nguyen
|
61bdfd5298
server : implement prompt processing progress report in stream mode (#15827)
|
4 months ago |
Johannes Gäßler
|
01806e7771
ggml-cpu: document use of "free" memory [no ci] (#15834)
|
4 months ago |
Aaron Teo
|
186415d595
ggml-cpu: drop support for nnpa intrinsics (#15821)
|
4 months ago |
Gabe Goodhart
|
fd621880f3
aLoRA Support (#15327)
|
4 months ago |
Sigbjørn Skjæret
|
4281c7b315
ci : exempt correct research label (#15825)
|
4 months ago |
Gabe Goodhart
|
5fac79cbc7
Thinking model disabled assistant prefill (#15404)
|
4 months ago |
Eric Curtin
|
408ff524b4
Implement --log-colors with always/never/auto (#15792)
|
4 months ago |
Johannes Gäßler
|
5143fa895e
CUDA: fastdiv, launch bounds for mmvq + q8_1 quant (#15802)
|
4 months ago |
Daniel Bevenius
|
3a550b5ca4
tests : add --list-ops and --show-coverage options (#15745)
|
4 months ago |
Erik Scholz
|
a81283820a
gguf: gguf_writer refactor (#15691)
|
4 months ago |
Georgi Gerganov
|
c610b6c11b
kv-cache : fix SWA checks + disable cacheless iSWA (#15811)
|
4 months ago |
Daniel Bevenius
|
5d6688de08
model-conversion : add --embeddings flag to modelcard.template [no ci] (#15801)
|
4 months ago |
ExtReMLapin
|
4fd1242bef
chat : fixed crash when Hermes 2 <tool_call> had a newline before it (#15639)
|
4 months ago |
Piotr Wilkin (ilintar)
|
b2426e469e
chat : nemotron thinking & toolcalling support (#15676)
|
4 months ago |
Piotr Wilkin (ilintar)
|
9e2b1e83c6
scripts : add Jinja tester PySide6 simple app (#15756)
|
4 months ago |
Daniel Bevenius
|
fb15d649ed
llama : add support for EmbeddingGemma 300m (#15798)
|
4 months ago |
Gabe Goodhart
|
856ed0947f
metal : Add template specialization for mul_mm_id w/ ne20 == 10 (#15799)
|
4 months ago |
Daniel Bevenius
|
d1e2adba65
llama : set n_outputs to 1 to avoid 0 outputs mean-pooling (#15791)
|
4 months ago |
Chenguang Li
|
c1c354e44c
CANN: Refactor ND to NZ workspace to be per-device (#15763)
|
4 months ago |
Xuan-Son Nguyen
|
a68d914426
server: add exceed_context_size_error type (#15780)
|
4 months ago |