Jeff Bolz
|
4f63cd705c
vulkan: Fix OOB accesses in soft_max_back (#15861)
|
4 months ago |
Johannes Gäßler
|
17bc5a815f
HIP: use v_dot2_f32_f16 instruction for FA (#15884)
|
4 months ago |
lksj92hs
|
ed54e32558
Workaround for subgroup arithmetic failing on MoltenVK with AMD GPUs (issue 15846) (#15886)
|
4 months ago |
Aman Gupta
|
a972faebed
CUDA: Add mul_mat_id support for the mmf kernel (#15767)
|
4 months ago |
Johannes Gäßler
|
550cf726e1
CUDA: fix GET_ROWS for large tensors (#15882)
|
4 months ago |
Georgi Gerganov
|
c252ce67c4
contrib : add notes about merging PRs (#15881)
|
4 months ago |
Daniel Bevenius
|
70cd37dbbe
requirements : update transformers/torch for Embedding Gemma (#15828)
|
4 months ago |
Piotr Wilkin (ilintar)
|
acc1b008cf
model-conversion : add extra debugging support for model conversion (#15877)
|
4 months ago |
Aldehir Rojas
|
7057faf64b
json : support `enum` values within `allOf` (#15830)
|
4 months ago |
j-k
|
fe1c92cd7b
media : add llama1 icon (#15878)
|
4 months ago |
Jeff Bolz
|
e68aa10d8f
vulkan: sort graph to allow more parallel execution (#15850)
|
4 months ago |
Aman Gupta
|
0a16bf52e6
CUDA: generate_cu_files.py - add missing mxfp4 (#15880)
|
4 months ago |
Jesse
|
88021565f0
chat : Deepseek V3.1 reasoning and tool calling support (OpenAI Style) (#15533)
|
4 months ago |
Xuan-Son Nguyen
|
56920f5665
server : bring back timings_per_token (#15879)
|
4 months ago |
Georgi Gerganov
|
b0d52998b9
cuda : fix supports_op condition for get_rows when number of blocks is too large (#15868)
|
4 months ago |
Georgi Gerganov
|
f28d4f4ac9
metal : refactor + optimize (#15857)
|
4 months ago |
Xuan-Son Nguyen
|
9fcb29f22f
ggml: allow casting between f32 and i32 (#15783)
|
4 months ago |
Sigbjørn Skjæret
|
5ef22d281d
CUDA: non-contiguous src0 not supported for PAD (#15869)
|
4 months ago |
Daniel Bevenius
|
233d773d02
convert : force setting sliding_window from original config (#15867)
|
4 months ago |
Georgi Gerganov
|
a885dcff11
batched-bench : fix llama_synchronize usage during prompt processing (#15835)
|
4 months ago |
Georgi Gerganov
|
663027fd54
context : fix n_outputs during reserve (#15858)
|
4 months ago |
Georgi Gerganov
|
cf0e3ba150
model : avoid ggml_cont_3d for fused QKV weights (#15662)
|
4 months ago |
Jeff Bolz
|
d413dca003
tests: large sizes for get_rows (#15687)
|
4 months ago |
Chenguang Li
|
85ca66a746
CANN: Stream sync between devices for acl_graph (#15809)
|
4 months ago |
Jeff Bolz
|
3976dfbe00
vulkan: support im2col_3d (#15795)
|
4 months ago |
Aaron Teo
|
d36e61c580
ggml-cpu: clean up s390x SIMD (#15855)
|
4 months ago |
Jeff Bolz
|
c97b5e5854
vulkan: Support pad_ext (#15794)
|
4 months ago |
Jeff Bolz
|
267e99867f
vulkan: Use larger loads in scalar/coopmat1 matmul (#15729)
|
4 months ago |
Daniel Bevenius
|
3b15924d71
ggml WebGPU: remove userdata from request adapter callback (#15527)
|
4 months ago |
Johannes Gäßler
|
79bc429262
CUDA: faster tile FA (Pascal/AMD), headsize 256 (#15769)
|
4 months ago |