safranowith
|
2330de7b84
SYCL: Add support for FLOOR,CEIL,ROUND and TRUNC unary operators (#16613)
|
2 months ago |
Ilia Ilmer
|
9ad4f1931e
metal : add `CONV_TRANSPOSE_2D` (#16542)
|
3 months ago |
lhez
|
0cb7a0683b
opencl: add q8_0 mm support (#16469)
|
3 months ago |
Sam/Samuel
|
f4ce81c45e
metal: optimise `GGML_OP_SUM` (#16559)
|
3 months ago |
Aman Gupta
|
48e2fa9fb7
CUDA: add fp kernel for larger batch size MoE (#16512)
|
3 months ago |
Georgi Gerganov
|
e60f241eac
metal : FA support F32 K and V and head size = 32 (#16531)
|
3 months ago |
Georgi Gerganov
|
0a319bb75e
metal : add support for non-padded FA KV (#16148)
|
3 months ago |
Georgi Gerganov
|
1d6092fc72
tests : add -INF blocks to the KQ mask in the FA tests (#16380)
|
3 months ago |
Reese Levine
|
ef07a40906
ggml webgpu: add support for soft_max, optimize rms_norm (#16357)
|
3 months ago |
Reese Levine
|
8d78cd2613
ggml webgpu: support for rope,div,sub,glu,scale,cont operators (#16187)
|
3 months ago |
Jeff Bolz
|
a74a0d69f3
tests: override test_set_rows::max_nmse_err to allow for occasional rounding differences (#16295)
|
3 months ago |
Sigbjørn Skjæret
|
adc76347d7
ggml : check cuda and metal argsort limits and add test (#16323)
|
3 months ago |
Sigbjørn Skjæret
|
b887d2f341
ggml : fix GGML_F32_VEC_FMA argument order in ggml_vec_mad1_f32 (#16307)
|
3 months ago |
Jeff Bolz
|
d8359f5fde
vulkan: 64-bit im2col (#16135)
|
3 months ago |
Georgi Gerganov
|
6a2c6145a0
metal : extend mat-mat multiplication support (#16225)
|
3 months ago |
Jeff Bolz
|
1384abf8b8
vulkan: handle mat_mul with A matrix > 4GB (#16176)
|
3 months ago |
Aman Gupta
|
c0bfc57af4
CUDA: mul_mat_id for mmf for bs <= 64 for f16 and bs <= 32 for f32 (#16277)
|
3 months ago |
Aman Gupta
|
077c94d0ca
CUDA: add a fused top-K MoE kernel (#16130)
|
3 months ago |
Georgi Gerganov
|
dfcd53f7ec
metal : fuse NORM + MUL + ADD, support non-multiples of 4 (#16220)
|
3 months ago |
Sigbjørn Skjæret
|
3ecb2f671a
ggml : implement set_rows with i32 index (#16159)
|
3 months ago |
Shin-myoung-serp
|
96fdca043b
Vulkan: add conv_transpose_2d operation (#16022)
|
3 months ago |
Ruben Ortlam
|
9073a73d82
vulkan: vec dot matrix multiplication fix (#16151)
|
3 months ago |
Xuan-Son Nguyen
|
0dd58b6877
ggml : refactor forward_dup for cpu backend (#16062)
|
4 months ago |
Bowen Han
|
38dbdf4c05
CUDA: Optimize PAD_REFLECT_1D (#15957)
|
4 months ago |
Reese Levine
|
d304f459d8
GGML WebGPU: Support for ADD, MUL, RMS_NORM, GET_ROWS operators (#16018)
|
4 months ago |
Georgi Gerganov
|
0320ac5264
metal : refactor + optimize v2 (#15995)
|
4 months ago |
Oliver Simons
|
00681dfc16
CUDA: Add `fastdiv` to `k_bin_bcast*`, giving 1-3% E2E performance (#15872)
|
4 months ago |
Daniel Bevenius
|
e7b6d83b52
tests : filter out no-ops from coverage report (#15900)
|
4 months ago |
Jeff Bolz
|
4f63cd705c
vulkan: Fix OOB accesses in soft_max_back (#15861)
|
4 months ago |
Aman Gupta
|
a972faebed
CUDA: Add mul_mat_id support for the mmf kernel (#15767)
|
4 months ago |