Daniel Bevenius | 8c3fdf44ec model-conversion : add missing curl script [no ci] (#15761) | 4 months ago
hipudding | f6da8cb86a CANN: Mask unsupported TRANSPOSE_1D operator (#15733) | 4 months ago
Chenguang Li | 8a2234ea0c CANN: Fix type float_t to float (#15736) | 4 months ago
SnA1lGo | 3de008208b fix: resolve unsigned int initialization warning for n_dims/size in gguf.cpp (#15754) | 4 months ago
Oliver Simons | 69db8a52e6 chore: Update `.clang-format` to use `BinPackArguments=true` (#15744) | 4 months ago
Johannes Gäßler | c466abe158 llama: -fa 1/0/-1 aliases for -fa on/off/auto (#15746) | 4 months ago
Ruben Ortlam | 0a2a3841e8 vulkan: fix shaders gen when no integer dot is available (#15740) | 4 months ago
hipudding | 9961d244f2 CANN: Resolve soft_max precision issue (#15730) | 4 months ago
Jeff Bolz | 25f1045f07 vulkan: Fix macro parameter order for f32 matmul shaders (#15716) | 4 months ago
rmatif | 97669e4073 opencl: add attn sinks support for FA kernels (#15706) | 4 months ago
Chenguang Li | 2f853687b3 CANN: Support eager execution mode under ACL graph compilation (#15712) | 4 months ago
hipudding | ef2af57ddf CANN: Support ext_factor in rope (#15710) | 4 months ago
Johannes Gäßler | 5d804a4938 ggml-backend: raise GGML_MAX_SPLIT_INPUTS (#15722) | 4 months ago
Gilad S. | d4d8dbe383 vulkan: use memory budget extension to read memory usage (#15545) | 4 months ago
Jeff Bolz | 35a42edac8 vulkan: add missing clamps in new mul_mat_id paths (#15702) | 4 months ago
Ruben Ortlam | fec7911f8f vulkan: disable large mmv subgroups on older Nvidia GPUs (#15717) | 4 months ago
s-goto-11 | 078ce23ea7 ggml: SVE support for exponential functions (#15145) | 4 months ago
Prashant Vithule | a0c2b207c5 ggml: aarch64: Implement SVE F16 kernels for vector functions (#15115) | 4 months ago
Jie Fu (傅杰) | 4b20d8b7e3 convert : remove redundant code (#15708) | 4 months ago
Ruben Ortlam | 02c1813517 Vulkan: Add Integer Dot Product mul_mat_vec shader for legacy quants (#14903) | 4 months ago
Daniel Bevenius | 77dee9de97 ggml : WebGPU add TRANSPOSE and RESHAPE to supported ops (#15695) | 4 months ago
Jie Fu (傅杰) | 4795c91c32 docs : add Hunyuan to models section (#15707) | 4 months ago
Akarshan Biswas | b66df9d9c9 CUDA: fix build error from ambiguous __half conversions in conv2d (#15690) | 4 months ago
hipudding | b9382c3877 CANN: Optimize MUL_MAT_ID (#15658) | 4 months ago
hipudding | 3dc7397a27 CANN: fix RoPE cache issue on multi-device (#15629) | 4 months ago
Georgi Gerganov | e92d53b29e sampling : optimize samplers by reusing bucket sort (#15665) | 4 months ago
Georgi Gerganov | 0d161f021a server : enable /slots by default and make it secure (#15630) | 4 months ago
Georgi Gerganov | 4efd5a8316 metal : fix checks for available FA kernels (#15700) | 4 months ago
Diego Devesa | 274966226f llama : fix fattn reserve call n_seqs parameter (#15699) | 4 months ago
Diego Devesa | 9777032dcc llama : separate compute buffer reserve from fattn check (#15696) | 4 months ago