| Author | Commit | Message | Date |
|---|---|---|---|
| hipudding | 6ba6a3c76f | docs : update ops.md for CANN backend (#18654) | 1 week ago |
| Perry Naseck | 0802d4cfb3 | ggml-blas: hide warnings from included BLAS headers (#18818) | 1 week ago |
| Tarek Dakhran | c945aaaef2 | mtmd : Fix ASR for LFM2.5-Audio-1.5B (#18876) | 1 week ago |
| Xuan-Son Nguyen | c15395f73c | common : implement new jinja template engine (#18462) | 1 week ago |
| Julius Tischbein | aa1dc3770a | Setting mmap and direct_io to false as default in llama-bench.cpp (#18841) | 1 week ago |
| Raul Torres | 4ea2eaac01 | CANN: Remove unused `ggml_cann_get_device` function (#18625) | 1 week ago |
| Chenguang Li | e20fa27a02 | CANN: fix an issue where get_env was not fully renamed (#18796) | 1 week ago |
| hipudding | baa4ba0aec | CANN: support gated linear attn (#18653) | 1 week ago |
| shaofeiqi | 785a710085 | OpenCL: add SOLVE_TRI op support (#18846) | 2 weeks ago |
| Georgi Gerganov | 6e7fc8a146 | cuda : print less debug logs when disabling cuda graphs (#18868) | 2 weeks ago |
| Georgi Gerganov | be8e3d9515 | context : do not reserve scheduler for warmups (#18867) | 2 weeks ago |
| ddh0 | 13f1e4a9ca | llama : add adaptive-p sampler (#17927) | 2 weeks ago |
| Xuan-Son Nguyen | a04c2b06a3 | server: improve slots scheduling for n_cmpl (#18789) | 2 weeks ago |
| Georgi Gerganov | 39173bcacb | context : reserve new scheduler when graph topology changes (#18547) | 2 weeks ago |
| Johannes Gäßler | 5c662d21a3 | CUDA: fix allignment on register spill for FA (#18815) | 2 weeks ago |
| shalinib-ibm | 8cc0ba957b | ggml-cpu: optimize ggml_vec_dot_bf16 for Power9 (#18837) | 2 weeks ago |
| Xuan-Son Nguyen | a7e6ddb8bd | lora: make sure model keep track of associated adapters (#18490) | 2 weeks ago |
| Sigbjørn Skjæret | 2a13180100 | model-loader : support bool array sliding window pattern (#18850) | 2 weeks ago |
| Adrien Gallouët | ec997b4f2b | tests : download models only when running ctest (#18843) | 2 weeks ago |
| Max Krasnyansky | cff777f226 | hexagon: support for OP_CPY, host buffers now optional, hvx-utils refactoring and optimizations (#18822) | 2 weeks ago |
| Oliver Simons | 36f0132464 | CUDA: Factor out and re-use `block_reduce` function (#18785) | 2 weeks ago |
| Piotr Wilkin (ilintar) | d98b548120 | Restore clip's cb() to its rightful glory - extract common debugging elements in llama (#17914) | 2 weeks ago |
| Junwon Hwang | 8fb7175576 | model : clean up and fix EXAONE-MoE configuration (#18840) | 2 weeks ago |
| Adrien Gallouët | 516a4ca9b5 | refactor : remove libcurl, use OpenSSL when available (#18828) | 2 weeks ago |
| Jeff Bolz | 3e4bb29666 | vulkan: Check maxStorageBufferRange in supports_op (#18709) | 2 weeks ago |
| Aman Gupta | 47f9612492 | llama-model: fix unfortunate typo (#18832) | 2 weeks ago |
| Daniel Bevenius | 01cbdfd7eb | CUDA : fix typo in clang pragma comment [no ci] (#18830) | 2 weeks ago |
| Ruben Ortlam | 635ef78ec5 | vulkan: work around Intel fp16 bug in mmq (#18814) | 2 weeks ago |
| Perry Naseck | 7d587e5544 | ggml-metal: do not copy headers for embedded, use current binary dir for embedded (#18705) | 2 weeks ago |
| Daniel Benjaminsson | d34aa07193 | mmap: add Haiku support by skipping RLIMIT_MEMLOCK check (#18819) | 2 weeks ago |