Aman Gupta
|
f47edb8c19
ggml-cuda: check for srcs outside the cgraph (#18583)
|
3 weeks ago |
Vladislav Sayapin
|
da143b9940
server : fix router child env in containerized environments (#18562)
|
3 weeks ago |
Jeff Bolz
|
f1768d8f03
vulkan: fix topk_moe_sigmoid_norm_bias failures in GLM-4.6 (#18582)
|
3 weeks ago |
Georgi Gerganov
|
2da64a2f8a
models : fix backend assignment for Granite/Nemotron graphs (#18599)
|
3 weeks ago |
Jeff Bolz
|
b37124d2d2
vulkan: handle quantize_q8_1 overflowing the max workgroup count (#18515)
|
3 weeks ago |
Sigbjørn Skjæret
|
eadc4184ca
llama : refactor rope_freq_base/scale_swa conversion and init (#18553)
|
3 weeks ago |
Chenguang Li
|
67e3f6f601
CANN: add operator fusion support for ADD + RMS_NORM (#17512)
|
3 weeks ago |
Francisco Herrera
|
92ac1e016b
doc: clarify that steps also apply to linux for opencl (#18002)
|
3 weeks ago |
Ali Tariq
|
8e3a761189
ci : init git lfs in every build for RISC-V (#18590)
|
3 weeks ago |
Daniel Bevenius
|
d3dce4e0a5
sampling : add support for backend sampling (#17004)
|
3 weeks ago |
Tarek Dakhran
|
4974bf53cf
model : mtmd : make input norm optional in LFM2-VL (#18594)
|
3 weeks ago |
Aman Gupta
|
908a9e5a1e
CUDA: disable cuda graph when using n-cpu-moe (#18593)
|
3 weeks ago |
Aman Gupta
|
5126c41c1c
ggml-cuda: remove unused params in ggml_cuda_graph (#18579)
|
3 weeks ago |
Aldehir Rojas
|
cef1d23c5a
common/grammar : replace problematic backtracking regex `[\s\S]*` (#18342)
|
3 weeks ago |
Georgi Gerganov
|
c69c7ebc90
graph : fix graph reuse logic when `n_pos_per_embd > 1` (#18566)
|
3 weeks ago |
Aman Gupta
|
e57f52334b
ggml-cuda: fixes for concurrent streams (#18496)
|
3 weeks ago |
Georgi Gerganov
|
a554a1ecc7
context : fix reserve token padding to n_seqs (#18536)
|
3 weeks ago |
Johannes Gäßler
|
0f2e42ca1d
CUDA: only allocate FA tmp buffer if needed (#18564)
|
3 weeks ago |
pl752
|
9dba9f5352
(Bugfix, ggml-cuda) Pool alloc count fix + small size computation type adjustment (#18559)
|
3 weeks ago |
Shouyu
|
bcfc8c3cec
ggml-hexagon: optimize activation function (#18393)
|
3 weeks ago |
Jeff Bolz
|
18ddaea2ae
vulkan: Optimize GGML_OP_CUMSUM (#18417)
|
3 weeks ago |
Jeff Bolz
|
706e3f93a6
vulkan: Implement mmvq for iq1_s/iq1_m (#18450)
|
3 weeks ago |
Prabod
|
5755e52d15
model : Maincoder-1B support (#18534)
|
3 weeks ago |
Georgi Gerganov
|
f38de16341
metal : adjust extra size for FA buffer to avoid reallocations (#18545)
|
3 weeks ago |
Georgi Gerganov
|
af1e8e1a6c
graph : reduce topology branching (#18548)
|
3 weeks ago |
Georgi Gerganov
|
d84a6a98be
vocab : reduce debug logs about non-EOG control tokens (#18541)
|
3 weeks ago |
Chris Rohlf
|
c6f0e832da
rpc : use unordered_map::reserve and emplace (#18513)
|
3 weeks ago |
MeeMin
|
e86f3c2221
cuda : fix copy of large tensors (ggml_nbytes <= INT_MAX assertion) (#18433)
|
3 weeks ago |
Sigbjørn Skjæret
|
169ee68ffb
model : remove modern-bert iswa template (#18529)
|
3 weeks ago |
tt
|
ced765be44
model: support youtu-vl model (#18479)
|
3 weeks ago |