Sigbjørn Skjæret | 9008027aa3 | hparams : add n_embd_inp() to support extended embed (#16928) | 2 months ago
Jan Boon | d7395115ba | llama : use std::abs instead of abs (#16853) | 2 months ago
Sigbjørn Skjæret | f696428ce8 | graph : add clamping to ffn_moe_weights_sum to avoid div-by-zero (#16655) | 2 months ago
Aman Gupta | f77c13b91f | CUDA: General GEMV fusion (#16715) | 2 months ago
Sigbjørn Skjæret | 84bf3c6778 | model : add BailingMoeV2 support (#16063) | 2 months ago
Georgi Gerganov | e60f241eac | metal : FA support F32 K and V and head size = 32 (#16531) | 3 months ago
Georgi Gerganov | e38b7c6e9e | graph : support cacheless embeddings with FA and iSWA (#16528) | 3 months ago
Saba Fallah | e08db42595 | model: EmbeddingGemma Adding Support for SentenceTransformers Dense Modules (#16367) | 3 months ago
Sigbjørn Skjæret | 835b2b915c | model : add GroveMoE support (#15510) | 3 months ago
Aman Gupta | 077c94d0ca | CUDA: add a fused top-K MoE kernel (#16130) | 3 months ago
Douglas Hanley | b5bd037832 | llama : add support for qwen3 reranker (#15824) | 3 months ago
Sigbjørn Skjæret | b8e09f08b9 | model : add grok-2 support (#15539) | 4 months ago
Sigbjørn Skjæret | 6ab397e12b | graph : support non-contiguous Q in build_attn_mha (#15908) | 4 months ago
Georgi Gerganov | 663027fd54 | context : fix n_outputs during reserve (#15858) | 4 months ago
Georgi Gerganov | c610b6c11b | kv-cache : fix SWA checks + disable cacheless iSWA (#15811) | 4 months ago
Daniel Bevenius | fb15d649ed | llama : add support for EmbeddingGemma 300m (#15798) | 4 months ago
Johannes Gäßler | e81b8e4b7f | llama: use FA + max. GPU layers by default (#15434) | 4 months ago
Georgi Gerganov | 8a4280ce43 | kv-cache : remove LLAMA_SET_ROWS checks (#15505) | 4 months ago
Georgi Gerganov | 0373486dbc | graph : fix assert in memory-less build_attn (#15590) | 4 months ago
Georgi Gerganov | 3f196be84b | graph : remove build_attn_with_sinks overload (#15469) | 4 months ago
Georgi Gerganov | 715a6db02c | kv-cache : drop the "unified" prefix (#15467) | 5 months ago
Georgi Gerganov | fd1234cb46 | llama : add gpt-oss (#15091) | 5 months ago
Sam | ef0144c087 | model: support GLM 4.5 family of models (#14939) | 5 months ago
Dongliang Wei | c1dacaa99b | llama : merge build_moe_ffn_from_probs function into build_moe_ffn (#14968) | 5 months ago
compilade | 66625a59a5 | graph : reduce splits for recurrent and hybrid models (#14825) | 5 months ago
Douglas Hanley | a118d80233 | embeddings: fix extraction of CLS pooling results (#14927) | 5 months ago
Dongliang Wei | 6c6e397aff | model : add support for SmallThinker series (#14898) | 5 months ago
Georgi Gerganov | bf9087f59a | metal : fuse add, mul + add tests (#14596) | 6 months ago
Georgi Gerganov | 9fb1042ce6 | graph : fix graph reuse reset of params (#14760) | 6 months ago
Georgi Gerganov | d498af3d5a | graph : avoid huge warm-up graphs for MoE models (#14753) | 6 months ago