Sigbjørn Skjæret
|
835b2b915c
model : add GroveMoE support (#15510)
|
3 months ago |
Aman Gupta
|
077c94d0ca
CUDA: add a fused top-K MoE kernel (#16130)
|
3 months ago |
Douglas Hanley
|
b5bd037832
llama : add support for qwen3 reranker (#15824)
|
3 months ago |
Sigbjørn Skjæret
|
b8e09f08b9
model : add grok-2 support (#15539)
|
4 months ago |
Sigbjørn Skjæret
|
6ab397e12b
graph : support non-contiguous Q in build_attn_mha (#15908)
|
4 months ago |
Georgi Gerganov
|
663027fd54
context : fix n_outputs during reserve (#15858)
|
4 months ago |
Georgi Gerganov
|
c610b6c11b
kv-cache : fix SWA checks + disable cacheless iSWA (#15811)
|
4 months ago |
Daniel Bevenius
|
fb15d649ed
llama : add support for EmbeddingGemma 300m (#15798)
|
4 months ago |
Johannes Gäßler
|
e81b8e4b7f
llama: use FA + max. GPU layers by default (#15434)
|
4 months ago |
Georgi Gerganov
|
8a4280ce43
kv-cache : remove LLAMA_SET_ROWS checks (#15505)
|
4 months ago |
Georgi Gerganov
|
0373486dbc
graph : fix assert in memory-less build_attn (#15590)
|
4 months ago |
Georgi Gerganov
|
3f196be84b
graph : remove build_attn_with_sinks overload (#15469)
|
4 months ago |
Georgi Gerganov
|
715a6db02c
kv-cache : drop the "unified" prefix (#15467)
|
4 months ago |
Georgi Gerganov
|
fd1234cb46
llama : add gpt-oss (#15091)
|
5 months ago |
Sam
|
ef0144c087
model: support GLM 4.5 family of models (#14939)
|
5 months ago |
Dongliang Wei
|
c1dacaa99b
llama : merge build_moe_ffn_from_probs function into build_moe_ffn (#14968)
|
5 months ago |
compilade
|
66625a59a5
graph : reduce splits for recurrent and hybrid models (#14825)
|
5 months ago |
Douglas Hanley
|
a118d80233
embeddings: fix extraction of CLS pooling results (#14927)
|
5 months ago |
Dongliang Wei
|
6c6e397aff
model : add support for SmallThinker series (#14898)
|
5 months ago |
Georgi Gerganov
|
bf9087f59a
metal : fuse add, mul + add tests (#14596)
|
6 months ago |
Georgi Gerganov
|
9fb1042ce6
graph : fix graph reuse reset of params (#14760)
|
6 months ago |
Georgi Gerganov
|
d498af3d5a
graph : avoid huge warm-up graphs for MoE models (#14753)
|
6 months ago |
Georgi Gerganov
|
8f974bc1e9
graph : refactor context to not pass gf explicitly (#14629)
|
6 months ago |
Nexes the Elder
|
09651d09ff
graph : Pass the graph placeholder message in debug mode (#14748)
|
6 months ago |
Georgi Gerganov
|
01612b7409
llama : reuse compute graphs (#14482)
|
6 months ago |
Georgi Gerganov
|
225e7a1438
llama : add high-throughput mode (#14363)
|
6 months ago |
Xuan-Son Nguyen
|
cb9178f885
llama : remove llm_graph_input_one (#14603)
|
6 months ago |
compilade
|
4a5686da22
llama : support Jamba hybrid Transformer-Mamba models (#7531)
|
6 months ago |
Georgi Gerganov
|
7b50f7c025
graph : prepare for 4D mask (#14515)
|
6 months ago |
Georgi Gerganov
|
a70c8a0c4b
kv-cache : use ggml_set_rows (#14285)
|
6 months ago |