c560316440 graph : reuse SSM graphs (#16490) - Georgi Gerganov, 1 month ago
2995341730 llama : add support for NVIDIA Nemotron 3 Nano (#18058) - Daniel Bevenius, 1 month ago
0759b09c90 graph: add f_attn_temp_offset (#18025) - Xuan-Son Nguyen, 1 month ago
609a2d0268 models : fix YaRN regression + consolidate logic (#18006) - Georgi Gerganov, 1 month ago
7bed317f53 models : fix the attn_factor for mistral3 graphs + improve consistency (#17945) - Georgi Gerganov, 1 month ago
4dff236a52 ggml : remove GGML_KQ_MASK_PAD constant (#17910) - Georgi Gerganov, 1 month ago
c8554b66e0 graph : use fill instead of scale_bias in grouped expert selection (#17867) - Sigbjørn Skjæret, 1 month ago
cd3c118908 model: support Ministral3 (#17644) - Xuan-Son Nguyen, 2 months ago
6eea666912 llama-graph: avoid expand_forward for fusion (#17633) - Aman Gupta, 2 months ago
583cb83416 ggml : add ggml_top_k (#17365) - Georgi Gerganov, 2 months ago
a90eb94ca9 CUDA: fuse rope + set_rows (#16884) - Aman Gupta, 2 months ago
9008027aa3 hparams : add n_embd_inp() to support extended embed (#16928) - Sigbjørn Skjæret, 2 months ago
d7395115ba llama : use std::abs instead of abs (#16853) - Jan Boon, 3 months ago
f696428ce8 graph : add clamping to ffn_moe_weights_sum to avoid div-by-zero (#16655) - Sigbjørn Skjæret, 3 months ago
f77c13b91f CUDA: General GEMV fusion (#16715) - Aman Gupta, 3 months ago
84bf3c6778 model : add BailingMoeV2 support (#16063) - Sigbjørn Skjæret, 3 months ago
e60f241eac metal : FA support F32 K and V and head size = 32 (#16531) - Georgi Gerganov, 3 months ago
e38b7c6e9e graph : support cacheless embeddings with FA and iSWA (#16528) - Georgi Gerganov, 3 months ago
e08db42595 model: EmbeddingGemma Adding Support for SentenceTransformers Dense Modules (#16367) - Saba Fallah, 3 months ago
835b2b915c model : add GroveMoE support (#15510) - Sigbjørn Skjæret, 4 months ago
077c94d0ca CUDA: add a fused top-K MoE kernel (#16130) - Aman Gupta, 4 months ago
b5bd037832 llama : add support for qwen3 reranker (#15824) - Douglas Hanley, 4 months ago
b8e09f08b9 model : add grok-2 support (#15539) - Sigbjørn Skjæret, 4 months ago
6ab397e12b graph : support non-contiguous Q in build_attn_mha (#15908) - Sigbjørn Skjæret, 4 months ago
663027fd54 context : fix n_outputs during reserve (#15858) - Georgi Gerganov, 4 months ago
c610b6c11b kv-cache : fix SWA checks + disable cacheless iSWA (#15811) - Georgi Gerganov, 4 months ago
fb15d649ed llama : add support for EmbeddingGemma 300m (#15798) - Daniel Bevenius, 4 months ago
e81b8e4b7f llama: use FA + max. GPU layers by default (#15434) - Johannes Gäßler, 5 months ago
8a4280ce43 kv-cache : remove LLAMA_SET_ROWS checks (#15505) - Georgi Gerganov, 5 months ago
0373486dbc graph : fix assert in memory-less build_attn (#15590) - Georgi Gerganov, 5 months ago