Xuan-Son Nguyen | 3f96aeff39 | llama : one-off chat template fix for Mistral-Small-2503 (#13398) | 8 months ago
Georgi Gerganov | 6562e5a4d6 | context : allow cache-less context for embeddings (#13108) | 8 months ago
Diego Devesa | f061021206 | llama : print size and type of overridden tensors (#13364) | 8 months ago
Sigbjørn Skjæret | bc4e1128f7 | llama : deci : support ffn-free with attention (#13296) | 8 months ago
piDack | 6c7fd67b64 | llama : support tie embedding for chatglm models (#13328) | 8 months ago
ymcki | 3bf785f3ef | llama : Llama-3_1-Nemotron-Ultra-253B-v1 support (#12843) | 8 months ago
Jared Van Bortel | 2f567611c0 | llama-model : support Qwen2 embedding models and pooling_mode_lasttoken (#13245) | 8 months ago
Georgi Gerganov | c642bc014c | kv-cache : separate recurrent vs non-recurrent impl (#12799) | 8 months ago
Sigbjørn Skjæret | cb06a3c363 | llama : orion rope type is neox (#13261) | 8 months ago
Sigbjørn Skjæret | 626083faf7 | llama : plamo rope type is neox (#13260) | 8 months ago
Jared Van Bortel | a70183eb00 | llama-model : fix the reported size class for nomic-embed-text-v2-moe (#13223) | 8 months ago
Johannes Gäßler | cdf76586b2 | CUDA: fix non-cont. inputs for batched mat mul (#13155) | 8 months ago
Sigbjørn Skjæret | 7d3af70b08 | llama : llm_type order by size (#13177) | 8 months ago
Sigbjørn Skjæret | e98b3692be | llama : set qwen3 model type sizes (#13175) | 8 months ago
AT | 5f5e39e1ba | model : Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture (#12466) | 8 months ago
Johannes Gäßler | 69699be48a | CUDA: fix q_nope_absorbed prec for DS 2 Lite f16 (#13137) | 8 months ago
Georgi Gerganov | 2f74c354c0 | graph : make FA compatible with MLA + add initial Metal kernels (#12953) | 9 months ago
Juk Armstrong | daa422881a | llama : DeepSeek V2/V3 MLA implementation (#12801) | 9 months ago
Yuxuan Zhang | 06bb53ad9b | llama-model : add Glm4Model implementation for GLM-4-0414 (#12867) | 9 months ago
Xuan-Son Nguyen | 8b91d5355a | llama : correct rms norm for llama 4 (#12882) | 9 months ago
Bo Zheng | d3bd7193ba | llama : Support Qwen3 and Qwen3MoE (#12828) | 9 months ago
Xuan-Son Nguyen | 1466621e73 | llama : Support llama 4 text-only (#12791) | 9 months ago
Diego Devesa | e0e912f49b | llama : add option to override model tensor buffers (#11397) | 9 months ago
Sigbjørn Skjæret | 2c3f8b850a | llama : support BailingMoE (Ling) (#12634) | 9 months ago
Djip007 | 0bb2919335 | llama : change cpu_buft_list order: ACCEL -> GPU host -> CPU extra -> CPU (#12632) | 9 months ago
Sigbjørn Skjæret | 3714c3ee1a | llama : fix incorrect Qwen2Moe ffn_moe_out graph callback (#12631) | 9 months ago
Si1w | f125b8dccf | llama : add PLM GGUF Conversion & Inference Support (#12457) | 9 months ago
HighDoping | 953c2a62cf | model : restore support for T5Encoder (#12590) | 9 months ago
Xuan-Son Nguyen | fbdfefe74e | llama : gemma3 : use output tensor if it exists in model weight (#12506) | 10 months ago
Georgi Gerganov | af04481e6b | model : do not repack if a GPU device is present (#12498) | 10 months ago