Author | Commit | Message | Date
Sigbjørn Skjæret | 6385b843a8 | llama : add RobertaForSequenceClassification reranker support (#13875) | 7 months ago
Piotr Jasiukajtis | 4032ca4066 | llama : add support for Qwen3 MoE tied word embeddings (#13768) | 8 months ago
Georgi Gerganov | d13d0f6135 | hparams : initialize arrays (#13728) | 8 months ago
Xuan-Son Nguyen | 8a2afb7520 | llama : allow custom list of swa_layers (#13726) | 8 months ago
Georgi Gerganov | 8a1d206f1d | tts : fix n_ubatch + make WavTokenizer cache-less (#13713) | 8 months ago
Georgi Gerganov | 797f2ac062 | kv-cache : simplify the interface (#13660) | 8 months ago
Georgi Gerganov | b44890df2e | model : disable SWA for Phi models (#13676) | 8 months ago
Georgi Gerganov | be0239693c | model : fix llama4 graph (#13663) | 8 months ago
Georgi Gerganov | e298d2fbd0 | kv-cache : add SWA support (#13194) | 8 months ago
Gabe Goodhart | 5e7d95e22e | fix: Move build_inp_pos to the top of the graph section for build_granite (#13538) | 8 months ago
Gabe Goodhart | d590cd4c24 | model : Granite MoE shared (#13269) | 8 months ago
Johannes Gäßler | 10d2af0eaa | llama/ggml: add LLM training support (#10544) | 8 months ago
Diego Devesa | 27ebfcacba | llama : do not crash if there is no CPU backend (#13395) | 8 months ago
Xuan-Son Nguyen | 3f96aeff39 | llama : one-off chat template fix for Mistral-Small-2503 (#13398) | 8 months ago
Georgi Gerganov | 6562e5a4d6 | context : allow cache-less context for embeddings (#13108) | 8 months ago
Diego Devesa | f061021206 | llama : print size and type of overridden tensors (#13364) | 8 months ago
Sigbjørn Skjæret | bc4e1128f7 | llama : deci : support ffn-free with attention (#13296) | 8 months ago
piDack | 6c7fd67b64 | llama : support tie embedding for chatglm models (#13328) | 8 months ago
ymcki | 3bf785f3ef | llama : Llama-3_1-Nemotron-Ultra-253B-v1 support (#12843) | 8 months ago
Jared Van Bortel | 2f567611c0 | llama-model : support Qwen2 embedding models and pooling_mode_lasttoken (#13245) | 8 months ago
Georgi Gerganov | c642bc014c | kv-cache : separate recurrent vs non-recurrent impl (#12799) | 8 months ago
Sigbjørn Skjæret | cb06a3c363 | llama : orion rope type is neox (#13261) | 8 months ago
Sigbjørn Skjæret | 626083faf7 | llama : plamo rope type is neox (#13260) | 8 months ago
Jared Van Bortel | a70183eb00 | llama-model : fix the reported size class for nomic-embed-text-v2-moe (#13223) | 8 months ago
Johannes Gäßler | cdf76586b2 | CUDA: fix non-cont. inputs for batched mat mul (#13155) | 8 months ago
Sigbjørn Skjæret | 7d3af70b08 | llama : llm_type order by size (#13177) | 8 months ago
Sigbjørn Skjæret | e98b3692be | llama : set qwen3 model type sizes (#13175) | 8 months ago
AT | 5f5e39e1ba | model : Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture (#12466) | 8 months ago
Johannes Gäßler | 69699be48a | CUDA: fix q_nope_absorbed prec for DS 2 Lite f16 (#13137) | 8 months ago
Georgi Gerganov | 2f74c354c0 | graph : make FA compatible with MLA + add initial Metal kernels (#12953) | 9 months ago