Piotr Wilkin | 32dcee47ef | Some attempts to get the convolution input right. | 3 months ago
Piotr Wilkin | 7bedf4c66c | Refactor llama-model.cpp | 3 months ago
Piotr Wilkin | 9014feadfa | Change RoPE to NeoX | 3 months ago
Piotr Wilkin | f020baa466 | Normal attention: apply gate before output | 3 months ago
Piotr Wilkin | 27fa5f335d | Correct convolution state dimension calculations | 3 months ago
Piotr Wilkin | e24c9dfa60 | Remove OP_DELTA_NET, fix flake8 and editorchecker because why not | 3 months ago
Piotr Wilkin | 43eb7a7757 | Now that eval's running move delta net stuff back to llama-model, add cbs | 3 months ago
Piotr Wilkin | 890fa2c1e3 | WE HAVE OUTPUT! | 3 months ago
Piotr Wilkin | e590a75905 | Cleanup complete, now for the recurrent memory management... | 3 months ago
Piotr Wilkin (ilintar) | 72c98b0c7d | Merge pull request #1 from ggml-org/xsn/qwen3next_experiment | 3 months ago
Xuan Son Nguyen | e83ef74733 | one less magic number | 4 months ago
Xuan Son Nguyen | f643b957f4 | refactor softplus fn | 4 months ago
Xuan Son Nguyen | 46110e0630 | split q_proj/gate | 4 months ago
Piotr Wilkin | 8152df60f3 | Getting closer (graph builds for bs=1 but tensor shaping is still wrong for bigger sizes) | 4 months ago
Piotr Wilkin | e0c5dff2a7 | Rewrite to tensor ops | 4 months ago
Piotr Wilkin | 178230ee21 | Getting to decode stage... | 4 months ago
Piotr Wilkin (ilintar) | c78f9fce68 | Merge branch 'ggml-org:master' into qwen3_next | 4 months ago
Piotr Wilkin | 344331c2b6 | First draft | 4 months ago
Xuan-Son Nguyen | 8f8f2274ee | convert : add Llama4ForCausalLM (#16042) | 4 months ago
Shane A | 85286f3548 | model : add OLMo3 support (#16015) | 4 months ago
Aman Gupta | 6d758839ff | Add LLaDA-7b-MoE diffusion model (#16003) | 4 months ago
Sigbjørn Skjæret | b8e09f08b9 | model : add grok-2 support (#15539) | 4 months ago
Jie Fu (傅杰) | 4f658855fa | llama : support T5 models with unequal number of encoder-decoder layers (#15909) | 4 months ago
Georgi Gerganov | cf0e3ba150 | model : avoid ggml_cont_3d for fused QKV weights (#15662) | 4 months ago
Georgi Gerganov | c610b6c11b | kv-cache : fix SWA checks + disable cacheless iSWA (#15811) | 4 months ago
Daniel Bevenius | fb15d649ed | llama : add support for EmbeddingGemma 300m (#15798) | 4 months ago
Daniel Bevenius | 2c8dac72eb | llama : fix incorrect model type for Gemma 270M (#15764) | 4 months ago
Johannes Gäßler | e81b8e4b7f | llama: use FA + max. GPU layers by default (#15434) | 4 months ago
Gabe Goodhart | e8d99dd0b6 | nvidia nemotron nano v2 (nemotronh) (#15507) | 4 months ago
Sigbjørn Skjæret | 84ab83cc0b | model : jina-embeddings-v3 support (#13693) | 4 months ago