Piotr Wilkin | 54bb6f1eb9 | argh again | 3 months ago
Piotr Wilkin | 20424d8785 | argh | 3 months ago
Piotr Wilkin | 413652178f | attempt 2 | 3 months ago
Piotr Wilkin | c5dc442a5d | repeat_interleave | 3 months ago
Piotr Wilkin | 7eef0bd948 | Rewrite recurrent delta + softmax to separate ops | 3 months ago
Piotr Wilkin | d300ce9eba | Hmmmm...... | 3 months ago
Piotr Wilkin | 3f5994223b | Hmm... | 3 months ago
Piotr Wilkin | 7348546b5e | Missing cont() | 3 months ago
Piotr Wilkin | 5a161d9461 | Remove unnecessary transposes/reshapes | 3 months ago
Piotr Wilkin | 572864287e | Handle case with more than one token per seq with elegant loop plus completely not crazy change to max nodes ;) | 3 months ago
Piotr Wilkin | c2a82a1773 | Move the norm shift to conversion, Gemma 2 style | 3 months ago
Piotr Wilkin | 5306640300 | All's well that ends in a well | 3 months ago
Piotr Wilkin | 232ec56251 | Yes, I finally managed to implement it with ssm_conv :> | 3 months ago
Piotr Wilkin | 22ee5a971b | Add gate_sigmoid to callback | 3 months ago
Piotr Wilkin | df0b5bcf30 | Proper order of attention operations | 3 months ago
Piotr Wilkin | 17240eafc0 | Order stuff around | 3 months ago
Piotr Wilkin | 1579bcb202 | What am I missing? :/ | 3 months ago
Piotr Wilkin | 8ddaf251ae | Fix some state regressions... still wip | 3 months ago
Piotr Wilkin | 4ef6f337de | Proper multi-sequence convolution calculation, corrected (?) state management | 3 months ago
Piotr Wilkin | 5f5e30007c | Dilution n_seqs -> 1 | 3 months ago
Piotr Wilkin | eb0a15fc9b | n_tokens -> n_seq_tokens | 3 months ago
Piotr Wilkin | 0dd6110fdc | v1.0 | 3 months ago
Piotr Wilkin | adcbd9428f | Linear layer output convergence | 3 months ago
Piotr Wilkin | 666fc0583d | Parity on delta! | 3 months ago
Piotr Wilkin | a2c7b6794e | Proper handling for n_tokens > GGML_DELTA_NET_CHUNK | 3 months ago
Piotr Wilkin | c1e46f62fa | Achieve pre-chunk-attention parity; remove most of the LLM generated crap | 3 months ago
Piotr Wilkin | c87e8d550c | Tensor preparation for delta_net complete | 3 months ago
Piotr Wilkin | 7ec2df64a4 | Added: tri, cumsum. Still a mess. | 3 months ago
Piotr Wilkin | 6d0ad37cf4 | Fix QKV extraction post-convolution | 4 months ago
Piotr Wilkin | 845a3d7166 | Convolution | 4 months ago