Piotr Wilkin
|
c2a82a1773
Move the norm shift to conversion, Gemma 2 style
|
3 months ago |
Piotr Wilkin
|
5306640300
All's well that ends in a well
|
3 months ago |
Piotr Wilkin
|
232ec56251
Yes, I finally managed to implement it with ssm_conv :>
|
3 months ago |
Piotr Wilkin
|
aa8d6a21a3
Remove extra files cont.
|
3 months ago |
Piotr Wilkin
|
e9a98f2af9
Remove extra files
|
3 months ago |
Piotr Wilkin
|
22ee5a971b
Add gate_sigmoid to callback
|
3 months ago |
Piotr Wilkin
|
ce87b7d78e
Yup, it's NeoX
|
3 months ago |
Piotr Wilkin
|
df0b5bcf30
Proper order of attention operations
|
3 months ago |
Piotr Wilkin
|
54712b8664
Oh, forgot to commit
|
3 months ago |
Piotr Wilkin
|
17240eafc0
Order stuff around
|
3 months ago |
Piotr Wilkin
|
1579bcb202
What am I missing? :/
|
3 months ago |
Piotr Wilkin
|
0a9244acd0
The optimization worked even too well ;)
|
3 months ago |
Piotr Wilkin
|
8ddaf251ae
Fix some state regressions... still wip
|
3 months ago |
Piotr Wilkin
|
6942c85cf8
Oh, actually set n_tasks as well :P
|
3 months ago |
Piotr Wilkin
|
477c1616ad
Parallelize delta_net
|
3 months ago |
Piotr Wilkin
|
4ef6f337de
Proper multi-sequence convolution calculation, corrected (?) state management
|
3 months ago |
Piotr Wilkin
|
5f5e30007c
Dilution n_seqs -> 1
|
3 months ago |
Piotr Wilkin
|
eb0a15fc9b
n_tokens -> n_seq_tokens
|
3 months ago |
Piotr Wilkin
|
ee52fe36f3
Modify sanity check to handle hybrid models
|
3 months ago |
Piotr Wilkin
|
0dd6110fdc
v1.0
|
3 months ago |
Piotr Wilkin
|
adcbd9428f
Linear layer output convergence
|
3 months ago |
Piotr Wilkin
|
666fc0583d
Parity on delta!
|
3 months ago |
Piotr Wilkin
|
a2c7b6794e
Proper handling for n_tokens > GGML_DELTA_NET_CHUNK
|
3 months ago |
Piotr Wilkin
|
c1e46f62fa
Achieve pre-chunk-attention parity; remove most of the LLM generated crap
|
3 months ago |
Piotr Wilkin
|
c87e8d550c
Tensor preparation for delta_net complete
|
3 months ago |
Piotr Wilkin
|
7ec2df64a4
Added: tri, cumsum. Still a mess.
|
3 months ago |
Piotr Wilkin
|
6d0ad37cf4
Fix QKV extraction post-convolution
|
3 months ago |
Piotr Wilkin
|
845a3d7166
Convolution
|
3 months ago |
Piotr Wilkin
|
638057a29b
Transpose input for convolution
|
3 months ago |
Piotr Wilkin
|
835d389fc5
Fix BA views as well
|
3 months ago |