Piotr Wilkin
|
0b301889bf
Stabilize tensor dump trigger for now with -n < 50
|
3 months ago |
Piotr Wilkin
|
f0a07c1091
Add proper backend tensor printing, use double for accumulating the sum
|
3 months ago |
Piotr Wilkin
|
4c8771d200
Print 5D tensors
|
3 months ago |
Piotr Wilkin
|
10032affcf
More debug data
|
3 months ago |
Piotr Wilkin
|
d300ce9eba
Hmmmm......
|
3 months ago |
Piotr Wilkin
|
3f5994223b
Hmm...
|
3 months ago |
Piotr Wilkin
|
7348546b5e
Missing cont()
|
3 months ago |
Piotr Wilkin
|
5a161d9461
Remove unnecessary transposes/reshapes
|
3 months ago |
Piotr Wilkin
|
572864287e
Handle case with more than one token per seq with elegant loop plus completely not crazy change to max nodes ;)
|
3 months ago |
Piotr Wilkin
|
c2a82a1773
Move the norm shift to conversion, Gemma 2 style
|
3 months ago |
Piotr Wilkin
|
5306640300
All's well that ends in a well
|
3 months ago |
Piotr Wilkin
|
232ec56251
Yes, I finally managed to implement it with ssm_conv :>
|
3 months ago |
Piotr Wilkin
|
aa8d6a21a3
Remove extra files cont.
|
3 months ago |
Piotr Wilkin
|
e9a98f2af9
Remove extra files
|
3 months ago |
Piotr Wilkin
|
22ee5a971b
Add gate_sigmoid to callback
|
3 months ago |
Piotr Wilkin
|
ce87b7d78e
Yup, it's NeoX
|
3 months ago |
Piotr Wilkin
|
df0b5bcf30
Proper order of attention operations
|
3 months ago |
Piotr Wilkin
|
54712b8664
Oh, forgot to commit
|
3 months ago |
Piotr Wilkin
|
17240eafc0
Order stuff around
|
3 months ago |
Piotr Wilkin
|
1579bcb202
What am I missing? :/
|
3 months ago |
Piotr Wilkin
|
0a9244acd0
The optimization worked even too well ;)
|
3 months ago |
Piotr Wilkin
|
8ddaf251ae
Fix some state regressions... still wip
|
3 months ago |
Piotr Wilkin
|
6942c85cf8
Oh, actually set n_tasks as well :P
|
3 months ago |
Piotr Wilkin
|
477c1616ad
Parallelize delta_net
|
3 months ago |
Piotr Wilkin
|
4ef6f337de
Proper multi-sequence convolution calculation, corrected (?) state management
|
3 months ago |
Piotr Wilkin
|
5f5e30007c
Dilution n_seqs -> 1
|
3 months ago |
Piotr Wilkin
|
eb0a15fc9b
n_tokens -> n_seq_tokens
|
3 months ago |
Piotr Wilkin
|
ee52fe36f3
Modify sanity check to handle hybrid models
|
3 months ago |
Piotr Wilkin
|
0dd6110fdc
v1.0
|
3 months ago |
Piotr Wilkin
|
adcbd9428f
Linear layer output convergence
|
3 months ago |