Piotr Wilkin
|
dbd4d97cf2
Fix cb calls
|
4 months ago |
Piotr Wilkin
|
32dcee47ef
Some attempts to get the convolution input right.
|
4 months ago |
Piotr Wilkin
|
397cd9fd67
Fix whitespace / end-of-line newline issues.
|
4 months ago |
Piotr Wilkin
|
5a8463f4a6
Add missing LFM2 code
|
4 months ago |
Piotr Wilkin
|
64de434118
Fixes from main branch
|
4 months ago |
Piotr Wilkin
|
7bedf4c66c
Refactor llama-model.cpp
|
4 months ago |
Piotr Wilkin
|
9014feadfa
Change RoPE to NeoX
|
4 months ago |
Piotr Wilkin
|
f020baa466
Normal attention: apply gate before output
|
4 months ago |
Piotr Wilkin
|
27fa5f335d
Correct convolution state dimension calculations
|
4 months ago |
Piotr Wilkin
|
e24c9dfa60
Remove OP_DELTA_NET, fix flake8 and editorchecker because why not
|
4 months ago |
Piotr Wilkin
|
6e3abeb6c0
Exclude MTP layers in conversion
|
4 months ago |
Piotr Wilkin
|
43eb7a7757
Now that eval's running move delta net stuff back to llama-model, add cbs
|
4 months ago |
Piotr Wilkin
|
890fa2c1e3
WE HAVE OUTPUT!
|
4 months ago |
Piotr Wilkin
|
e590a75905
Cleanup complete, now for the recurrent memory management...
|
4 months ago |
Piotr Wilkin
|
2b0673c315
Cleanup ggml_delta_net
|
4 months ago |
Piotr Wilkin (ilintar)
|
72c98b0c7d
Merge pull request #1 from ggml-org/xsn/qwen3next_experiment
|
4 months ago |
Xuan Son Nguyen
|
e83ef74733
one less magic number
|
4 months ago |
Xuan Son Nguyen
|
f643b957f4
refactor softplus fn
|
4 months ago |
Xuan Son Nguyen
|
46110e0630
split q_proj/gate
|
4 months ago |
Piotr Wilkin
|
9832f2934a
Remove comments as half of them are wrong anyways
|
4 months ago |
Piotr Wilkin
|
8152df60f3
Getting closer (graph builds for bs=1 but tensor shaping is still wrong for bigger sizes)
|
4 months ago |
Piotr Wilkin
|
e0c5dff2a7
Rewrite to tensor ops
|
4 months ago |
Piotr Wilkin
|
178230ee21
Getting to decode stage...
|
4 months ago |
Piotr Wilkin (ilintar)
|
c78f9fce68
Merge branch 'ggml-org:master' into qwen3_next
|
4 months ago |
Radoslav Gerganov
|
2b6b55a59f
server : include usage statistics only when user request them (#16052)
|
4 months ago |
Georgi Gerganov
|
e58174cecb
llama : bump max seq limit from 64 to 256 (#15916)
|
4 months ago |
Georgi Gerganov
|
b213fce89b
metal : improve F32, F16 and BF16 mat-vec multiplication (#16057)
|
4 months ago |
Jhen-Jie Hong
|
e00f3fd8ff
metal : avoid call free for non-owned buffer (#16067)
|
4 months ago |
Georgi Gerganov
|
f2f28380ea
metal : handle nil cv during pipeline creation (#16065)
|
4 months ago |
Chenguang Li
|
62c3b645c5
CANN: Remove print (#16044)
|
4 months ago |