692e3cdd0a  memory : rename interface to llama_memory_context_i (#14296)  (Georgi Gerganov, 7 months ago)
812939a9e9  model : more uniform output id handling (#14275)  (Georgi Gerganov, 7 months ago)
4c9fdfbe15  ubatch : new splitting logic (#14217)  (Georgi Gerganov, 7 months ago)
edc4a29eff  memory : Hybrid recurrent cache (#13979)  (Gabe Goodhart, 7 months ago)
60c666347b  batch : rework llama_batch_allocr (#14153)  (Georgi Gerganov, 7 months ago)
d714dadb57  pooling : make cls_b and cls_out_b optional (#14165)  (Đinh Trọng Huy, 7 months ago)
dad5c44398  kv-cache : avoid modifying recurrent cells when setting inputs (#13834)  (compilade, 7 months ago)
3678b838bb  llama : support GEGLU for jina-bert-v2 (#14090)  (Sigbjørn Skjæret, 7 months ago)
201b31dc2e  graph : fix geglu (#14077)  (Georgi Gerganov, 7 months ago)
91a8ee6a6f  add geglu activation function (#14074)  (Đinh Trọng Huy, 7 months ago)
3ac67535c8  llama-graph : use ggml_repeat_4d (#13998)  (Xuan-Son Nguyen, 7 months ago)
0fc16b42e8  kv-cache : split implementation in separate sources (#13920)  (Georgi Gerganov, 7 months ago)
12d0188c0d  kv-cache : refactor + add llama_memory_state_i (#13746)  (Georgi Gerganov, 7 months ago)
763d06edb7  llama : fix KV shift for qwen2vl (#13870)  (Xuan-Son Nguyen, 7 months ago)
e0e3aa231d  llama : add support for BertForSequenceClassification reranker (#13858)  (Đinh Trọng Huy, 7 months ago)
259469c4b5  Move GLM4 f32 attention fix to the correct function (#13750)  (0cc4m, 7 months ago)
b44890df2e  model : disable SWA for Phi models (#13676)  (Georgi Gerganov, 8 months ago)
c9c64dee57  Set GLM4 blk.*.attn_output.weight, kqv_out-* matmul to GGML_PREC_F32 to fix infinity values in output (#13639)  (0cc4m, 8 months ago)
e298d2fbd0  kv-cache : add SWA support (#13194)  (Georgi Gerganov, 8 months ago)
10d2af0eaa  llama/ggml: add LLM training support (#10544)  (Johannes Gäßler, 8 months ago)
0cf6725e9f  CUDA: FA support for Deepseek (Ampere or newer) (#13306)  (Johannes Gäßler, 8 months ago)
2f54e348ad  llama : fix build_ffn without gate (#13336)  (Xuan-Son Nguyen, 8 months ago)
c642bc014c  kv-cache : separate recurrent vs non-recurrent impl (#12799)  (Georgi Gerganov, 8 months ago)
b6ce7430b7  llama-graph : fix text position for mrope (#13159)  (Xuan-Son Nguyen, 8 months ago)
5f5e39e1ba  model : Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture (#12466)  (AT, 8 months ago)
d2b2031e5f  llama : (mrope) allow using normal 1D position for text token (#13138)  (Xuan-Son Nguyen, 8 months ago)
558a764713  Force FP32 compute in GLM4 FFN Down (#13101)  (City, 8 months ago)
2f74c354c0  graph : make FA compatible with MLA + add initial Metal kernels (#12953)  (Georgi Gerganov, 9 months ago)
daa422881a  llama : DeepSeek V2/V3 MLA implementation (#12801)  (Juk Armstrong, 9 months ago)
a19b5cef16  llama : fix FA when KV cache is not used (i.e. embeddings) (#12825)  (Georgi Gerganov, 9 months ago)