Commit History

Author | SHA1 | Message | Date
ddh0 | f6dcda3900 | server : context checkpointing for hybrid and recurrent models (#16382) | 3 months ago
Johannes Gäßler | e789095502 | llama: print memory breakdown on exit (#15860) | 3 months ago
Georgi Gerganov | b730706a49 | kv-cache : support layer reuse (#15504) | 4 months ago
Georgi Gerganov | d32e03f449 | server : add SWA checkpoints (#15293) | 5 months ago
l3utterfly | 7233358d29 | memory : handle saving/loading null layers in recurrent memory (#14675) | 6 months ago
Georgi Gerganov | 01612b7409 | llama : reuse compute graphs (#14482) | 6 months ago
compilade | 4a5686da22 | llama : support Jamba hybrid Transformer-Mamba models (#7531) | 6 months ago
compilade | bb4f7a9e4e | memory : fix broken batch splits for recurrent cache (#14575) | 6 months ago
Georgi Gerganov | 67d1ef23c6 | batch : add optional for sequential equal split (#14511) | 6 months ago
Georgi Gerganov | c79184d2d1 | batch : add n_used count (#14512) | 6 months ago
Georgi Gerganov | 745f11fed0 | memory : correctly handle failure in apply() (#14438) | 6 months ago
Georgi Gerganov | 43678060c1 | recurrent : call balloc split_reset() in init_batch() (#14414) | 6 months ago
Georgi Gerganov | 692e3cdd0a | memory : rename interface to llama_memory_context_i (#14296) | 7 months ago
Georgi Gerganov | 4c9fdfbe15 | ubatch : new splitting logic (#14217) | 7 months ago
Gabe Goodhart | edc4a29eff | memory : Hybrid recurrent cache (#13979) | 7 months ago