Commit History

Author SHA1 Message Date
  Michael Podvitskiy 3202361c5b ggml, ci : Windows ARM runner and build fixes (#5979) 1 year ago
  Georgi Gerganov ee35600b90 llama : fix F16/F32 downcast + improve names (#5980) 1 year ago
  DAN™ bcebd7dbf6 llama : add support for GritLM (#5959) 1 year ago
  slaren d894f352bf perplexity : support using multiple sequences to allow larger batch sizes (#5946) 1 year ago
  Georgi Gerganov 5b09797321 ggml : remove old quantization functions (#5942) 1 year ago
  compilade c2101a2e90 llama : support Mamba Selective State Space Models (#5328) 1 year ago
  compilade 515f7d0d4f llama : fix quantization of shared token_embd (#5944) 1 year ago
  Don Mahurin e457fb3540 llama : assume tied weights if lm_head/output weights is missing (#5824) 1 year ago
  Neo Zhang Jianyu 89fb735fcf Revert "[SYCL] fix error when set main gpu to non-zero (#5901)" (#5918) 1 year ago
  Georgi Gerganov 2002bc96bf server : refactor (#5882) 1 year ago
  Neo Zhang Jianyu ceca1aef07 [SYCL] fix error when set main gpu to non-zero (#5901) 1 year ago
  0cc4m 61d1c88e15 Vulkan Improvements (#5835) 1 year ago
  Georgi Gerganov 29ae62d2ae llama : fix embeddings (#5796) 1 year ago
  Xuan Son Nguyen 4ffcdce2ff add alias for chat template (#5858) 1 year ago
  Douglas Hanley 475df1d6cf llama : allow for user specified embedding pooling type (#5849) 1 year ago
  compilade de9692a7d2 llama : fix llama_copy_state_data with fragmented KV cache (#5840) 1 year ago
  Michael Podvitskiy 4a6e2d6142 llama : add abort_callback to interrupt computation (#5409) 1 year ago
  Xuan Son Nguyen 6c32d8c7ad llama : refactor internal quantization functions (#5830) 1 year ago
  compilade 802da0091b llama : fix segfault from unknown model arch name (#5820) 1 year ago
  Neo Zhang Jianyu 715641391d Support multiple GPUs (split mode) on SYCL backend (#5806) 1 year ago
  Sourab Mangrulkar c29af7e225 llama : add StarCoder2 support (#5795) 1 year ago
  Pierrick Hymbert 3ab8b3a92e llama : cleanup unused mmq flags (#5772) 1 year ago
  Douglas Hanley 9600d59e01 unicode : switch to multimap based nfd_map (#5799) 1 year ago
  Marcus Dunn d5ab29757e llama : constified `llama_set_state_data`'s `src` (#5774) 1 year ago
  Georgi Gerganov 08c5ee87e4 llama : remove deprecated API (#5770) 1 year ago
  compilade adcb12a9ba llama : fix non-quantization of expert gating tensors (#5754) 1 year ago
  Douglas Hanley 177628bfd8 llama : improve BERT tokenization (#5740) 1 year ago
  Kawrakow 0becb22ac0 IQ4_XS: a 4.25 bpw quantization (#5747) 1 year ago
  Georgi Gerganov 9d533a77d0 llama : fix defrag bugs + add parameter (#5735) 1 year ago
  Kawrakow a33e6a0d2a Adding IQ2_S and IQ2_M to complete coverage of the 2-3 bit quantization range (#5721) 1 year ago