Commit history

Author SHA1 Message Date
  Diego Devesa 22cdab343b llama-bench : accept ranges for integer parameters (#13410) 8 months ago
  Dan Johansson a71a4075cd ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel (#13053) 8 months ago
  Johannes Gäßler 95e18884fc CUDA: fix misaligned synchronization in FA (#13469) 8 months ago
  Xuan-Son Nguyen df8491922f ggml : add mrope kernel for metal (#13457) 8 months ago
  Atharva Dubey 14492144c2 enable dpcpp nightly builds with libraries (#13406) 8 months ago
  City c104023994 mtmd : Use RMS norm for InternVL 3 38B and 78B mmproj (#13459) 8 months ago
  Anthony Umfer 9a390c4829 tools : fix uninitialized llama_batch in server (#13436) 8 months ago
  Sigbjørn Skjæret 09232370fc scripts : exit compare-llama-bench.py gracefully when there's nothing to compare (#13451) 8 months ago
  Johannes Gäßler 7474e00b34 CUDA: fix crash with partial offloading of MoE (#13439) 8 months ago
  David Huang 7f323a589f Add `--no-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B (#13386) 8 months ago
  City 3eac209319 mtmd : support InternVL 3 38B and 78B mmproj (#13443) 8 months ago
  Xuan-Son Nguyen a634d75d1b mtmd : move helpers to dedicated file (#13442) 8 months ago
  Thomas Germer 62d4250e52 docs : Fix typo in InternVL3 model name (#13440) 8 months ago
  Johannes Gäßler 0208355f42 CUDA: fix race conditions FlashAttention kernels (#13438) 8 months ago
  Sigbjørn Skjæret d2a4ef05c6 vocab : add ByteDance-Seed/Seed-Coder (#13423) 8 months ago
  Xuan-Son Nguyen 15e6125a39 mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl (#13434) 8 months ago
  Xuan-Son Nguyen 3b24d26c22 server : update docs (#13432) 8 months ago
  Sigbjørn Skjæret 43dfd741a5 llguidance : set tokenizer slices to default (#13424) 8 months ago
  Thammachart Chinvarapon b064a51a4e ci: free_disk_space flag enabled for intel variant (#13426) 8 months ago
  Xuan-Son Nguyen 053367d149 mtmd : support InternVL 2.5 and 3 (#13422) 8 months ago
  Johannes Gäßler d8919424f1 CUDA: fix FlashAttention on Turing (#13415) 8 months ago
  Xuan-Son Nguyen 7fef11766c arg : add env var to control mmproj (#13416) 8 months ago
  Jeff Bolz dc1d2adfc0 vulkan: scalar flash attention implementation (#13324) 8 months ago
  Helton Reis 7c28a74e07 chore(llguidance): use tagged version that does not break the build (#13413) 8 months ago
  Xuan-Son Nguyen 33eff40240 server : vision support via libmtmd (#12898) 8 months ago
  Alberto Cabrera Pérez 17512a94d6 sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (#12858) 8 months ago
  Georgi Gerganov 611aa914ef metal : optimize MoE for large batches (#13388) 8 months ago
  Johannes Gäßler 0cf6725e9f CUDA: FA support for Deepseek (Ampere or newer) (#13306) 8 months ago
  Diego Devesa 27ebfcacba llama : do not crash if there is no CPU backend (#13395) 8 months ago
  Johannes Gäßler 5c86c9ed3e CUDA: fix crash on large batch size for MoE models (#13384) 8 months ago