Commit History

Author SHA1 Message Date
  Neo Zhang Jianyu 95bc82fbc0 [SYCL] add missed dll file in package (#9577) 1 year ago
  R0CKSTAR 7691654c68 mtgpu: enable VMM (#9597) 1 year ago
  Xuan Son Nguyen ea9c32be71 ci : fix docker build number and tag name (#9638) 1 year ago
  Charles Xu 1e43630218 ggml : remove assert for AArch64 GEMV and GEMM Q4 kernels (#9217) 1 year ago
  Xuan Son Nguyen afbbfaa537 server : add more env vars, improve gen-docs (#9635) 1 year ago
  Gabe Goodhart 3d6bf6919f llama : add IBM Granite MoE architecture (#9438) 1 year ago
  Dou Xinpeng 904837e0cb cann: fix crash when llama-bench is running on multiple cann devices (#9627) 1 year ago
  Eric Zhang 70392f1f81 ggml : add AVX512DQ requirement for AVX512 builds (#9622) 1 year ago
  Georgi Gerganov bb5f819975 sync : ggml 1 year ago
  Georgi Gerganov c038931615 examples : adapt to ggml.h changes (ggml/0) 1 year ago
  Georgi Gerganov 31ac5834fe llama : keep track of all EOG tokens in the vocab (#9609) 1 year ago
  Georgi Gerganov cea1486ecf log : add CONT level for continuing previous log entry (#9610) 1 year ago
  StrangeBytesDev 0aa15011e3 server : add newline after chat example (#9616) 1 year ago
  Georgi Gerganov b0f27361f3 sampling : avoid expensive softmax during greedy sampling (#9605) 1 year ago
  Max Krasnyansky c087b6f11d threads: fix msvc build without openmp (#9615) 1 year ago
  Ivan 116efee0ee cuda: add q8_0->f32 cpy operation (#9571) 1 year ago
  Xuan Son Nguyen 0b3bf966f4 server : add --no-context-shift option (#9607) 1 year ago
  Max Krasnyansky f0c7b5edf8 threads: improve ggml_barrier scaling with large number of threads (#9598) 1 year ago
  Riceball LEE 1d48e98e4f readme : add programmable prompt engine language CLI (#9599) 1 year ago
  Georgi Gerganov f3979df762 flake.lock: Update (#9586) 1 year ago
  Srihari-mcw 1e7b9299c6 ggml : AVX512 gemm for Q4_0_8_8 (#9532) 1 year ago
  Georgi Gerganov 37f8c7b4c9 perplexity : remove extra new lines after chunks (#9596) 1 year ago
  Georgi Gerganov bf9c1013ac metal : use F32 prec for K*Q in vec FA (#9595) 1 year ago
  Akarshan Biswas e62e9789cd Revert "[SYCL] fallback mmvq (#9088)" (#9579) 1 year ago
  R0CKSTAR c35e586ea5 musa: enable building fat binaries, enable unified memory, and disable Flash Attention on QY1 (MTT S80) (#9526) 1 year ago
  Molly Sophia 912c331d3d Fix merge error in #9454 (#9589) 1 year ago
  Johannes Gäßler a5b57b08ce CUDA: enable Gemma FA for HIP/Pascal (#9581) 1 year ago
  Shankar ecd5d6b65b llama: remove redundant loop when constructing ubatch (#9574) 1 year ago
  Molly Sophia 2a63caaa69 RWKV v6: RWKV_WKV op CUDA implementation (#9454) 1 year ago
  slaren d09770cae7 ggml-alloc : fix list of allocated tensors with GGML_ALLOCATOR_DEBUG (#9573) 1 year ago