Commit History

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Xuan-Son Nguyen | 3c3635d2f2 | server : speed up tests (#15836) | 4 months ago |
| Xuan-Son Nguyen | 61bdfd5298 | server : implement prompt processing progress report in stream mode (#15827) | 4 months ago |
| Johannes Gäßler | 01806e7771 | ggml-cpu: document use of "free" memory [no ci] (#15834) | 4 months ago |
| Aaron Teo | 186415d595 | ggml-cpu: drop support for nnpa intrinsics (#15821) | 4 months ago |
| Gabe Goodhart | fd621880f3 | aLoRA Support (#15327) | 4 months ago |
| Sigbjørn Skjæret | 4281c7b315 | ci : exempt correct research label (#15825) | 4 months ago |
| Gabe Goodhart | 5fac79cbc7 | Thinking model disabled assistant prefill (#15404) | 4 months ago |
| Eric Curtin | 408ff524b4 | Implement --log-colors with always/never/auto (#15792) | 4 months ago |
| Johannes Gäßler | 5143fa895e | CUDA: fastdiv, launch bounds for mmvq + q8_1 quant (#15802) | 4 months ago |
| Daniel Bevenius | 3a550b5ca4 | tests : add --list-ops and --show-coverage options (#15745) | 4 months ago |
| Erik Scholz | a81283820a | gguf: gguf_writer refactor (#15691) | 4 months ago |
| Georgi Gerganov | c610b6c11b | kv-cache : fix SWA checks + disable cacheless iSWA (#15811) | 4 months ago |
| Daniel Bevenius | 5d6688de08 | model-conversion : add --embeddings flag to modelcard.template [no ci] (#15801) | 4 months ago |
| ExtReMLapin | 4fd1242bef | chat : fixed crash when Hermes 2 <tool_call> had a newline before it (#15639) | 4 months ago |
| Piotr Wilkin (ilintar) | b2426e469e | chat : nemotron thinking & toolcalling support (#15676) | 4 months ago |
| Piotr Wilkin (ilintar) | 9e2b1e83c6 | scripts : add Jinja tester PySide6 simple app (#15756) | 4 months ago |
| Daniel Bevenius | fb15d649ed | llama : add support for EmbeddingGemma 300m (#15798) | 4 months ago |
| Gabe Goodhart | 856ed0947f | metal : Add template specialization for mul_mm_id w/ ne20 == 10 (#15799) | 4 months ago |
| Daniel Bevenius | d1e2adba65 | llama : set n_outputs to 1 to avoid 0 outputs mean-pooling (#15791) | 4 months ago |
| Chenguang Li | c1c354e44c | CANN: Refactor ND to NZ workspace to be per-device (#15763) | 4 months ago |
| Xuan-Son Nguyen | a68d914426 | server: add exceed_context_size_error type (#15780) | 4 months ago |
| Eric Curtin | badb80cadb | Document the new max GPU layers default in help (#15771) | 4 months ago |
| leejet | 0a1b3982cd | ggml: add ops for WAN video model (cuda && cpu) (#15669) | 4 months ago |
| hipudding | 5421f63ab0 | CANN: Fix precision issue on 310I DUO multi-devices (#15784) | 4 months ago |
| rmatif | 820bc98531 | opencl: add hs=40 to FA (#15758) | 4 months ago |
| Chenguang Li | 239b60e898 | CANN: fix acl_rstd allocation size in ggml_cann_rms_norm (#15760) | 4 months ago |
| Ruben Ortlam | dff7551bfd | vulkan: fix mmv subgroup16 selection (#15775) | 4 months ago |
| Jeff Bolz | 0fce7a1248 | vulkan: don't use std::string in load_shaders, to improve compile time (#15724) | 4 months ago |
| Daniel Bevenius | 8227695d7a | vulkan : update ggml_vk_instance_validation_ext_available (#15666) | 4 months ago |
| Shin-myoung-serp | 0014fb4add | ggml vulkan: add hardsigmoid and hardswish operations (#15762) | 4 months ago |