Commit History

Author SHA1 Message Date
  吴小白 5787b5da57 ci: add LoongArch cross-compile build (#13944) 7 months ago
  Akarshan Biswas 228f34c9ce SYCL: Implement few same quantized type copy kernels (#13739) 7 months ago
  Sigbjørn Skjæret 0974ad7a7c llama : fix llama_model_chat_template with template name (LLM_KV with suffix) (#14050) 7 months ago
  Georgi Gerganov 745aa5319b llama : deprecate llama_kv_self_ API (#14030) 7 months ago
  Georgi Gerganov 487a5e0401 context : fix SWA-related warning for multiple sequences (#14045) 7 months ago
  Sigbjørn Skjæret d17a809ef0 llama : support multiple classifier outputs and labels (#13940) 7 months ago
  Sigbjørn Skjæret 1caae7fc6c gguf-py : add add_classifier_output_labels method to writer (#14031) 7 months ago
  Masato Nakasaka 669c13e0f6 vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs (#14001) 7 months ago
  pockers21 146b88e8b3 ci: fix CUDA build failure on autodl cloud machines (#14005) 7 months ago
  Georgi Gerganov 7f37b6cf1e memory : migrate from llama_kv_cache to more generic llama_memory (#14006) 7 months ago
  Diego Devesa 3a077146a4 llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WIN_VER to llama.cpp sources (#14013) 7 months ago
  Olexandr88 d01d112abb readme : add badge (#13938) 7 months ago
  Sigbjørn Skjæret 9f47fa5792 vocab : warn about missing mask token (#14022) 7 months ago
  Georgi Gerganov 9e31bec4fd context : fix pos_min initialization upon error decode (#14008) 7 months ago
  Jeff Bolz 5a8ae3053c vulkan: automatically deduce size of push constants (#13936) 7 months ago
  Ervin Áron Tasnádi 0d3984424f ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (#13813) 7 months ago
  Georgi Gerganov 3e63a58ef7 kv-cache : refactor the update/defrag mechanism (#13988) 7 months ago
  Diego Devesa 2589ad3704 ci : remove cuda 11.7 releases, switch runner to windows 2022 (#13997) 7 months ago
  Diego Devesa 482548716f releases : use dl backend for linux release, remove arm64 linux release (#13996) 7 months ago
  Xuan-Son Nguyen 3ac67535c8 llama-graph : use ggml_repeat_4d (#13998) 7 months ago
  Johannes Gäßler 0b4be4c435 CUDA: fix FTZ in FA for Gemma 3 (#13991) 7 months ago
  Georgi Gerganov e0e806f52e kv-cache : fix unified::seq_rm to work with seq_id < 0 (#13985) 7 months ago
  Jeff Bolz 7e00e60ef8 vulkan: fix warnings in perf logger querypool code (#13937) 7 months ago
  Xuan-Son Nguyen ea1431b0fa docs : add "Quick start" section for new users (#13862) 7 months ago
  lhez 71e74a3ac9 opencl: add `backend_synchronize` (#13939) 7 months ago
  rmatif bfb1e012a0 OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat (#13840) 7 months ago
  Georgi Gerganov 3637576288 server : disable speculative decoding for SWA models (#13970) 7 months ago
  Georgi Gerganov ea394d7ab1 metal : use F32 accumulators in FA kernels (#13975) 7 months ago
  Georgi Gerganov 5582c49c39 gemma : more consistent attention scaling for v2 and v3 (#13951) 7 months ago
  Olivier Chafik c9bbc77931 `server`: update deepseek reasoning format (pass reasoning_content as diffs) (#13933) 7 months ago