Commit History

Author SHA1 Message Date
Olivier Chafik f13847cfb5 server: fix regression on streamed non-chat completion w/ stops (#13785) 8 months ago
Georgi Gerganov 79c137f776 examples : allow extracting embeddings from decoder contexts (#13797) 8 months ago
Georgi Gerganov 22229314fc llama : clarify deprecation message (#13794) 8 months ago
Romain Biessy 9012eb9b45 sycl: Add more debug prints (#13640) 8 months ago
Jeff Bolz fef693dc6b vulkan: mark IM2COL as supporting non-contig (#13783) 8 months ago
Bizhao Shi 2d38b6e400 CANN: Add the basic supports of Flash Attention kernel (#13627) 8 months ago
Olivier Chafik e121edc432 `server`: add `--reasoning-budget 0` to disable thinking (incl. qwen3 w/ enable_thinking:false) (#13771) 8 months ago
Xuan-Son Nguyen 2f099b510f webui : bump max upload file size to 500MB (#13779) 8 months ago
Sigbjørn Skjæret aa50ba462f tests : improve UGM tokenizer test coverage (#13773) 8 months ago
Georgi Gerganov de2ef53a4b kv-cache : rework kv_cell (#13706) 8 months ago
Percy Piper c508256db2 rpc : Fix build on OpenBSD (#13541) 8 months ago
Xuan-Son Nguyen 40aaa8a403 mtmd : add support for Qwen2-Audio and SeaLLM-Audio (#13760) 8 months ago
ddpasa a08c1d2845 docs : add Moondream2 pre-quantized link (#13745) 8 months ago
Olivier Chafik d785f9c1fd server: fix/test add_generation_prompt (#13770) 8 months ago
Piotr Jasiukajtis 4032ca4066 llama : add support for Qwen3 MoE tied word embeddings (#13768) 8 months ago
Akarshan Biswas 515fdbf7ed SYCL: revert "sycl: simplify bin_bcast_kernel (#13383)" (#13752) 8 months ago
Olivier Chafik f5cd27b71d `server`: streaming of tool calls and thoughts when `--jinja` is on (#12379) 8 months ago
Diego Devesa a2d02d5793 releases : bundle llvm omp library in windows release (#13763) 8 months ago
Diego Devesa 17fc817b58 releases : enable openmp in windows cpu backend build (#13756) 8 months ago
Diego Devesa 2bd1b30f69 ggml-cpu : set openmp wait time if not set (#13758) 8 months ago
0cc4m 259469c4b5 Move GLM4 f32 attention fix to the correct function (#13750) 8 months ago
Xuan-Son Nguyen 4c32832c59 ggml : add ggml_gelu_erf() CUDA kernel (#13719) 8 months ago
Sigbjørn Skjæret c3a2624339 vocab : fix ugm tokenizer precision (#13743) 8 months ago
Johannes Gäßler ffd0eae60b CUDA: fix race condition in FA vector kernels (#13742) 8 months ago
Diego Devesa b775345d78 ci : enable winget package updates (#13734) 8 months ago
Diego Devesa a70a8a69c2 ci : add winget package updater (#13732) 8 months ago
Georgi Gerganov d13d0f6135 hparams : initialize arrays (#13728) 8 months ago
Xuan-Son Nguyen 8a2afb7520 llama : allow custom list of swa_layers (#13726) 8 months ago
Xuan-Son Nguyen 9ecf3e66a3 server : support audio input (#13714) 8 months ago
Chenguang Li faaaff5f94 CANN: Support MUL_MAT_ID for q8_0 and q4_0 (#13705) 8 months ago