Commit History

Author SHA1 Message Date
  xiaofei a0f7016d17 rpc : fix cache directory initialization (#13188) 8 months ago
  Johannes Gäßler 19e899ce21 scripts: n_depth for compare-llama-bench [no ci] (#13201) 8 months ago
  matteo e2e1ddb93a server : Prefilling assistant message in openai compatible API (#13174) 8 months ago
  Georgi Gerganov d9d398f84f sampling : when top-k <= 0 -> noop (#13173) 8 months ago
  Alberto Cabrera Pérez 5a63980117 llama-bench: fixed size of fields to correctly map to values (#13183) 8 months ago
  Johannes Gäßler cdf76586b2 CUDA: fix non-cont. inputs for batched mat mul (#13155) 8 months ago
  Sigbjørn Skjæret 7d3af70b08 llama : llm_type order by size (#13177) 8 months ago
  Xuan-Son Nguyen 00e3e5a194 mtmd : add qwen2vl and qwen2.5vl (#13141) 8 months ago
  Sigbjørn Skjæret e98b3692be llama : set qwen3 model type sizes (#13175) 8 months ago
  Xuan-Son Nguyen b6ce7430b7 llama-graph : fix text position for mrope (#13159) 8 months ago
  AT 5f5e39e1ba model : Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture (#12466) 8 months ago
  Xuan-Son Nguyen eaea325324 clip : fix model size display (#13153) 8 months ago
  Ville Vesilehto 43ddab6eee fix(rpc): Improve input validation and error handling (#13069) 8 months ago
  Vishal Agarwal 1831f538f7 llama-bench: add `-d` depth arg (#13096) 8 months ago
  Xuan-Son Nguyen 4e87962e34 mtmd : fix glm-edge redundant token count (#13139) 8 months ago
  pockers21 fb0471d175 context : do not clear output buffer on reserve (#13152) 8 months ago
  Xuan-Son Nguyen d2b2031e5f llama : (mrope) allow using normal 1D position for text token (#13138) 8 months ago
  Xuan-Son Nguyen 5fa9e63be8 clip : refactor set input for cgraph + fix qwen2.5vl input (#13136) 8 months ago
  Akarshan Biswas a4c340f974 SYCL: Add all missing unary kernels (#13074) 8 months ago
  Georgi Gerganov d0a417f3c7 readme : update hot topics (#13150) 8 months ago
  Georgi Gerganov 43f2b07193 common : fix noreturn compile warning (#13151) 8 months ago
  Xuan-Son Nguyen e5d6c2554e llama-chat : fix typo GML --> GLM (#13143) 8 months ago
  R0CKSTAR f0dd6a1926 musa: fix typo in cc control (#13144) 8 months ago
  Johannes Gäßler 69699be48a CUDA: fix q_nope_absorbed prec for DS 2 Lite f16 (#13137) 8 months ago
  Xuan-Son Nguyen 85f36e5e71 arg : fix unused variable (#13142) 8 months ago
  4onen c0a97b762e llama-bench : Add `--override-tensors` arg (#12922) 8 months ago
  matteo ced44be342 llama-chat : fix wrong template in GLM4-0414 (#13140) 8 months ago
  R0CKSTAR e291450b76 musa: fix build warning (#13129) 8 months ago
  LostRuins Concedo 59e991c23c Fixes Qwen2.5VL segfault during inference with https://github.com/ggml-org/llama.cpp/pull/12402 as has_qwen2vl_merger migration was incomplete (#13133) 8 months ago
  HimariO ca2bb89eac clip : Add Qwen2.5VL support (#12402) 8 months ago