Commit History

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Chenguang Li | 1e7489745a | CANN: refactor mask handling and improve performance in FA (#15561) | 5 months ago |
| xctan | 1cf123a343 | ggml-cpu : add basic RVV support for vector f32 ops (#15057) | 5 months ago |
| Daniel Bevenius | fcca2182a1 | common : add -m to bash completion for --model [no ci] (#15591) | 5 months ago |
| rmatif | 86076f92de | OpenCL: add fused group_norm/norm, mul, add (#15314) | 5 months ago |
| Diego Devesa | bcbddcd54f | tests : fix test-opt with GGML_BACKEND_DL (#15599) | 5 months ago |
| Akarshan Biswas | 8b69686136 | SYCL: fix rms_norm_mul_add for tensor dim not a multiple of sg_size (#15592) | 5 months ago |
| fidoriel | 8ce3ff1d91 | mtmd : fix mtmd ios build (#15579) | 5 months ago |
| Eve | 44b1efa41a | tests: add performance test for mul mat id (#15543) | 5 months ago |
| shalinib-ibm | a6a58d6478 | llamafile: PowerPC Sgemm Optimization (#15558) | 5 months ago |
| Georgi Gerganov | 0373486dbc | graph : fix assert in memory-less build_attn (#15590) | 5 months ago |
| Daniel Bevenius | 62cef26ac5 | model-conversion : add qat-q4 quantization targets (#15588) | 5 months ago |
| Johannes Gäßler | 8f5afa94c4 | CUDA: return -1 for nonexistent compiled arch (#15587) | 5 months ago |
| Georgi Gerganov | b3964c1e89 | metal : optimize FA vec for large sequences and BS <= 8 (#15566) | 5 months ago |
| Xuan-Son Nguyen | 79a546220c | mtmd : support Kimi VL model (#15458) | 5 months ago |
| Georgi Gerganov | 85cc1ae998 | context : print graph stats for memory-less contexts (#15586) | 5 months ago |
| Georgi Gerganov | 1d8d83deaa | metal : improve `MUL_MAT_ID` (#15541) | 5 months ago |
| tc-mb | c4e9239064 | model : support MiniCPM-V 4.5 (#15575) | 5 months ago |
| Sigbjørn Skjæret | 39842a7f73 | gguf-py : remove erroneous FFN_GATE entry (#15583) | 5 months ago |
| Sigbjørn Skjæret | 0fd90db585 | metal : remove contiguous assertion for src0 in IM2COL (#15577) | 5 months ago |
| Yoshi_likes_e4 | 4c37636b3e | Add a warning for special devices (#15563) | 5 months ago |
| Jeff Bolz | 34bdbbd7c2 | vulkan: Remove splitting for mul_mat_id (#15568) | 5 months ago |
| Qeeweew | 74f52f77f2 | CUDA: Accelerate MXFP4 table lookup using `__byte_perm` (#15451) | 5 months ago |
| lhez | f7207b0415 | opencl: fix support ops condition for `rms_norm` (#15560) | 5 months ago |
| Ruben Ortlam | 4d917cd4f6 | vulkan: fix min subgroup 16 condition for mmid subgroup optimization (#15565) | 5 months ago |
| Jeff Bolz | 886b97a5d6 | tests: Generate unique input values for count_equal (#15487) | 5 months ago |
| Ihar Hrachyshka | 111f8d06f0 | metal: fix regression when no metal devices are present (#15531) | 5 months ago |
| Johannes Gäßler | 5eff6ec9b1 | CUDA: MoE helper in device code, better tile sizes (#15525) | 5 months ago |
| Daniel Bevenius | dfd9b5f6c7 | model-conversion : set pooling type to none in logits.cpp (#15564) | 5 months ago |
| Daniel Bevenius | 5a6bc6b1a6 | model-conversion : add model card template for embeddings [no ci] (#15557) | 5 months ago |
| Georgi Gerganov | 6b64f74b55 | batched-bench : fix unified KV cache handling + pp timing (#15562) | 5 months ago |