Georgi Gerganov
|
745aa5319b
llama : deprecate llama_kv_self_ API (#14030)
|
7 months ago |
Georgi Gerganov
|
487a5e0401
context : fix SWA-related warning for multiple sequences (#14045)
|
7 months ago |
Sigbjørn Skjæret
|
d17a809ef0
llama : support multiple classifier outputs and labels (#13940)
|
7 months ago |
Sigbjørn Skjæret
|
1caae7fc6c
gguf-py : add add_classifier_output_labels method to writer (#14031)
|
7 months ago |
Masato Nakasaka
|
669c13e0f6
vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs (#14001)
|
7 months ago |
pockers21
|
146b88e8b3
ci: fix CUDA build failure on autodl cloud machines (#14005)
|
7 months ago |
Georgi Gerganov
|
7f37b6cf1e
memory : migrate from llama_kv_cache to more generic llama_memory (#14006)
|
7 months ago |
Diego Devesa
|
3a077146a4
llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WIN_VER to llama.cpp sources (#14013)
|
7 months ago |
Olexandr88
|
d01d112abb
readme : add badge (#13938)
|
7 months ago |
Sigbjørn Skjæret
|
9f47fa5792
vocab : warn about missing mask token (#14022)
|
7 months ago |
Georgi Gerganov
|
9e31bec4fd
context : fix pos_min initialization upon error decode (#14008)
|
7 months ago |
Jeff Bolz
|
5a8ae3053c
vulkan: automatically deduce size of push constants (#13936)
|
7 months ago |
Ervin Áron Tasnádi
|
0d3984424f
ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (#13813)
|
7 months ago |
Georgi Gerganov
|
3e63a58ef7
kv-cache : refactor the update/defrag mechanism (#13988)
|
7 months ago |
Diego Devesa
|
2589ad3704
ci : remove cuda 11.7 releases, switch runner to windows 2022 (#13997)
|
7 months ago |
Diego Devesa
|
482548716f
releases : use dl backend for linux release, remove arm64 linux release (#13996)
|
7 months ago |
Xuan-Son Nguyen
|
3ac67535c8
llama-graph : use ggml_repeat_4d (#13998)
|
7 months ago |
Johannes Gäßler
|
0b4be4c435
CUDA: fix FTZ in FA for Gemma 3 (#13991)
|
7 months ago |
Georgi Gerganov
|
e0e806f52e
kv-cache : fix unified::seq_rm to work with seq_id < 0 (#13985)
|
7 months ago |
Jeff Bolz
|
7e00e60ef8
vulkan: fix warnings in perf logger querypool code (#13937)
|
7 months ago |
Xuan-Son Nguyen
|
ea1431b0fa
docs : add "Quick start" section for new users (#13862)
|
7 months ago |
lhez
|
71e74a3ac9
opencl: add `backend_synchronize` (#13939)
|
7 months ago |
rmatif
|
bfb1e012a0
OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat (#13840)
|
7 months ago |
Georgi Gerganov
|
3637576288
server : disable speculative decoding for SWA models (#13970)
|
7 months ago |
Georgi Gerganov
|
ea394d7ab1
metal : use F32 accumulators in FA kernels (#13975)
|
7 months ago |
Georgi Gerganov
|
5582c49c39
gemma : more consistent attention scaling for v2 and v3 (#13951)
|
7 months ago |
Olivier Chafik
|
c9bbc77931
`server`: update deepseek reasoning format (pass reasoning_content as diffs) (#13933)
|
7 months ago |
Xuan-Son Nguyen
|
bfd322796c
mtmd : fix memory leak in mtmd_helper_eval_chunk_single (#13961)
|
7 months ago |
shalinib-ibm
|
093e3f1feb
cmake : Handle mixed-case 'Power' strings in POWER CPU detection (#13966)
|
7 months ago |
Atharva Dubey
|
663445b0de
sycl: quantize and reorder the input to q8_1 when reorder is enabled (#13826)
|
7 months ago |