cturan/llama.cpp

Author	SHA1 Message	Date
Georgi Gerganov	d774ab3acc metal : adjust support conditions for norm operators (#11671)	11 months ago
Johannes Gäßler	fa62da9b2d CUDA: support for mat. mul. with ne03 != ne13 (#11656)	11 months ago
SAMI	1ec208083c llava: add quantization for the visual projector LLAVA, Qwen2VL (#11644)	11 months ago
Olivier Chafik	9f4cc8f8d3 `sync`: minja (#11641)	11 months ago
Johannes Gäßler	fd08255d0d CUDA: non-contiguous (RMS) norm support (#11659)	11 months ago
fxzjshm	3ec9fd4b77 HIP: force max threads per block to be 1024 (#11621)	11 months ago
Xuan-Son Nguyen	3962fc1a79 server : add try..catch to places not covered by set_exception_handler (#11620)	11 months ago
Radoslav Gerganov	1bef571f6a arg : list RPC devices first when using --list-devices (#11655)	11 months ago
Olivier Chafik	db288b60cb `tool-call`: command r7b fix for normal responses (#11608)	11 months ago
Shelby Jenkins	106045e7bb readme : add llm_client Rust crate to readme bindings (#11628)	11 months ago
Jhen-Jie Hong	f117d84b48 swift : fix llama-vocab api usage (#11645)	11 months ago
Jhen-Jie Hong	534c46b53c metal : use residency set for other platforms (#11648)	11 months ago
Georgi Gerganov	387a1598ca authors : update	11 months ago
Georgi Gerganov	7c9e0ca520 sync : ggml	11 months ago
Christian Kastner	8f8290ada9 cmake: Add ability to pass in GGML_BUILD_NUMBER (ggml/1096)	11 months ago
Georgi Gerganov	b34aedd558 ci : do not stale-close roadmap issues	11 months ago
Olivier Chafik	cde3833239 `tool-call`: allow `--chat-template chatml` w/ `--jinja`, default to chatml upon parsing issue, avoid double bos (#11616)	11 months ago
Xuan-Son Nguyen	b3451785ac server : (webui) revert hacky solution from #11626 (#11634)	11 months ago
Woof Dog	1d1e6a90bc server : (webui) allow typing and submitting during llm response (#11626)	11 months ago
Daniel Bevenius	5598f475be server : remove CPPHTTPLIB_NO_EXCEPTIONS define (#11622)	11 months ago
Georgi Gerganov	8ec05832fa sync : ggml	11 months ago
Johannes Gäßler	21c84b5d2d CUDA: fix Volta FlashAttention logic (#11615)	11 months ago
mashdragon	d92cb67e37 server : (webui) Fix Shift+Enter handling (#11609)	11 months ago
Johannes Gäßler	6eecde3cc8 HIP: fix flash_attn_stream_k_fixup warning (#11604)	11 months ago
uvos	396856b400 CUDA/HIP: add support for selectable warp size to mmv (#11519)	11 months ago
uvos	4d0598e144 HIP: add GGML_CUDA_CC_IS_* for amd familys as increasing cc archtectures for amd gpus are not supersets of eatch other (#11601)	11 months ago
Olivier Chafik	90f9b88afb nit: more informative crash when grammar sampler fails (#11593)	11 months ago
Johannes Gäßler	864a0b67a6 CUDA: use mma PTX instructions for FlashAttention (#11583)	11 months ago
Eric Curtin	84ec8a58f7 Name colors (#11573)	11 months ago
Olivier Chafik	bfcce4d693 `tool-call`: support Command R7B (+ return tool_plan "thoughts" in API) (#11585)	11 months ago

Commit History