junchao-zhao
|
8d4d2be143
ggml : fix LoongArch compile error with 128-bit SIMD (#11701)
|
11 months ago |
Jeff Bolz
|
2c6c8df56d
vulkan: optimize coopmat2 iq2/iq3 callbacks (#11521)
|
11 months ago |
Rémy O
|
8a7e3bf17a
vulkan: initial support for IQ4_XS quantization (#11501)
|
11 months ago |
Jeff Bolz
|
1b598b3058
vulkan: use smaller combined allocations to avoid fragmentation (#11551)
|
11 months ago |
Charles Duffy
|
902368a06b
metal : avoid breaking build when metal API predates TARGET_OS_VISION (#11690)
|
11 months ago |
Matvey Soloviev
|
c3db0480bb
readme : add link to Autopen under UIs (#11684)
|
11 months ago |
Georgi Gerganov
|
d774ab3acc
metal : adjust support conditions for norm operators (#11671)
|
11 months ago |
Johannes Gäßler
|
fa62da9b2d
CUDA: support for mat. mul. with ne03 != ne13 (#11656)
|
11 months ago |
SAMI
|
1ec208083c
llava: add quantization for the visual projector LLAVA, Qwen2VL (#11644)
|
11 months ago |
Olivier Chafik
|
9f4cc8f8d3
`sync`: minja (#11641)
|
11 months ago |
Johannes Gäßler
|
fd08255d0d
CUDA: non-contiguous (RMS) norm support (#11659)
|
11 months ago |
fxzjshm
|
3ec9fd4b77
HIP: force max threads per block to be 1024 (#11621)
|
11 months ago |
Xuan-Son Nguyen
|
3962fc1a79
server : add try..catch to places not covered by set_exception_handler (#11620)
|
11 months ago |
Radoslav Gerganov
|
1bef571f6a
arg : list RPC devices first when using --list-devices (#11655)
|
11 months ago |
Olivier Chafik
|
db288b60cb
`tool-call`: command r7b fix for normal responses (#11608)
|
11 months ago |
Shelby Jenkins
|
106045e7bb
readme : add llm_client Rust crate to readme bindings (#11628)
|
11 months ago |
Jhen-Jie Hong
|
f117d84b48
swift : fix llama-vocab api usage (#11645)
|
11 months ago |
Jhen-Jie Hong
|
534c46b53c
metal : use residency set for other platforms (#11648)
|
11 months ago |
Georgi Gerganov
|
387a1598ca
authors : update
|
11 months ago |
Georgi Gerganov
|
7c9e0ca520
sync : ggml
|
11 months ago |
Christian Kastner
|
8f8290ada9
cmake: Add ability to pass in GGML_BUILD_NUMBER (ggml/1096)
|
11 months ago |
Georgi Gerganov
|
b34aedd558
ci : do not stale-close roadmap issues
|
11 months ago |
Olivier Chafik
|
cde3833239
`tool-call`: allow `--chat-template chatml` w/ `--jinja`, default to chatml upon parsing issue, avoid double bos (#11616)
|
11 months ago |
Xuan-Son Nguyen
|
b3451785ac
server : (webui) revert hacky solution from #11626 (#11634)
|
11 months ago |
Woof Dog
|
1d1e6a90bc
server : (webui) allow typing and submitting during llm response (#11626)
|
11 months ago |
Daniel Bevenius
|
5598f475be
server : remove CPPHTTPLIB_NO_EXCEPTIONS define (#11622)
|
11 months ago |
Georgi Gerganov
|
8ec05832fa
sync : ggml
|
11 months ago |
Johannes Gäßler
|
21c84b5d2d
CUDA: fix Volta FlashAttention logic (#11615)
|
11 months ago |
mashdragon
|
d92cb67e37
server : (webui) Fix Shift+Enter handling (#11609)
|
11 months ago |
Johannes Gäßler
|
6eecde3cc8
HIP: fix flash_attn_stream_k_fixup warning (#11604)
|
11 months ago |