Johannes Gäßler
|
46e3556e01
CUDA: add BF16 support (#11093)
|
1 year ago |
0cc4m
|
b56f079e28
Vulkan: Add device-specific blacklist for coopmat for the AMD proprietary driver (#11074)
|
1 year ago |
fairydreaming
|
9394bbd484
llama : Add support for DeepSeek V3 (#11049)
|
1 year ago |
matt23654
|
f922a9c542
[GGML][RPC] Support for models with non-512-aligned tensors over RPC. (#11047)
|
1 year ago |
DAN™
|
46be942214
llama : add support for the cohere2 model architecture (#10900)
|
1 year ago |
Georgi Gerganov
|
78c6785175
sync : ggml
|
1 year ago |
Georgi Gerganov
|
5e3b08d606
ggml : do not install metal source when embed library (ggml/1054)
|
1 year ago |
Daniel Bevenius
|
db68c93b57
ggml : improve inputs log sched_print_assignments (ggml/1053)
|
1 year ago |
Gilad S.
|
c31fc8b966
fix: Vulkan shader gen binary path (#11037)
|
1 year ago |
Molly Sophia
|
4b0c638b9a
common : disable KV cache shifting automatically for unsupported models (#11053)
|
1 year ago |
Georgi Gerganov
|
e7da954ecc
metal : avoid uint (#11019)
|
1 year ago |
Georgi Gerganov
|
f66f582927
llama : refactor `src/llama.cpp` (#10902)
|
1 year ago |
Pierrick Hymbert
|
2f0ee84b9b
server: bench: minor fixes (#10765)
|
1 year ago |
Xuan Son Nguyen
|
0da5d86026
server : allow using LoRA adapters per-request (#10994)
|
1 year ago |
Benson Wong
|
a45433ba20
readme : add llama-swap to infrastructure section (#11032)
|
1 year ago |
Srihari-mcw
|
0827b2c1da
ggml : fixes for AVXVNNI instruction set with MSVC and Clang (#11027)
|
1 year ago |
Xuan Son Nguyen
|
45095a61bf
server : clean up built-in template detection (#11026)
|
1 year ago |
Xuan Son Nguyen
|
5896c65232
server : add OAI compat for /v1/completions (#10974)
|
1 year ago |
ymcki
|
bc7b1f8632
convert : fix Llama-3_1-Nemotron-51B rope settings (#11008)
|
1 year ago |
Peter
|
6e1531aca5
common, examples, ggml : fix MSYS2 GCC compiler errors and warnings when building with LLAMA_CURL=ON and GGML_OPENCL=ON (#11013)
|
1 year ago |
Jeff Bolz
|
716bd6dec3
vulkan: optimize mul_mat for small values of N (#10991)
|
1 year ago |
ag2s20150909
|
c250ecb315
android : fix llama_batch free (#11014)
|
1 year ago |
Jeff Bolz
|
a813badbbd
vulkan: im2col and matmul optimizations for stable diffusion (#10942)
|
1 year ago |
Jeff Bolz
|
fdd2188912
vulkan: Use push constant offset to handle misaligned descriptors (#10987)
|
1 year ago |
Isaac McFadyen
|
f865ea149d
server: added more docs for response_fields field (#10995)
|
1 year ago |
Alexey Parfenov
|
16cdce7b68
server : fix token duplication when streaming with stop strings (#10997)
|
1 year ago |
Eve
|
d79d8f39b4
vulkan: multi-row k quants (#10846)
|
1 year ago |
Peter
|
d283d02bf2
examples, ggml : fix GCC compiler warnings (#10983)
|
1 year ago |
Reza Kakhki
|
9ba399dfa7
server : add support for "encoding_format": "base64" to the */embeddings endpoints (#10967)
|
1 year ago |
Djip007
|
2cd43f4900
ggml : more perfo with llamafile tinyblas on x86_64 (#10714)
|
1 year ago |