Johannes Gäßler
|
73e2ed3ce3
CUDA: use async data loading for FlashAttention (#11894)
|
11 months ago |
Eve
|
f7b1116af1
update release requirements (#11897)
|
11 months ago |
Antoine Viallon
|
c4d29baf32
server : fix divide-by-zero in metrics reporting (#11915)
|
11 months ago |
Rémy O
|
2eea03d86a
vulkan: implement several ops relevant for ggml_opt (#11769)
|
11 months ago |
Xuan-Son Nguyen
|
0f2bbe6564
server : bump httplib to 0.19.0 (#11908)
|
11 months ago |
standby24x7
|
fe163d5bf3
common : Fix a typo in help (#11899)
|
11 months ago |
Xuan-Son Nguyen
|
818a340ea8
ci : fix (again) arm64 build fails (#11895)
|
11 months ago |
Jeff Bolz
|
bf42a23d0a
vulkan: support multi/vision rope, and noncontiguous rope (#11902)
|
11 months ago |
Hale Chan
|
c2ea16f260
metal : fix the crash caused by the lack of residency set support on Intel Macs. (#11904)
|
11 months ago |
Johannes Gäßler
|
6dde178248
scripts: fix compare-llama-bench commit hash logic (#11891)
|
11 months ago |
708-145
|
fc10c38ded
examples: fix typo in imatrix/README.md (#11884)
|
11 months ago |
Adrian Kretz
|
22885105a6
metal : optimize dequant q6_K kernel (#11892)
|
11 months ago |
Georgi Gerganov
|
c2cd24fbfd
readme : add notice about new package registry (#11890)
|
11 months ago |
Georgi Gerganov
|
68ff663a04
repo : update links to new url (#11886)
|
11 months ago |
Olivier Chafik
|
f355229692
server: fix type promotion typo causing crashes w/ --jinja w/o tools (#11880)
|
11 months ago |
Rémy O
|
fc1b0d0936
vulkan: initial support for IQ1_S and IQ1_M quantizations (#11528)
|
11 months ago |
Michał Moskal
|
89daa2564f
llguidance build fixes for Windows (#11664)
|
11 months ago |
lhez
|
300907b211
opencl: Fix rope and softmax (#11833)
|
11 months ago |
Diego Devesa
|
94b87f87b5
cuda : add ampere to the list of default architectures (#11870)
|
11 months ago |
Georgi Gerganov
|
dbc2ec59b5
docker : drop to CUDA 12.4 (#11869)
|
11 months ago |
Daniel Bevenius
|
3d68f034da
llama : add completion for --chat-template-file (#11860)
|
11 months ago |
Jinyang He
|
38e32eb6a0
ggml: optimize some vec dot functions for LoongArch ASX (#11842)
|
11 months ago |
Eve
|
a4f011e8d0
vulkan: linux builds + small subgroup size fixes (#11767)
|
11 months ago |
theraininsky
|
a7b8ce2260
llama-bench : fix unexpected global variable initialize sequence issue (#11832)
|
11 months ago |
Georgi Gerganov
|
04045bb842
readme : minor
|
11 months ago |
Jeffrey Morgan
|
8a8c4ceb60
llamafile: use member variable instead of constant for iq4nlt (#11780)
|
11 months ago |
Reza Rahemtola
|
c1f958c038
server : (docs) Update wrong tool calling example (#11809)
|
11 months ago |
Daniel Bevenius
|
c48f630d1c
llama : add --completion-bash option (#11846)
|
11 months ago |
R0CKSTAR
|
bd6e55bfd3
musa: bump MUSA SDK version to rc3.1.1 (#11822)
|
11 months ago |
Olivier Chafik
|
c7f460ab88
`server`: fix tool-call of DeepSeek R1 Qwen, return reasoning_content (Command 7RB & DeepSeek R1) unless `--reasoning-format none` (#11607)
|
11 months ago |