Johannes Gäßler
|
5fa07c2f93
CUDA: optimize FA for GQA + large batches (#12014)
|
11 months ago |
Rohanjames1997
|
335eb04a91
ci : Build on Github-hosted arm64 runners (#12009)
|
11 months ago |
Georgi Gerganov
|
cf756d6e0a
server : disable Nagle's algorithm (#12020)
|
11 months ago |
Gian-Carlo Pascutto
|
d70908421f
cuda: Add Q5_1, Q5_0, Q4_1 and Q4_0 to F32 conversion support. (#12000)
|
11 months ago |
Daniel Bevenius
|
de8b5a3624
llama.swiftui : add "Done" dismiss button to help view (#11998)
|
11 months ago |
Georgi Gerganov
|
51f311e057
llama : skip loading unused tensors (#12004)
|
11 months ago |
Johannes Gäßler
|
586d5fe6eb
doc: update contributing guidelines [no ci] (#11969)
|
11 months ago |
PureJourney
|
ecc8e3aeff
CUDA: correct the lowest Maxwell supported by CUDA 12 (#11984)
|
11 months ago |
Bodhi
|
0b3863ff95
MUSA: support ARM64 and enable dp4a .etc (#11843)
|
11 months ago |
Alex Brooks
|
ee02ad02c5
clip : fix visual encoders with no CLS (#11982)
|
11 months ago |
momonga
|
c392e5094d
server (webui): Fix Premature Submission During IME Conversion (#11971)
|
11 months ago |
Charles Xu
|
c5d91a7400
ggml-cpu: Add CPU backend support for KleidiAI library (#11390)
|
11 months ago |
Prashant Vithule
|
4806498bf1
ggml: aarch64: implement SVE kernels for q3_K_q8_K vector dot (#11917)
|
11 months ago |
Michael Engel
|
0d559580a0
run : add --chat-template-file (#11961)
|
11 months ago |
Johannes Gäßler
|
d04e7163c8
doc: add links to ggml examples [no ci] (#11958)
|
11 months ago |
Daniel Bevenius
|
d07c621393
common : add llama.vim preset for Qwen2.5 Coder (#11945)
|
11 months ago |
Georgi Gerganov
|
abd4d0bc4f
speculative : update default params (#11954)
|
11 months ago |
Daniel Bevenius
|
9626d9351a
llama : fix indentation in llama-grammar [no ci] (#11943)
|
11 months ago |
igardev
|
b58934c183
server : (webui) Enable communication with parent html (if webui is in iframe) (#11940)
|
11 months ago |
Olivier Chafik
|
63e489c025
tool-call: refactor common chat / tool-call api (+ tests / fixes) (#11900)
|
11 months ago |
Xuan-Son Nguyen
|
63ac128563
server : add TEI API format for /rerank endpoint (#11942)
|
11 months ago |
MoonRide303
|
5137da7b8c
scripts: corrected encoding when getting chat template (#11866) (#11907)
|
11 months ago |
xiaobing318
|
09aaf4f1f5
docs : Fix duplicated file extension in test command (#11935)
|
11 months ago |
Johannes Gäßler
|
73e2ed3ce3
CUDA: use async data loading for FlashAttention (#11894)
|
11 months ago |
Eve
|
f7b1116af1
update release requirements (#11897)
|
11 months ago |
Antoine Viallon
|
c4d29baf32
server : fix divide-by-zero in metrics reporting (#11915)
|
11 months ago |
Rémy O
|
2eea03d86a
vulkan: implement several ops relevant for ggml_opt (#11769)
|
11 months ago |
Xuan-Son Nguyen
|
0f2bbe6564
server : bump httplib to 0.19.0 (#11908)
|
11 months ago |
standby24x7
|
fe163d5bf3
common : Fix a typo in help (#11899)
|
11 months ago |
Xuan-Son Nguyen
|
818a340ea8
ci : fix (again) arm64 build fails (#11895)
|
11 months ago |