Author | Commit | Message | Date
Alex Brooks | 4d1051a40f | Add Doc for Converting Granite Vision -> GGUF (#12006) | 10 months ago
Vitali Lovich | 3e9a2860e9 | llama : expose llama_model_n_head_kv in the API (#11997) | 10 months ago
Gian-Carlo Pascutto | 58d07a8043 | metal : copy kernels for quant to F32/F16 conversions (#12017) | 10 months ago
lhez | 34a846b584 | opencl: fix for small models (#11950) | 10 months ago
Alex Brooks | 7a2c913e66 | llava : Add Granite Vision Support (#11794) | 10 months ago
Neo Zhang Jianyu | 08d5986290 | [SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035) | 10 months ago
Aleksei Nikiforov | 651adf4b66 | gguf_convert_endian.py: implement byteswapping for q4_k and q6_k (#11349) | 10 months ago
Akarshan Biswas | 8303e8b0fb | SYCL: Fix GGML_SYCL_DEBUG macro (#11995) | 10 months ago
Florent BENOIT | 7ad0779f5d | run: allow to customize prompt by env var LLAMA_PROMPT_PREFIX (#12041) | 10 months ago
Eric Curtin | f777a73e18 | Some llama-run cleanups (#11973) | 10 months ago
Aaron Teo | af7747c95a | ggml-cpu: Support s390x SIMD Instruction Set (#12019) | 10 months ago
Johannes Gäßler | a28e0d5eb1 | CUDA: app option to compile without FlashAttention (#12025) | 10 months ago
Ting Lou | 36c258ee92 | llava: build clip image from pixels (#11999) | 10 months ago
Georgi Gerganov | f3e64859ed | ci : fix arm upload artifacts (#12024) | 10 months ago
Johannes Gäßler | 5fa07c2f93 | CUDA: optimize FA for GQA + large batches (#12014) | 10 months ago
Rohanjames1997 | 335eb04a91 | ci : Build on Github-hosted arm64 runners (#12009) | 10 months ago
Georgi Gerganov | cf756d6e0a | server : disable Nagle's algorithm (#12020) | 10 months ago
Gian-Carlo Pascutto | d70908421f | cuda: Add Q5_1, Q5_0, Q4_1 and Q4_0 to F32 conversion support. (#12000) | 10 months ago
Daniel Bevenius | de8b5a3624 | llama.swiftui : add "Done" dismiss button to help view (#11998) | 11 months ago
Georgi Gerganov | 51f311e057 | llama : skip loading unused tensors (#12004) | 11 months ago
Johannes Gäßler | 586d5fe6eb | doc: update contributing guidelines [no ci] (#11969) | 11 months ago
PureJourney | ecc8e3aeff | CUDA: correct the lowest Maxwell supported by CUDA 12 (#11984) | 11 months ago
Bodhi | 0b3863ff95 | MUSA: support ARM64 and enable dp4a .etc (#11843) | 11 months ago
Alex Brooks | ee02ad02c5 | clip : fix visual encoders with no CLS (#11982) | 11 months ago
momonga | c392e5094d | server (webui): Fix Premature Submission During IME Conversion (#11971) | 11 months ago
Charles Xu | c5d91a7400 | ggml-cpu: Add CPU backend support for KleidiAI library (#11390) | 11 months ago
Prashant Vithule | 4806498bf1 | ggml: aarch64: implement SVE kernels for q3_K_q8_K vector dot (#11917) | 11 months ago
Michael Engel | 0d559580a0 | run : add --chat-template-file (#11961) | 11 months ago
Johannes Gäßler | d04e7163c8 | doc: add links to ggml examples [no ci] (#11958) | 11 months ago
Daniel Bevenius | d07c621393 | common : add llama.vim preset for Qwen2.5 Coder (#11945) | 11 months ago