Ting Lou | a800ae46da | llava : add struct for FFI bindgen (#12079) | 10 months ago
Sigbjørn Skjæret | 69050a11be | Refactor gguf scripts to improve metadata handling (#11909) | 10 months ago
Aleksei Nikiforov | 3567ee3a94 | gguf-py: enable reading non-native endian files (#12081) | 10 months ago
Kante Yin | 53e4db1012 | readme : update infra list (#9096) | 10 months ago
Olivier Chafik | d7cfe1ffe0 | docs: add docs/function-calling.md to lighten server/README.md's plight (#12069) | 10 months ago
Jeff Bolz | a82c9e7c23 | vulkan: fix assertion when qy_needs_dequant (#12068) | 10 months ago
rhjdvsgsgks | 401af80b54 | server: handle echo=false on /v1/completions (#12060) | 10 months ago
Judd | c132239bfb | add OP sigmoid (#12056) | 10 months ago
Molly Sophia | 393fca629e | ggml-cpu: Fix build with sve (#12059) | 10 months ago
Rémy O | 61d4f39dfe | vulkan: implement more backpropagation operators (#11914) | 10 months ago
Olivier Chafik | 0b52745649 | server: support add_generation_prompt query param (#12062) | 10 months ago
Alex Brooks | 4d1051a40f | Add Doc for Converting Granite Vision -> GGUF (#12006) | 10 months ago
Vitali Lovich | 3e9a2860e9 | llama : expose llama_model_n_head_kv in the API (#11997) | 10 months ago
Gian-Carlo Pascutto | 58d07a8043 | metal : copy kernels for quant to F32/F16 conversions (#12017) | 10 months ago
lhez | 34a846b584 | opencl: fix for small models (#11950) | 11 months ago
Alex Brooks | 7a2c913e66 | llava : Add Granite Vision Support (#11794) | 11 months ago
Neo Zhang Jianyu | 08d5986290 | [SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035) | 11 months ago
Aleksei Nikiforov | 651adf4b66 | gguf_convert_endian.py: implement byteswapping for q4_k and q6_k (#11349) | 11 months ago
Akarshan Biswas | 8303e8b0fb | SYCL: Fix GGML_SYCL_DEBUG macro (#11995) | 11 months ago
Florent BENOIT | 7ad0779f5d | run: allow to customize prompt by env var LLAMA_PROMPT_PREFIX (#12041) | 11 months ago
Eric Curtin | f777a73e18 | Some llama-run cleanups (#11973) | 11 months ago
Aaron Teo | af7747c95a | ggml-cpu: Support s390x SIMD Instruction Set (#12019) | 11 months ago
Johannes Gäßler | a28e0d5eb1 | CUDA: app option to compile without FlashAttention (#12025) | 11 months ago
Ting Lou | 36c258ee92 | llava: build clip image from pixels (#11999) | 11 months ago
Georgi Gerganov | f3e64859ed | ci : fix arm upload artifacts (#12024) | 11 months ago
Johannes Gäßler | 5fa07c2f93 | CUDA: optimize FA for GQA + large batches (#12014) | 11 months ago
Rohanjames1997 | 335eb04a91 | ci : Build on Github-hosted arm64 runners (#12009) | 11 months ago
Georgi Gerganov | cf756d6e0a | server : disable Nagle's algorithm (#12020) | 11 months ago
Gian-Carlo Pascutto | d70908421f | cuda: Add Q5_1, Q5_0, Q4_1 and Q4_0 to F32 conversion support. (#12000) | 11 months ago
Daniel Bevenius | de8b5a3624 | llama.swiftui : add "Done" dismiss button to help view (#11998) | 11 months ago