Kawrakow | 4a3156de2f | CUDA: faster dequantize kernels for Q4_0 and Q4_1 (#4938) | 2 years ago
David Pflug | a836c8f534 | llama : fix missing quotes (#4937) | 2 years ago
Kawrakow | 467a882fd2 | Add ability to use importance matrix for all k-quants (#4930) | 2 years ago
Georgi Gerganov | bb0c139247 | llama : check LLAMA_TRACE env for extra logging (#4929) | 2 years ago
Georgi Gerganov | 9408cfdad6 | scripts : sync-ggml-am.sh option to skip commits | 2 years ago
Georgi Gerganov | 03c5267490 | llama : use LLAMA_LOG_ macros for logging | 2 years ago
Kawrakow | a128c38de8 | Fix ffn_down quantization mix for MoE models (#4927) | 2 years ago
Alex Azarov | 5f5fe1bd60 | metal : correctly set SIMD support flags on iOS (#4923) | 2 years ago
Karthik Kumar Viswanathan | ac32902a87 | llama : support WinXP build with MinGW 8.1.0 (#3419) | 2 years ago
Kawrakow | 147b17ac94 | 2-bit quantizations (#4897) | 2 years ago
Kawrakow | 807179ec58 | Make Q3_K_S be the same as old Q3_K_L for Mixtral-8x7B (#4906) | 2 years ago
Georgi Gerganov | 76484fbfd3 | sync : ggml | 2 years ago
Johannes Gäßler | c71d608ce7 | ggml: cache sin/cos for RoPE (#4908) | 2 years ago
Georgi Gerganov | 4be5ef556d | metal : remove old API (#4919) | 2 years ago
Georgi Gerganov | 0ea069b87b | server : fix prompt caching with system prompt (#4914) | 2 years ago
Georgi Gerganov | f172de03f1 | llama : fix detokenization of non-special added-tokens (#4916) | 2 years ago
Georgi Gerganov | 2d57de5255 | metal : disable log for loaded kernels (#4794) | 2 years ago
David Friehs | df845cc982 | llama : minimize size used for state save/load (#4820) | 2 years ago
Someone | 6b48ed0893 | workflows: unbreak nix-build-aarch64, and split it out (#4915) | 2 years ago
Yann Follet | 722d33f34e | main : add parameter --no-display-prompt (#4541) | 2 years ago
texmex76 | c30b1ef39a | gguf : fix potential infinite for-loop (#4600) | 2 years ago
Georgi Gerganov | b38b5e93ae | metal : refactor kernel loading code (#4794) | 2 years ago
Johannes Gäßler | 7dc78764e2 | compare-llama-bench: tweak output format (#4910) | 2 years ago
Ziad Ben Hadj-Alouane | 356327feb3 | server : fix deadlock that occurs in multi-prompt scenarios (#4905) | 2 years ago
makomk | ee8243adaa | server : fix crash with multimodal models without BOS token (#4904) | 2 years ago
Georgi Gerganov | 15ebe59210 | convert : update phi-2 to latest HF repo (#4903) | 2 years ago
Georgi Gerganov | de473f5f8e | sync : ggml | 2 years ago
Georgi Gerganov | f238461236 | ggml : fix 32-bit ARM compat for IQ2_XS (whisper/1758) | 2 years ago
slaren | fa5c1fb44a | backend_sched : fix assignments | 2 years ago
Maximilian Winter | 52ee4540c0 | examples : add pydantic models to GBNF grammar generator (#4883) | 2 years ago