Kawrakow
|
2b3a665d39
llama : use Q4_K for attn_v for Q2_K_S when n_gqa >= 4 (#4996)
|
2 years ago |
Kawrakow
|
334a835a1c
ggml : importance matrix support for legacy quants (#4969)
|
2 years ago |
David Friehs
|
4483396751
llama : apply classifier-free guidance to logits directly (#4951)
|
2 years ago |
Kawrakow
|
2faaef3979
llama : check for 256 divisibility for IQ2_XS, IQ2_XXS (#4950)
|
2 years ago |
David Pflug
|
a836c8f534
llama : fix missing quotes (#4937)
|
2 years ago |
Georgi Gerganov
|
bb0c139247
llama : check LLAMA_TRACE env for extra logging (#4929)
|
2 years ago |
Georgi Gerganov
|
03c5267490
llama : use LLAMA_LOG_ macros for logging
|
2 years ago |
Kawrakow
|
a128c38de8
Fix ffn_down quantization mix for MoE models (#4927)
|
2 years ago |
Karthik Kumar Viswanathan
|
ac32902a87
llama : support WinXP build with MinGW 8.1.0 (#3419)
|
2 years ago |
Kawrakow
|
147b17ac94
2-bit quantizations (#4897)
|
2 years ago |
Kawrakow
|
807179ec58
Make Q3_K_S be the same as old Q3_K_L for Mixtral-8x7B (#4906)
|
2 years ago |
Georgi Gerganov
|
4be5ef556d
metal : remove old API (#4919)
|
2 years ago |
Georgi Gerganov
|
f172de03f1
llama : fix detokenization of non-special added-tokens (#4916)
|
2 years ago |
David Friehs
|
df845cc982
llama : minimize size used for state save/load (#4820)
|
2 years ago |
Georgi Gerganov
|
15ebe59210
convert : update phi-2 to latest HF repo (#4903)
|
2 years ago |
slaren
|
e7e4df031b
llama : ggml-backend integration (#4766)
|
2 years ago |
Georgi Gerganov
|
584d674be6
llama : remove redundant assert for StableLM (#4901)
|
2 years ago |
Georgi Gerganov
|
3cabe80630
llama : fix typo "imp_embd" -> "inp_embd"
|
2 years ago |
Georgi Gerganov
|
f445c0e68c
llama : fix llm_build_k_shift to use correct n_rot (#4889)
|
2 years ago |
Kawrakow
|
469e75d0a3
llama : restore intended k-quants mixes for MoE models (#4872)
|
2 years ago |
Kawrakow
|
49662cbed3
ggml : SOTA 2-bit quants (add IQ2_XS) (#4856)
|
2 years ago |
pudepiedj
|
43f76bf1c3
main : print total token count and tokens consumed so far (#4874)
|
2 years ago |
Brian
|
57d016ba2d
llama : add additional suffixes for model params (#4834)
|
2 years ago |
Austin
|
329ff61569
llama : recognize 1B phi models (#4847)
|
2 years ago |
Kawrakow
|
dd5ae06405
SOTA 2-bit quants (#4773)
|
2 years ago |
Georgi Gerganov
|
b0034d93ce
examples : add passkey test (#3856)
|
2 years ago |
Georgi Gerganov
|
9dede37d81
llama : remove unused vars (#4796)
|
2 years ago |
Georgi Gerganov
|
3c36213df8
llama : remove redundant GQA check (#4796)
|
2 years ago |
Georgi Gerganov
|
d117d4dc5d
llama : print tensor meta for debugging
|
2 years ago |
Georgi Gerganov
|
540938f890
llama : llama_model_desc print number of experts
|
2 years ago |