Kawrakow
|
89503dcb5f
iq3_xxs: guards for the no-imatrix situation (#5334)
|
1 year ago |
Jared Van Bortel
|
1ec3332ade
YaRN : store rope scaling type as int32_t in memory (#5285)
|
1 year ago |
Ian Bull
|
e1e721094d
llama : fix memory leak in llama_batch_free (#5252)
|
1 year ago |
Guoteng
|
ce32060198
llama : support InternLM2 (#5184)
|
1 year ago |
Georgi Gerganov
|
d3bac7d584
llama : reorder build_orion() at correct place (#5118)
|
1 year ago |
Georgi Gerganov
|
5cb04dbc16
llama : remove LLAMA_MAX_DEVICES and LLAMA_SUPPORTS_GPU_OFFLOAD (#5240)
|
1 year ago |
Yiming Cui
|
d62520eb2c
Fix typos of IQ2_XXS and IQ3_XXS in llama.cpp (#5231)
|
1 year ago |
Jared Van Bortel
|
e8dc55d006
kompute : llama-bench support and ggml_cpu_has_kompute() (#5226)
|
1 year ago |
Kawrakow
|
f4d7e54974
SOTA 3-bit quants (#5196)
|
1 year ago |
Jared Van Bortel
|
6daa69ee81
kompute : fix fallback to CPU (#5201)
|
2 years ago |
Jared Van Bortel
|
fbf1ddec69
Nomic Vulkan backend (#4456)
|
2 years ago |
divinity76
|
2aed77eb06
fix typo "RLIMIT_MLOCK" (#5175)
|
2 years ago |
0cc4m
|
2307523d32
ggml : add Vulkan backend (#2059)
|
2 years ago |
Abhilash Majumder
|
0f648573dd
ggml : add unified SYCL backend for Intel GPUs (#2690)
|
2 years ago |
Johannes Gäßler
|
9241c3a2ac
Apply min_p to unsorted tokens (#5115)
|
2 years ago |
Johannes Gäßler
|
b2b2bf988c
Tests for min_p, sampling queue (#5147)
|
2 years ago |
sharpHL
|
f2e69d28c0
llama : add support for Orion-14B (#5118)
|
2 years ago |
Kawrakow
|
1182cf4d4f
Another bucket sort (#5109)
|
2 years ago |
l3utterfly
|
5eaf9964fc
llama : dynamic temperature sampling (#4972)
|
2 years ago |
Kawrakow
|
faa3526a1e
Fix Q3_K_XS for MoE models (#5113)
|
2 years ago |
slaren
|
1387ea2117
llama : pre-allocate input tensors in a separate buffer (#5100)
|
2 years ago |
Georgi Gerganov
|
89758723c7
minor : clean-up some warnings and style (#5094)
|
2 years ago |
slaren
|
011e8ec577
llama : fix not enough space in buffer with Qwen (#5086)
|
2 years ago |
compilade
|
d6bd4d46dd
llama : support StableLM 2 1.6B (#5052)
|
2 years ago |
Kawrakow
|
66d575c45c
llama : add Q3_K_XS (#5060)
|
2 years ago |
Shijie
|
3466c6ebcf
llama : add more qwen2 models (#5071)
|
2 years ago |
slaren
|
6df465a91d
llama : run all KQV ops on the CPU with no KV offload (#5049)
|
2 years ago |
Shijie
|
9b75cb2b3c
llama : support upcoming Qwen2 (#5037)
|
2 years ago |
chiranko
|
2b3b999cac
llama : add CodeShell support (#5016)
|
2 years ago |
John
|
57e2a7a52a
llama : fix falcon arch for tied output embeddings (#4978)
|
2 years ago |