Maximilian Winter | ec903c0341 | server : add self-extend support (#5104) | 2 years ago
0cc4m | a1d6df129b | Add OpenCL add kernel (#5151) | 2 years ago
Jared Van Bortel | bbe7c56c99 | cmake : pass CPU architecture flags to nvcc (#5146) | 2 years ago
slaren | 62fead3ea0 | cuda : fix tensor size calculation for non-split buffer (#5145) | 2 years ago
slaren | 15b4538ff2 | ggml-alloc : add 10% margin to the buffer sizes (#5149) | 2 years ago
snadampal | 7032f4f634 | ggml : update softmax n_task calculation (#5126) | 2 years ago
Georgi Gerganov | 5f1925a8ce | scripts : move run-with-preset.py from root to scripts folder | 2 years ago
Georgi Gerganov | 3b7c914de2 | tests : gitignore test-c.o | 2 years ago
Xuan Son Nguyen | 48c857aa10 | server : refactored the task processing logic (#5065) | 2 years ago
crasm | 413e7b0559 | ci : add model tests + script wrapper (#4586) | 2 years ago
Paul Tsochantaris | 6dd3c28c9c | metal : remove unused `n_buffers` and `buffers` (#5129) | 2 years ago
Riceball LEE | 38b431de23 | gguf : fix "general.alignment" type in gguf_reader.py (#5136) | 2 years ago
Georgi Gerganov | aad0b01d73 | readme : update hot topics | 2 years ago
Kawrakow | 1182cf4d4f | Another bucket sort (#5109) | 2 years ago
XiaotaoChen | fe54033b69 | readme : add MobileVLM 1.7B/3B to the supported models list (#5107) | 2 years ago
l3utterfly | 5eaf9964fc | llama : dynamic temperature sampling (#4972) | 2 years ago
Jared Van Bortel | d292f4f204 | examples : make pydantic scripts pass mypy and support py3.8 (#5099) | 2 years ago
Valentin Konovalov | 256d1bb0dd | android : use release cmake build type by default (#5123) | 2 years ago
Kawrakow | faa3526a1e | Fix Q3_K_XS for MoE models (#5113) | 2 years ago
Georgi Gerganov | ddc5a5033f | metal : show compile log messages | 2 years ago
Engininja2 | cd4fddb29f | cuda : fix 2-bit quants on amd hip (#5105) | 2 years ago
Michael Hueschen | c9b316c78f | nix-shell: use addToSearchPath | 2 years ago
Michael Hueschen | bf63d695b8 | nix: add cc to devShell LD_LIBRARY_PATH | 2 years ago
slaren | 1387ea2117 | llama : pre-allocate input tensors in a separate buffer (#5100) | 2 years ago
Georgi Gerganov | 26d607608d | metal : disable support for MUL_MAT F32 x F16 | 2 years ago
Kawrakow | 44879ee885 | Additional KL-divergence statistics (#5081) | 2 years ago
Johannes Gäßler | 9ecdd12e95 | CUDA: more info when no device code (#5088) | 2 years ago
Georgi Gerganov | 89758723c7 | minor : clean-up some warnings and style (#5094) | 2 years ago
Xuan Son Nguyen | 2bed4aa3f3 | devops : add intel oneapi dockerfile (#5068) | 2 years ago
Michael Coppola | 125d03a503 | llama.vim : added api key support (#5090) | 2 years ago