Georgi Gerganov
|
47f931c8f9
server : enable cache_prompt by default (#10501)
|
1 year ago |
Georgi Gerganov
|
106964e3d2
metal : enable mat-vec kernels for bs <= 4 (#10491)
|
1 year ago |
Shane A
|
80acb7b430
Rename Olmo1124 to Olmo2 (#10500)
|
1 year ago |
Diego Devesa
|
10bce0450f
llama : accept a list of devices to use to offload a model (#10497)
|
1 year ago |
Johannes Gäßler
|
1f922254f0
Github: update issue templates [no ci] (#10489)
|
1 year ago |
brucepro
|
a9a678a6b2
Add download chat feature to server chat (#10481)
|
1 year ago |
Georgi Gerganov
|
9ca2e67762
server : add speculative decoding support (#10455)
|
1 year ago |
Diego Devesa
|
5931c1f233
ggml : add support for dynamic loading of backends (#10469)
|
1 year ago |
Georgi Gerganov
|
f6d12e7df8
tests : fix compile warning
|
1 year ago |
Georgi Gerganov
|
b756441104
metal : minor code formatting
|
1 year ago |
Neo Zhang Jianyu
|
5a8987793f
[SYCL] Fix building Win package for oneAPI 2025.0 update (#10483)
|
1 year ago |
Georgi Gerganov
|
d9d54e498d
speculative : refactor and add a simpler example (#10362)
|
1 year ago |
Georgi Gerganov
|
cce5a90075
flake.lock: Update (#10470)
|
1 year ago |
Diego Devesa
|
dc39012cba
llama : fix op mul check with command-r-plus (#10476)
|
1 year ago |
Gabe Goodhart
|
9336db462c
convert : XLMRoberta Type Vocab Size (#10458)
|
1 year ago |
momonga
|
96fa2c5e2d
fix gguf-py: Conversion error when multiple licenses are configured (#9807)
|
1 year ago |
Diego Devesa
|
55ed008b2d
ggml : do not use ARM features not included in the build (#10457)
|
1 year ago |
蕭澧邦
|
6dfcfef078
ci: Update oneAPI runtime dll packaging (#10428)
|
1 year ago |
Johannes Gäßler
|
599b3e0cd4
GitHub: ask for more info in issue templates (#10426)
|
1 year ago |
leo-pony
|
c18610b4ee
CANN: Support Ascend310P to accelerate F32 and F16 Model (#10216)
|
1 year ago |
Diego Devesa
|
a5e47592b6
cuda : optimize argmax (#10441)
|
1 year ago |
Georgi Gerganov
|
1bb30bf28c
llama : handle KV shift for recurrent models (#10402)
|
1 year ago |
Georgi Gerganov
|
87a533be57
sync : ggml
|
1 year ago |
slaren
|
59b9172822
ggml/sched : do not skip views in pre-assignments
|
1 year ago |
Johannes Gäßler
|
02e4eaf22f
ggml-opt: fix data corruption (ggml/1022)
|
1 year ago |
Jeff Bolz
|
9abe9eeae9
vulkan: predicate max operation in soft_max shaders/soft_max (#10437)
|
1 year ago |
bandoti
|
f95caa7954
cmake: add link dependencies to cmake find pkg (#10433)
|
1 year ago |
Diego Devesa
|
fab5d30ff6
llama : add .clang-format file (#10415)
|
1 year ago |
Jeff Bolz
|
8fd4b7fa29
vulkan: copy iq4_nl LUT into shared memory (#10409)
|
1 year ago |
Jeff Bolz
|
1bacb9f625
vulkan: further optimize mul_mat_vec using larger loads (#10387)
|
1 year ago |