snadampal | 2319126a70 | fix q4_0_8_8 format for corrupted tokens issue (#10198) | 1 year ago
Zhiyuan Li | 3bcd40b3c5 | Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration (#10133) | 1 year ago
Georgi Gerganov | 5c333e0140 | metal : add BF16 support (#8439) | 1 year ago
Georgi Gerganov | b11f9ba9b8 | server : remove hack for extra parallel slot (#10187) | 1 year ago
Diego Devesa | 94d8cb8be1 | metal : fix from ptr buffer name (#10189) | 1 year ago
Georgi Gerganov | 1dc04b2dee | ggml : adjust is_first_call init value (#10193) | 1 year ago
Georgi Gerganov | a1eaf6a960 | metal : add quantized FA support (#10149) | 1 year ago
Gabe Goodhart | b8deef0ec0 | llama : add <|tool_call|> formatting to Granite template (#10177) | 1 year ago
Diego Devesa | a9e8a9a030 | ggml : fix arch check in bf16_to_fp32 (#10164) | 1 year ago
Eve | 3407364776 | Q6_K AVX improvements (#10118) | 1 year ago
Diego Devesa | d5a409e57f | ggml : fix gelu tables initialization (#10172) | 1 year ago
Diego Devesa | 401558b7ba | ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (#10167) | 1 year ago
Xuan Son Nguyen | 9e0ecfb697 | server : clarify /slots endpoint, add is_processing (#10162) | 1 year ago
snadampal | 6a066b9978 | fix build break on arm64 linux (#10166) | 1 year ago
Diego Devesa | ea02c753eb | cuda : clear error after changing peer access (#10153) | 1 year ago
Georgi Gerganov | 05697f670b | metal : simplify f16 and f32 dequant kernels (#0) | 1 year ago
Georgi Gerganov | f8e58135cf | metal : move dequantize templates to beginning of MSL source (#0) | 1 year ago
leo-pony | 329ed914c9 | CANN: adjust backend registry refactor. (#10158) | 1 year ago
Georgi Gerganov | ce027adfb3 | sync : ggml | 1 year ago
Yuri Khrustalev | 284e5b0275 | cmake : make it possible linking ggml as external lib (ggml/1003) | 1 year ago
Plamen Minev | e2292aaa17 | metal : fix minor string leaks (ggml/1004) | 1 year ago
Diego Devesa | 9f40989351 | ggml : move CPU backend to a separate file (#10144) | 1 year ago
Georgi Gerganov | 08828a6d7d | metal : minor fixup in FA kernel (#10143) | 1 year ago
Georgi Gerganov | 1839f69130 | flake.lock: Update (#10146) | 1 year ago
Christian Köhnenkamp | 9830b6923b | Add apple arm to presets (#10134) | 1 year ago
sasha0552 | 42cadc74bd | server : fix slot selection by lru (#10126) | 1 year ago
Georgi Gerganov | 45950415ed | server : fix endpoint checks (#10135) | 1 year ago
Georgi Gerganov | 1926d6e39d | llama : adjust default context size + print warnings (#10136) | 1 year ago
Diego Devesa | b634f8a26f | simple-chat : only add bos on first prompt (#10129) | 1 year ago
Xuan Son Nguyen | 7554aa4655 | convert-lora : make `--base` optional (#10110) | 1 year ago