Author | Commit | Message | Date
Neo Zhang Jianyu | 349ea79fce | use max work group size for device to replace the magic number (#14732) | 6 months ago
Piotr Wilkin (ilintar) | 670e1360cd | convert : fix Ernie4.5 MoE without shared experts (#14746) | 6 months ago
Wroclaw | 760b4484e3 | nix : use optionalAttrs for env mkDerivation attrset argument (#14726) | 6 months ago
Piotr Wilkin (ilintar) | cb887f1bc1 | model: add Ernie 4.5 MoE support (#14658) | 6 months ago
Georgi Gerganov | d6fb3f6b49 | kv-cache : fix k-shift for multiple streams (#14742) | 6 months ago
Georgi Gerganov | 01612b7409 | llama : reuse compute graphs (#14482) | 6 months ago
Tarek Dakhran | 086cf81e88 | llama : fix parallel processing for lfm2 (#14705) | 6 months ago
Georgi Gerganov | d9b691081c | kv-cache : opt mask set input (#14600) | 6 months ago
Georgi Gerganov | ad57d3edd2 | batch : fix uninitialized has_cpl flag (#14733) | 6 months ago
Sigbjørn Skjæret | 1ba45d4982 | ci : disable failing vulkan crossbuilds (#14723) | 6 months ago
Sigbjørn Skjæret | 19e5943d9e | convert : make hf token optional (#14717) | 6 months ago
Diner Burger | 496957e1cb | llama : fix parameter order for hybrid memory initialization (#14725) | 6 months ago
Reese Levine | 21c021745d | ggml: Add initial WebGPU backend (#14521) | 6 months ago
tempstudio | b0f0ecc3dc | model : support output bias for qwen2 (#14711) | 6 months ago
Georgi Gerganov | 225e7a1438 | llama : add high-throughput mode (#14363) | 6 months ago
Aman Gupta | ab14019821 | Support diffusion models: Add Dream 7B (#14644) | 6 months ago
Georgi Gerganov | 64978340b0 | ggml : add asserts (#14720) | 6 months ago
Georgi Gerganov | 6ffd4e9c44 | server : pre-calculate EOG logit biases (#14721) | 6 months ago
Shunta Saito | e4841d24d3 | llama : fix parallel processing for plamo2 (#14716) | 6 months ago
Georgi Gerganov | 538cc77f7f | server : fix handling of the ignore_eos flag (#14710) | 6 months ago
Johannes Gäßler | 5cae766541 | scripts: synthetic prompt mode for server-bench.py (#14695) | 6 months ago
Sigbjørn Skjæret | 4b91d6f71f | convert : only check for tokenizer folder if we need it (#14704) | 6 months ago
Sigbjørn Skjæret | cf91f217f1 | convert : add pre-computed hashes first to prevent order mishaps (#14701) | 6 months ago
Min-Hua | 79e0b68c17 | llama: add LLAMA_API to deprecated llama_kv_self_seq_div (#14708) | 6 months ago
Ed Addario | c81f4192f9 | gguf-py : dump bpw per layer and model in markdown mode (#14703) | 6 months ago
Gabriel Larson | 4a4f426944 | model : add Kimi-K2 support (#14654) | 6 months ago
Jeff Bolz | ba1ceb3456 | vulkan: fix noncontig check for mat_mul_id splitting (#14683) | 6 months ago
Jeff Bolz | 10a0351a97 | vulkan: add RTE variants for glu/add/sub/mul/div (#14653) | 6 months ago
Shunta Saito | 68e37a61a7 | model : add PLaMo-2 support (#14560) | 6 months ago
R0CKSTAR | cbc68be51d | cuda: fix build warnings in set-rows.cu (unused variable) (#14687) | 6 months ago