Aarni Koskela
|
c4e6dd59e4
llama : allow raw byte in SPM vocabs; don't crash on nl 404 (#5478)
|
1 year ago |
Aarni Koskela
|
037259be68
llama : make load error reporting more granular (#5477)
|
1 year ago |
Daniel Bevenius
|
263978904c
finetune : rename feed-forward tensors (w1/w2/w3) (#4839)
|
1 year ago |
Georgi Gerganov
|
cf45252a7c
tests : multi-thread the tokenizer tests (#5474)
|
1 year ago |
Douglas Hanley
|
03bf161eb6
llama : support batched embeddings (#5466)
|
1 year ago |
Johannes Gäßler
|
ad014bba97
make: add error message for bad CUDA version (#5444)
|
1 year ago |
Georgi Gerganov
|
49cc1f7d67
bert : add tests + fix quantization (#5475)
|
1 year ago |
Georgi Gerganov
|
99b8b43d7b
tests : disable moe test (#5473)
|
1 year ago |
Kawrakow
|
895407f31b
ggml-quants : fix compiler warnings (shadow variable) (#5472)
|
1 year ago |
Georgi Gerganov
|
099afc6274
llama : fix quantization when tensors are missing (#5423)
|
1 year ago |
Georgi Gerganov
|
df334a1125
swift : package no longer use ggml dependency (#5465)
|
1 year ago |
Lee
|
dbd8828eb0
py : fix persimmon `n_rot` conversion (#5460)
|
1 year ago |
Abhilash Majumder
|
43fe07c1a4
ggml-sycl: Replace 3d ops with macro (#5458)
|
1 year ago |
Daniel Bevenius
|
4a46d2b792
llava : remove prog parameter from ArgumentParser (#5457)
|
1 year ago |
Georgi Gerganov
|
3b169441df
sync : ggml (#5452)
|
1 year ago |
Johannes Gäßler
|
3bdc4cd0f5
CUDA: mul_mat_vec_q tiling, refactor mul mat logic (#5434)
|
1 year ago |
Douglas Hanley
|
2891c8aa9a
Add support for BERT embedding models (#5423)
|
1 year ago |
github-actions[bot]
|
97a336507e
flake.lock: Update
|
1 year ago |
Sergio López
|
c88c74f967
vulkan: only use M-sized matmul on Apple GPUs (#5412)
|
1 year ago |
Alexey Parfenov
|
a803333a4e
common : use enums for sampler types (#5418)
|
1 year ago |
Alexey Parfenov
|
684780141a
server : allow to specify tokens as strings in logit_bias (#5003)
|
1 year ago |
Georgi Gerganov
|
85910c5b30
main : ctrl+C print timing in non-interactive mode (#3873)
|
1 year ago |
Georgi Gerganov
|
139b62a839
common : fix compile warning
|
1 year ago |
Georgi Gerganov
|
0f2411f154
ggml : fix compile warnings (unused vars) (#4966)
|
1 year ago |
snadampal
|
a07d0fee1f
ggml : add mmla kernels for quantized GEMM (#4966)
|
1 year ago |
Johannes Gäßler
|
e4640d8fdf
lookup: add print for drafting performance (#5450)
|
1 year ago |
Xuan Son Nguyen
|
907e08c110
server : add llama2 chat template (#5425)
|
1 year ago |
Ian Bull
|
f026f8120f
metal : use autoreleasepool to avoid memory leaks (#5437)
|
1 year ago |
Georgi Gerganov
|
cd9aea63b5
scripts : update sync scripts with new backends
|
1 year ago |
Georgi Gerganov
|
43b65f5eb8
sync : ggml
|
1 year ago |