| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Georgi Gerganov | a0edf73bda | server : fix uninitialized sampling context (close #3685) | 2 years ago |
| Herman Semenov | f439e506e8 | ggml : fix rope + llama minor optimizations (#3560) | 2 years ago |
| cebtenzzre | e78f3ef24a | convert : restore compat with old Falcon models (#3680) | 2 years ago |
| M. Yusuf Sarıgöz | f3b25e4043 | multimodal : add BakLLaVA conversion support (#3682) | 2 years ago |
| M. Yusuf Sarıgöz | 60abea9798 | llava : avoid segfault in case of non-existent mmproj file (#3674) | 2 years ago |
| Georgi Gerganov | 004797f6ac | readme : update hot topics | 2 years ago |
| Georgi Gerganov | 4e82b2ea3f | speculative : bug fixes | 2 years ago |
| Georgi Gerganov | 0e89203b51 | speculative : add tree-based sampling example (#3624) | 2 years ago |
| Jhen-Jie Hong | c67fe68e41 | metal : implement q5_0 and q5_1 kernels (#3648) | 2 years ago |
| shibe2 | 1117d06607 | opencl : fix element-wise multiplication (#3656) | 2 years ago |
| slaren | cb33f43a2a | fix embeddings when using CUDA (#3657) | 2 years ago |
| Georgi Gerganov | e1675d133c | llama : avoid fprintf in favor of LLAMA_LOG (#3538) | 2 years ago |
| BarfingLemurs | 8402566a7c | readme : update hot-topics & models, detail windows release in usage (#3615) | 2 years ago |
| shibe2 | 40e5ce054f | CLBlast: Fix temporary buffer size for f16 conversion (wsize) | 2 years ago |
| slaren | a5e8c1d8c7 | train-text-from-scratch : fix assert failure in ggml-alloc (#3618) | 2 years ago |
| Georgi Gerganov | e74c705e15 | editorconfig : remove trailing spaces | 2 years ago |
| coezbek | 3ad1e3f1a1 | server : documentation of JSON return value of /completion endpoint (#3632) | 2 years ago |
| Georgi Gerganov | 1142013da4 | save-load-state : fix example + add ci test (#3655) | 2 years ago |
| ldwang | 5fe268a4d9 | readme : add Aquila2 links (#3610) | 2 years ago |
| staviq | 1a159553f9 | tokenizer : special token handling (#3538) | 2 years ago |
| Georgi Gerganov | 281ef73c25 | k-quants : fix quantization ranges (#3646) | 2 years ago |
| Georgi Gerganov | 940efa95fe | llava : fix tokenization to not add bos between image embeddings and user prompt (#3645) | 2 years ago |
| cebtenzzre | 11bff29045 | MPT : support GQA for replit-code-v1.5 (#3627) | 2 years ago |
| M. Yusuf Sarıgöz | 11dc1091f6 | Honor -ngl option for Cuda offloading in llava (#3621) | 2 years ago |
| Daniel Bevenius | 2a4bcbacea | llama : remove n_threads from llama_decode_internal (#3614) | 2 years ago |
| slaren | 424b6381c4 | ggml : add context enumeration functions (#3605) | 2 years ago |
| shibe2 | 1e0e873c37 | CLBlast: Fix matrix-vector multiplication (#3544) | 2 years ago |
| M. Yusuf Sarıgöz | 370359e5ba | examples: support LLaVA v1.5 (multimodal model) (#3436) | 2 years ago |
| uint256_t | 9e24cc6e2e | docs : fix typo GOMP_CPU_AFFINITY (#3597) | 2 years ago |
| Georgi Gerganov | d28e572c02 | cmake : fix add_compile_options on macOS | 2 years ago |