Richard Kiss
|
532dd74e38
Fix some documentation typos/grammar mistakes (#4032)
|
2 years ago |
M. Yusuf Sarıgöz
|
e86fc56f75
Fix gguf-convert-endian script (#4037)
|
2 years ago |
Alexey Parfenov
|
d96ca7ded7
server : fix crash when prompt exceeds context size (#3996)
|
2 years ago |
Kerfuffle
|
34b0a08207
gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981)
|
2 years ago |
Jhen-Jie Hong
|
4a4fd3eefa
server : allow continue edit on completion mode (#3950)
|
2 years ago |
Galunid
|
df9d1293de
Unbreak persimmon after #3837 (#4010)
|
2 years ago |
Galunid
|
a75fa576ab
scripts: Generalize convert scripts (#3838)
|
2 years ago |
Mihai
|
57ad015dc3
server : add min_p param (#3877)
|
2 years ago |
slaren
|
875fb42871
ggml-alloc : fix backend assignments of views (#3982)
|
2 years ago |
Jared Van Bortel
|
0a7c980b6f
gguf : track writer state, free unneeded tensors, cleanup (#3871)
|
2 years ago |
Georgi Gerganov
|
413503d4b9
make : do not add linker flags when compiling static llava lib (#3977)
|
2 years ago |
xaedes
|
e9c1cecb9d
ggml : fix backward rope after YaRN (#3974)
|
2 years ago |
Matthew Tejo
|
54b4df8886
Use params when loading models in llava-cli (#3976)
|
2 years ago |
Meng Zhang
|
46876d2a2c
cuda : supports running on CPU for GGML_USE_CUBLAS=ON build (#3946)
|
2 years ago |
Damian Stewart
|
381efbf480
llava : expose as a shared library for downstream projects (#3613)
|
2 years ago |
slaren
|
2833a6f63c
ggml-cuda : fix f16 mul mat (#3961)
|
2 years ago |
Kerfuffle
|
d9ccce2e33
Allow common process_escapes to handle \x sequences (#3928)
|
2 years ago |
Thái Hoàng Tâm
|
bb60fd0bf6
server : fix typo for --alias shortcut from -m to -a (#3958)
|
2 years ago |
Jared Van Bortel
|
132d25b8a6
cuda : fix disabling device with --tensor-split 1,0 (#3951)
|
2 years ago |
Meng Zhang
|
3d48f42efc
llama : mark LLM_ARCH_STARCODER as full offload supported (#3945)
|
2 years ago |
Eve
|
c41ea36eaa
cmake : MSVC instruction detection (fixed up #809) (#3923)
|
2 years ago |
Eve
|
a7fac013cf
ci : use intel sde when ci cpu doesn't support avx512 (#3949)
|
2 years ago |
slaren
|
48ade94538
cuda : revert CUDA pool stuff (#3944)
|
2 years ago |
Kerfuffle
|
f28af0d81a
gguf-py: Support 01.AI Yi models (#3943)
|
2 years ago |
Peter Sugihara
|
d9b33fe95b
metal : round up to 16 to fix MTLDebugComputeCommandEncoder assertion (#3938)
|
2 years ago |
Xiao-Yong Jin
|
5ba3746171
ggml-metal: fix yarn rope (#3937)
|
2 years ago |
slaren
|
abb77e7319
ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921)
|
2 years ago |
Georgi Gerganov
|
8f961abdc4
speculative : change default p_accept to 0.5 + CLI args (#3919)
|
2 years ago |
Georgi Gerganov
|
05816027d6
common : YAYF (yet another YARN fix) (#3925)
|
2 years ago |
cebtenzzre
|
3fdbe6b66b
llama : change yarn_ext_factor placeholder to -1 (#3922)
|
2 years ago |