Georgi Gerganov | 207b51900e | ggml : move FP16 <-> FP32 code to ggml-impl.h (#3861) | 2 years ago
Kerfuffle | 6e08281e58 | Extend llama_kv_cache_seq_rm to allow matching any sequence (#3843) | 2 years ago
cebtenzzre | 2046eb4345 | make : remove unnecessary dependency on build-info.h (#3842) | 2 years ago
Georgi Gerganov | 71a09da301 | llama : fix kv shift bug (#3835) | 2 years ago
Georgi Gerganov | d69d777c02 | ggml : quantization refactoring (#3833) | 2 years ago
Erik Scholz | ff3bad83e2 | flake : update flake.lock for newer transformers version + provide extra dev shell (#3797) | 2 years ago
Aarni Koskela | 82a6646e02 | metal : try cwd for ggml-metal.metal if bundle lookup fails (#3793) | 2 years ago
Georgi Gerganov | ba231e8a6d | issues : change label from bug to bug-unconfirmed (#3748) | 2 years ago
Georgi Gerganov | 8a2f2fea29 | convert : ignore tokens if their IDs are within [0, vocab_size) (#3831) | 2 years ago
Kerfuffle | bd6d9e2059 | llama : allow quantizing k-quants to fall back when tensor size incompatible (#3747) | 2 years ago
Georgi Gerganov | ee1a0ec9cb | llama : add option for greedy sampling with probs (#3813) | 2 years ago
Henk Poley | 177461104b | common : print that one line of the syntax help *also* to standard output (#3823) | 2 years ago
Georgi Gerganov | fdee152e4e | starcoder : add GPU offloading (#3827) | 2 years ago
Kerfuffle | 41aee4df82 | speculative : ensure draft and target model vocab matches (#3812) | 2 years ago
cebtenzzre | 6d459cbfbe | llama : correctly report GGUFv3 format (#3818) | 2 years ago
Thibault Terrasson | c8d6a1f34a | simple : fix batch handling (#3803) | 2 years ago
Georgi Gerganov | 2f9ec7e271 | cuda : improve text-generation and batched decoding performance (#3776) | 2 years ago
Georgi Gerganov | 34b2a5e1ee | server : do not release slot on image input (#3798) | 2 years ago
Georgi Gerganov | 6961c4bd0b | batched-bench : print params at start | 2 years ago
Georgi Gerganov | cc44877486 | log : disable pid in log filenames | 2 years ago
cebtenzzre | ad93962657 | server : add parameter -tb N, --threads-batch N (#3584) (#3768) | 2 years ago
Georgi Gerganov | 1717521cdb | server : do not block system prompt update (#3767) | 2 years ago
Georgi Gerganov | b2f7e04bd3 | sync : ggml (conv ops + cuda MSVC fixes) (#3765) | 2 years ago
John Smith | abd21fc99f | cmake : add missed dependencies (#3763) | 2 years ago
Georgi Gerganov | 2b4ea35e56 | cuda : add batched cuBLAS GEMM for faster attention (#3749) | 2 years ago
Galunid | daab3d7f45 | Add more tokenizer tests (#3742) | 2 years ago
Georgi Gerganov | 469c9addef | metal : handle ggml_scale for n%4 != 0 (close #3754) | 2 years ago
Georgi Gerganov | e3932593d4 | Revert "make : add optional CUDA_NATIVE_ARCH (#2482)" | 2 years ago
M. Yusuf Sarıgöz | 9d02956443 | issues : separate bug and enhancement template + no default title (#3748) | 2 years ago
Galunid | 69a6735087 | Update special token handling in conversion scripts for gpt2 derived tokenizers (#3746) | 2 years ago