| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Kawrakow | bac66994cf | Quantization imrovements for k_quants (#2707) | 2 years ago |
| slaren | 1123f7fbdf | ggml-cuda : use graph allocator (#2684) | 2 years ago |
| Georgi Gerganov | 6381d4e110 | gguf : new file format with flexible meta data (beta) (#2398) | 2 years ago |
| slaren | 097e121e2f | llama : add benchmark example (#2626) | 2 years ago |
| Evan Jones | 604b8bdfa6 | Fix unicode in grammars (fixes #2501) (#2553) | 2 years ago |
| Georgi Gerganov | a73ccf1aa3 | llama : replace (permute + reshape + view_1d) with (view_3d) (#2538) | 2 years ago |
| Shouzheng Liu | fc8ef549e5 | metal : enable ggml-alloc (#2627) | 2 years ago |
| Shouzheng Liu | bf83bff674 | metal : matrix-matrix multiplication kernel (#2615) | 2 years ago |
| Jhen-Jie Hong | d783f7982e | metal : return null instead of exit(1) (#2573) | 2 years ago |
| grahameth | ea04a4ca19 | add log_callback to llama_context_params for custom logging. (#2234) | 2 years ago |
| Johannes Gäßler | acfc5478ff | CUDA: tighter VRAM scratch size for 65b/70b (#2551) | 2 years ago |
| Johannes Gäßler | 3d9a551816 | Fixed mmap prefetch for GPU offloading (#2529) | 2 years ago |
| l3utterfly | 415e99fec2 | Stream save llama context data to file instead of allocating entire buffer upfront (#2488) | 2 years ago |
| Johannes Gäßler | 0728c5a8b9 | CUDA: mmq CLI option, fixed mmq build issues (#2453) | 2 years ago |
| slaren | 9d2382b3e4 | Fix Metal backend broken from the allocator changes (#2455) | 2 years ago |
| slaren | a113689571 | ggml : add graph tensor allocator (#2411) | 2 years ago |
| eric8607242 | ee1b497c98 | llama : support more diverse tokenizers? (#2420) | 2 years ago |
| Rand Xie | 65cdf34bdc | llama : use n_embd_gqa instead of n_embd to handle llama-2 70B (#2433) | 2 years ago |
| Georgi Gerganov | 1a941869cb | metal : disable graph concurrency optimization due to bug (#2413) | 2 years ago |
| slaren | 5488fb789e | ggml : allocate graphs in a context (#2392) | 2 years ago |
| Kawrakow | eb542d3932 | Add LLAMA_DEFAULT_RMS_EPS so we can change the default (#2384) | 2 years ago |
| slaren | da1889834a | ggml : improve graph build time via hash table lookup (#2329) | 2 years ago |
| Shouzheng Liu | 1aa18ef994 | metal : concurrently dispatch commands (#2358) | 2 years ago |
| slaren | 41c674161f | make rms_norm_eps a parameter (#2374) | 2 years ago |
| Evan Jones | 84e09a7d8b | llama : add grammar-based sampling (#1773) | 2 years ago |
| Georgi Gerganov | e76d630df1 | llama : grouped-query attention + LLaMAv2 70B support (#2276) | 2 years ago |
| Christian Demsar | a940458e48 | llama : print max tensor size to stderr (#2336) | 2 years ago |
| Georgi Gerganov | b47b8a9cfe | llama : optimize memory buffers (#2325) | 2 years ago |
| Georgi Gerganov | 513f861953 | ggml : fix rope args order + assert (#2054) | 2 years ago |
| Guillaume "Vermeille" Sanchez | ab0e26bdfb | llama : remove cfg smooth factor as it is only a reparameterization of the guidance scale (#2280) | 2 years ago |