comex | 2663d2c678 | Windows fixes (#890) | 2 years ago
comex | 180b693a47 | Print model version. | 2 years ago
comex | f963b63afa | Rewrite loading code to try to satisfy everyone: | 2 years ago
unbounded | 62cfc54f77 | Add quantize-stats command for testing quantization (#728) | 2 years ago
Ivan Stepanov | 4953e9007f | llama : always sort logits before nucleus sampling (#812) | 2 years ago
Georgi Gerganov | 986b6ce9f9 | ggml, llama : avoid heavy V transpose + improvements (#775) | 2 years ago
Ivan Stepanov | 5a8c4f6240 | llama : define non-positive top_k; top_k range check (#779) | 2 years ago
Ivan Stepanov | cd7fa95690 | Define non-positive temperature behavior (#720) | 2 years ago
Christian Falch | e986f94829 | Added api for getting/setting the kv_cache (#685) | 2 years ago
Marian Cepok | c0bb1d3ce2 | ggml : change ne to int64_t (#626) | 2 years ago
Stephan Walter | 81040f10aa | llama : do not allocate KV cache for "vocab_only == true" (#682) | 2 years ago
Justine Tunney | ee0c40dd6d | Introduce GGML migration tool for new file format | 2 years ago
Justine Tunney | 6f23ba5ee2 | Ensure --mlock works properly with mmap() support | 2 years ago
Justine Tunney | 78ca9838ee | Make loading weights 10-100x faster | 2 years ago
Slaren | a017390358 | Initial windows support (untested) | 2 years ago
Slaren | ac184d5147 | Always initialize mm_addr and mm_length in llama_model | 2 years ago
Slaren | 276e5b7811 | Unmap the file in llama_free | 2 years ago
Slaren | d68c5dc435 | Make mmap_file static | 2 years ago
Slaren | 64bde3ffd4 | Fix ggml_init_params in quantize | 2 years ago
Slaren | c03ae8dca1 | Add mmap support for model files | 2 years ago
Georgi Gerganov | 0ba76c1e73 | llama : fix compile warnings when reading the vocab | 2 years ago
Maël Kerbiriou | 41318d708e | llama : use the same threshold for OpenBLAS and ggml thread limiting (#577) | 2 years ago
thement | d0aaff571c | py : add temporary script to convert old ggml files to newer version (#539) | 2 years ago
Stephan Walter | 436e561931 | all : be more strict about converting float to double (#458) | 2 years ago
Stephan Walter | c1f885067c | ggml : introduce structs for the q4 data blocks (#356) | 2 years ago
Georgi Gerganov | 03f7e33560 | Cleanup STL headers + fix embedding examples + minor stuff | 2 years ago
Georgi Gerganov | 4640eff23d | Don't interefe with BLAS for large prompts by running only 1 thread | 2 years ago
slaren | 29b7baab67 | Add timings for the prompt evaluation (#478) | 2 years ago
Georgi Gerganov | 2a2e63ce05 | Fix nasty bug in ggml_compute_forward_mul_mat_f32() and reenable BLAS | 2 years ago
Jed Fox | 58e6c9f36f | Add support for file load progress reporting callbacks (#434) | 2 years ago