Georgi Gerganov
|
ba69bbc84c
imatrix : offload to GPU support (#4957)
|
2 years ago |
Georgi Gerganov
|
44a1a4a41a
backend : add eval callback (#4935)
|
2 years ago |
Georgi Gerganov
|
c918fe8dca
metal : create autorelease pool during library build (#4970)
|
2 years ago |
Georgi Gerganov
|
0f83e727af
py : fix whitespace
|
2 years ago |
Georgi Gerganov
|
4f4bf35f46
py : fix missing added_tokens_dict for SPM and BPE vocabs (#4971)
|
2 years ago |
Kawrakow
|
2b3a665d39
llama : use Q4_K for attn_v for Q2_K_S when n_gqa >= 4 (#4996)
|
2 years ago |
Paul Tsochantaris
|
7563293665
metal : remove unnecessary nil check (#4986)
|
2 years ago |
David Renshaw
|
f46c0c1b0e
llama : fix copy/paste error in llama_sampling_params comment (#4994)
|
2 years ago |
Georgi Gerganov
|
5c99960901
py : remove unnecessary hasattr (#4903)
|
2 years ago |
Philip Taron
|
bee938da74
nix: remove nixConfig from flake.nix (#4984)
|
2 years ago |
Daniel Bevenius
|
cec8a48470
finetune : add training data file to log message (#4979)
|
2 years ago |
Kawrakow
|
334a835a1c
ggml : importance matrix support for legacy quants (#4969)
|
2 years ago |
Maximilian Winter
|
4feb4b33ee
examples : add complete parallel function calling example (#4974)
|
2 years ago |
Georgi Gerganov
|
959ef0c0df
perplexity : fix kv cache handling for hellaswag (#4981)
|
2 years ago |
Georgi Gerganov
|
c37b3474e6
flake.lock: update flake-parts, flake-parts/nixpkgs-lib, and nixpkgs (#4920)
|
2 years ago |
Paul Tsochantaris
|
158f8c9e21
metal : localized logic in `ggml_metal_graph_compute` (#4924)
|
2 years ago |
Neuman Vong
|
862f5e41ab
android : introduce starter project example (#4926)
|
2 years ago |
Alex Azarov
|
3a48d558a6
metal : replace loop of dispatch_async with dispatch_apply (#4934)
|
2 years ago |
Alex Azarov
|
7c8d3abd1a
metal : log `recommendedMaxWorkingSetSize` on iOS 16+ (#4936)
|
2 years ago |
Maximilian Winter
|
122ed4840c
examples : fix and improv docs for the grammar generator (#4909)
|
2 years ago |
Justine Tunney
|
a0b3ac8c48
ggml : introduce GGML_CALL function annotation (#4850)
|
2 years ago |
Daniel Bevenius
|
d75c232e1d
finetune : use LLAMA_FILE_MAGIC_GGLA (#4961)
|
2 years ago |
stduhpf
|
e0324285a5
speculative : threading options (#4959)
|
2 years ago |
ngc92
|
3e5ca7931c
pass cpu-architecture arguments only to host code (C;C++) (#4943)
|
2 years ago |
David Friehs
|
4483396751
llama : apply classifier-free guidance to logits directly (#4951)
|
2 years ago |
Victor Z. Peng
|
d9aa4ffa6e
awq-py : fix typo in awq-py/README.md (#4947)
|
2 years ago |
Georgi Gerganov
|
ddb008d845
cuda : fix dequantize kernel names (#4938)
|
2 years ago |
Kawrakow
|
2faaef3979
llama : check for 256 divisibility for IQ2_XS, IQ2_XXS (#4950)
|
2 years ago |
Kawrakow
|
4a3156de2f
CUDA: faster dequantize kernels for Q4_0 and Q4_1 (#4938)
|
2 years ago |
David Pflug
|
a836c8f534
llama : fix missing quotes (#4937)
|
2 years ago |