Author | Commit | Message | Date
Kawrakow | 334a835a1c | ggml : importance matrix support for legacy quants (#4969) | 2 years ago
Maximilian Winter | 4feb4b33ee | examples : add complete parallel function calling example (#4974) | 2 years ago
Georgi Gerganov | 959ef0c0df | perplexity : fix kv cache handling for hellaswag (#4981) | 2 years ago
Georgi Gerganov | c37b3474e6 | flake.lock: update flake-parts, flake-parts/nixpkgs-lib, and nixpkgs (#4920) | 2 years ago
Paul Tsochantaris | 158f8c9e21 | metal : localized logic in `ggml_metal_graph_compute` (#4924) | 2 years ago
Neuman Vong | 862f5e41ab | android : introduce starter project example (#4926) | 2 years ago
Alex Azarov | 3a48d558a6 | metal : replace loop of dispatch_async with dispatch_apply (#4934) | 2 years ago
Alex Azarov | 7c8d3abd1a | metal : log `recommendedMaxWorkingSetSize` on iOS 16+ (#4936) | 2 years ago
Maximilian Winter | 122ed4840c | examples : fix and improv docs for the grammar generator (#4909) | 2 years ago
Justine Tunney | a0b3ac8c48 | ggml : introduce GGML_CALL function annotation (#4850) | 2 years ago
Daniel Bevenius | d75c232e1d | finetune : use LLAMA_FILE_MAGIC_GGLA (#4961) | 2 years ago
stduhpf | e0324285a5 | speculative : threading options (#4959) | 2 years ago
ngc92 | 3e5ca7931c | pass cpu-architecture arguments only to host code (C;C++) (#4943) | 2 years ago
David Friehs | 4483396751 | llama : apply classifier-free guidance to logits directly (#4951) | 2 years ago
Victor Z. Peng | d9aa4ffa6e | awq-py : fix typo in awq-py/README.md (#4947) | 2 years ago
Georgi Gerganov | ddb008d845 | cuda : fix dequantize kernel names (#4938) | 2 years ago
Kawrakow | 2faaef3979 | llama : check for 256 divisibility for IQ2_XS, IQ2_XXS (#4950) | 2 years ago
Kawrakow | 4a3156de2f | CUDA: faster dequantize kernels for Q4_0 and Q4_1 (#4938) | 2 years ago
David Pflug | a836c8f534 | llama : fix missing quotes (#4937) | 2 years ago
Kawrakow | 467a882fd2 | Add ability to use importance matrix for all k-quants (#4930) | 2 years ago
Georgi Gerganov | bb0c139247 | llama : check LLAMA_TRACE env for extra logging (#4929) | 2 years ago
Georgi Gerganov | 9408cfdad6 | scripts : sync-ggml-am.sh option to skip commits | 2 years ago
Georgi Gerganov | 03c5267490 | llama : use LLAMA_LOG_ macros for logging | 2 years ago
Kawrakow | a128c38de8 | Fix ffn_down quantization mix for MoE models (#4927) | 2 years ago
Alex Azarov | 5f5fe1bd60 | metal : correctly set SIMD support flags on iOS (#4923) | 2 years ago
Karthik Kumar Viswanathan | ac32902a87 | llama : support WinXP build with MinGW 8.1.0 (#3419) | 2 years ago
Kawrakow | 147b17ac94 | 2-bit quantizations (#4897) | 2 years ago
Kawrakow | 807179ec58 | Make Q3_K_S be the same as olf Q3_K_L for Mixtral-8x7B (#4906) | 2 years ago
Georgi Gerganov | 76484fbfd3 | sync : ggml | 2 years ago
Johannes Gäßler | c71d608ce7 | ggml: cache sin/cos for RoPE (#4908) | 2 years ago