| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Borislav Stanimirov | ff966e7ca6 | build : fix several cast and printf warnings (#2499) | 2 years ago |
| Evan Jones | 8183159cf3 | examples : generate JSON according to schema (#1887) | 2 years ago |
| Johannes Gäßler | 468ea24fb4 | CUDA: faster non k-quant mul_mat_q kernels (#2483) | 2 years ago |
| Johannes Gäßler | 4f6b60c776 | CUDA: Fix models with output size != 32000 (#2480) | 2 years ago |
| ldwang | 220d931864 | readme : add Aquila-7B model series to supported models (#2487) | 2 years ago |
| Eve | 81844fbcfd | tests : Fix compilation warnings (Linux/GCC) (#2451) | 2 years ago |
| Yiming Cui | a312193e18 | readme : Add Chinese LLaMA-2 / Alpaca-2 to supported models (#2475) | 2 years ago |
| Bono Lv | c574bddb36 | fix a typo in examples/server/README.md (#2478) | 2 years ago |
| ebraminio | 86aeb27734 | server : Support dark mode (#2414) | 2 years ago |
| Matteo Boschini | 1873ff586b | metal : add gqa8 kernel to allow llama-2-70B on metal (#2459) | 2 years ago |
| Johannes Gäßler | 49e7cb5bb1 | CUDA: fixed LLAMA_FAST compilation option (#2473) | 2 years ago |
| Johannes Gäßler | b772bba42e | CUDA: fixed cmake F16 option (#2471) | 2 years ago |
| Johannes Gäßler | 0728c5a8b9 | CUDA: mmq CLI option, fixed mmq build issues (#2453) | 2 years ago |
| Johannes Gäßler | 1215ed7d5c | CUDA: Implemented row flattening for non-glm RoPE (#2468) | 2 years ago |
| Johannes Gäßler | 2dbf518911 | CUDA: fewer memory bank conflicts for mul_mat_q (#2458) | 2 years ago |
| slaren | 9d2382b3e4 | Fix Metal backend broken from the allocator changes (#2455) | 2 years ago |
| slaren | a113689571 | ggml : add graph tensor allocator (#2411) | 2 years ago |
| Johannes Gäßler | 11f3ca06b8 | CUDA: Quantized matrix matrix multiplication (#2160) | 2 years ago |
| Johannes Gäßler | 9baf9ef304 | CUDA: faster multi GPU synchronization (#2448) | 2 years ago |
| klosax | 8a88e5855c | perplexity : add Hellaswag calculation (#2389) | 2 years ago |
| Lee | a9559bf77b | ggml : workaround for missing _mm256_setr_m128i in GCC < 8 in k_quants.c (#2405) | 2 years ago |
| eric8607242 | ee1b497c98 | llama : support more diverse tokenizers? (#2420) | 2 years ago |
| Georgi Gerganov | d73b8d48b4 | examples : fix whitespace | 2 years ago |
| nhamanasu | 34ae1caf7f | examples : server chat mode with llama2 (#2400) | 2 years ago |
| Weird Constructor | d91f3f0c55 | readme : fix the description of the Tail free sampling (TFS) method (#2431) | 2 years ago |
| Rand Xie | 65cdf34bdc | llama : use n_embd_gqa instead of n_embd to handle llama-2 70B (#2433) | 2 years ago |
| niansa/tuxifan | edcc7ae7d2 | Obtaining LLaMA 2 instructions (#2308) | 2 years ago |
| mj-shifu | 7c529cede6 | convert.py : Update to support 70B HF format model files (#2427) | 2 years ago |
| Georgi Gerganov | 1a941869cb | metal : disable graph concurrency optimization due to bug (#2413) | 2 years ago |
| slaren | b5472ea0ad | ggml : fix assert in ggml_set_unary_op (#2410) | 2 years ago |