4bd0f93e4a | Pierrick Hymbert | model: support arch `DbrxForCausalLM` (#6515) | 1 year ago
91c736015b | jiez | llama : add gguf_remove_key + remove split meta during quantize (#6591) | 1 year ago
dee7f8d692 | MasterYi1024 | Correct free memory and total memory. (#6630) | 1 year ago
04a5ac211e | Clint Herron | Optimization: eliminate addition of redundant stacks when advancing grammar. (#6616) | 1 year ago
cbaadc9294 | Olivier Chafik | grammars: 1.5x faster inference w/ complex grammars (vector reserves / reuses) (#6609) | 1 year ago
b804b1ef77 | Pierrick Hymbert | eval-callback: Example how to use eval callback for debugging (#6576) | 1 year ago
4f407a0a35 | slaren | llama : add model types for mixtral (#6589) | 1 year ago
1b67731e18 | Jared Van Bortel | BERT tokenizer fixes (#6498) | 1 year ago
5dc9dd7152 | Carolinabanana | llama : add Command R Plus support (#6491) | 1 year ago
cc4a95426d | Georgi Gerganov | llama : fix attention layer count sanity check (#6550) | 1 year ago
b73e564b16 | Georgi Gerganov | quantize : fix precedence of cli args (#6541) | 1 year ago
e3c337d87c | Rick G | llama : support negative ith in llama_get_ API (#6519) | 1 year ago
beea6e1b16 | Jan Boon | llama : save and restore kv cache for single seq id (#6341) | 1 year ago
a8bd14d557 | Brian | gguf.py : add licence and version to gguf writer (#6504) | 1 year ago
9b84ae1806 | Clint Herron | examples : add GBNF validator program (#5948) | 1 year ago
bb43cf7e9d | bryanSwk | llama : add SEA-LION support (#6448) | 1 year ago
1ff4d9f3d6 | kaizau | Add OpenChat, Alpaca, Vicuna chat templates (#6397) | 1 year ago
08a0c02060 | slaren | ggml : mul_mat_id use the same tensor for all the experts (#6387) | 1 year ago
ba0c7c70ab | 0cc4m | Vulkan k-quant mmq and ggml-backend offload functionality (#6155) | 1 year ago
069574775c | hxer7963 | [Model] Add support for xverse (#6301) | 1 year ago
057400a3fd | Daniel Bevenius | llama : remove redundant reshape in build_kv_store (#6369) | 1 year ago
0308f5e3d7 | compilade | llama : fix command-r inference when omitting outputs (#6367) | 1 year ago
32c8486e1f | Jared Van Bortel | wpm : portable unicode tolower (#6305) | 1 year ago
557410b8f0 | compilade | llama : greatly reduce output buffer memory usage (#6122) | 1 year ago
55c1b2a3bb | Kawrakow | IQ1_M: 1.75 bpw quantization (#6302) | 1 year ago
d25b1c31b0 | Kawrakow | quantize : be able to override metadata by key (#6321) | 1 year ago
280345968d | slaren | cuda : rename build flag to LLAMA_CUDA (#6299) | 1 year ago
ddf6568510 | Meng, Hengyu | [SYCL] offload op (#6217) | 1 year ago
94d1b3b411 | Jared Van Bortel | use _wfopen instead of fopen on Windows (#6248) | 1 year ago
f482bb2e49 | Pierrick Hymbert | common: llama_load_model_from_url split support (#6192) | 1 year ago