Johannes Gäßler | 28103f4832 | Server: fix seed for multiple slots (#6835) | 1 year ago
Tristan Druyen | abd3314064 | llama : add phi 3 chat template (#6857) | 1 year ago
liuwei-git | c8297c6af5 | llama : add phi3 support (#6852) | 1 year ago
Georgi Gerganov | 8960fe86ae | llama : fix typo in <|im_end|> token text (#6745) | 1 year ago
Georgi Gerganov | 40f74e4d73 | llama : add option to render special/control tokens (#6807) | 1 year ago
Wouter | 7dbdba5690 | llama : add llama-3 chat template (#6751) | 1 year ago
Pedro Cuenca | b97bc3966e | llama : support Llama 3 HF conversion (#6745) | 1 year ago
nopperl | 9958c81b79 | Implement the OLMo architecture (#6741) | 1 year ago
slaren | 0d56246f4b | ggml : group all experts in a single ggml_mul_mat_id (#6505) | 1 year ago
Ren Xuancheng | e11b2e6e1e | Qwen2 : assume tied weights if lm_head/output weights is missing (#6738) | 1 year ago
slaren | c71bfd736e | llama : fix compatibility with old 2 expert models (#6735) | 1 year ago
Georgi Gerganov | 532c1737a1 | llama : make general.name optional (#6709) | 1 year ago
Ashish | dbceec87c0 | llama : add StableLM2 12B (#6635) | 1 year ago
Shijie | f4dea7da18 | llama : add qwen2moe (#6074) | 1 year ago
Daniel Bevenius | 4fbd8098e6 | gguf : add special tokens metadata for FIM/Infill (#6689) | 1 year ago
compilade | 132f55795e | llama : fix restoring the number of outputs from state files (#6687) | 1 year ago
David Renshaw | 1958f7e06c | llama : add missing kv clear in llama_beam_search (#6664) | 1 year ago
Chao Jiang | 04fbc5f23e | Add Command R chat template (#6650) | 1 year ago
Pierrick Hymbert | 4bd0f93e4a | model: support arch `DbrxForCausalLM` (#6515) | 1 year ago
jiez | 91c736015b | llama : add gguf_remove_key + remove split meta during quantize (#6591) | 1 year ago
MasterYi1024 | dee7f8d692 | Correct free memory and total memory. (#6630) | 1 year ago
Clint Herron | 04a5ac211e | Optimization: eliminate addition of redundant stacks when advancing grammar. (#6616) | 1 year ago
Olivier Chafik | cbaadc9294 | grammars: 1.5x faster inference w/ complex grammars (vector reserves / reuses) (#6609) | 1 year ago
Pierrick Hymbert | b804b1ef77 | eval-callback: Example how to use eval callback for debugging (#6576) | 1 year ago
slaren | 4f407a0a35 | llama : add model types for mixtral (#6589) | 1 year ago
Jared Van Bortel | 1b67731e18 | BERT tokenizer fixes (#6498) | 1 year ago
Carolinabanana | 5dc9dd7152 | llama : add Command R Plus support (#6491) | 1 year ago
Georgi Gerganov | cc4a95426d | llama : fix attention layer count sanity check (#6550) | 1 year ago
Georgi Gerganov | b73e564b16 | quantize : fix precedence of cli args (#6541) | 1 year ago
Rick G | e3c337d87c | llama : support negative ith in llama_get_ API (#6519) | 1 year ago