Georgi Gerganov | b0f27361f3 | sampling : avoid expensive softmax during greedy sampling (#9605) | 1 year ago
Michael Podvitskiy | 37f3a3810e | llama : add llama_n_head() (#9512) | 1 year ago
Georgi Gerganov | 0abc6a2c25 | llama : llama_perf + option to disable timings during decode (#9355) | 1 year ago
Gilad S. | bd35cb0ae3 | feat: remove a sampler from a chain (#9445) | 1 year ago
slaren | 49006c67b4 | llama : move random seed generation to the samplers (#9398) | 1 year ago
slaren | 5fb5e24811 | llama : minor sampling refactor (2) (#9386) | 1 year ago
Georgi Gerganov | df270ef745 | llama : refactor sampling v2 (#9294) | 1 year ago
compilade | 9bc6db28d0 | ggml-quants : ternary packing for TriLMs and BitNet b1.58 (#8151) | 1 year ago
Molly Sophia | 8f1d81a0b6 | llama : support RWKV v6 models (#8980) | 1 year ago
Sutou Kouhei | 0ab30f8d82 | llama : fix llama_split_mode enum values in main_gpu document (#9057) | 1 year ago
Faisal Zaghloul | 42c76d1358 | Threadpool: take 2 (#8672) | 1 year ago
compilade | a1631e53f6 | llama : simplify Mamba with advanced batch splits (#8526) | 1 year ago
Minsoo Cheong | c679e0cb5c | llama : add EXAONE model support (#9025) | 1 year ago
Zhenwei Jin | 4af8420afb | common : remove duplicate function llama_should_add_bos_token (#8778) | 1 year ago
Esko Toivonen | 6bda7ce6c3 | llama : add pre-tokenizer regexes for BLOOM and gpt3-finnish (#8850) | 1 year ago
Daniel Bevenius | 06943a69f6 | ggml : move rope type enum to ggml.h (#8949) | 1 year ago
fairydreaming | 7c3f55c100 | Add support for encoder-only T5 models (#8900) | 1 year ago
Nexes the Old | 31958546c3 | typo correction (#8891) | 1 year ago
compilade | 4c676c85e5 | llama : refactor session file management (#8699) | 1 year ago
Xuan Son Nguyen | b115105f05 | add llama_lora_adapter_clear (#8653) | 1 year ago
Georgi Gerganov | 938943cdbf | llama : move vocab, grammar and sampling into separate files (#8508) | 1 year ago
Keke Han | 081fe431aa | llama : fix codeshell support (#8599) | 1 year ago
Jason Stillerman | d94c6e0ccb | llama : add support for SmolLm pre-tokenizer (#8609) | 1 year ago
Michael Coppola | 940362224d | llama : add support for Tekken pre-tokenizer (#8579) | 1 year ago
Georgi Gerganov | d197545530 | llama : bump max layers from 256 to 512 (#8530) | 1 year ago
Georgi Gerganov | 0efec57787 | llama : valign + remove unused ftype (#8502) | 1 year ago
Xuan Son Nguyen | 97bdd26eee | Refactor lora adapter support (#8332) | 1 year ago
Dibakar Gope | 0f1a39f343 | ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (#5780) | 1 year ago
toyer | 905942abdb | llama : support glm3 and glm4 (#8031) | 1 year ago
jaime-m-p | 213701b51a | Detokenizer fixes (#8039) | 1 year ago