Georgi Gerganov | 540938f890 | llama : llama_model_desc print number of experts | 2 years ago
Marcus Dunn | 0040d42eeb | llama : replace all API facing `int`'s with `int32_t` (#4577) | 2 years ago
postmasters | 83e633c27e | llama : differentiate the KV dims in the attention (#4657) | 2 years ago
automaticcat | 24a447e20a | ggml : add ggml_cpu_has_avx_vnni() (#4589) | 2 years ago
manikbhandari | ea5497df5d | gpt2 : Add gpt2 architecture integration (#4555) | 2 years ago
Nam D. Tran | f6793491b5 | llama : add AWQ for llama, llama2, mpt, and mistral models (#4593) | 2 years ago
slaren | dc68f0054c | cuda : fix vmm pool with multi GPU (#4620) | 2 years ago
Shintarou Okada | 753be377b6 | llama : add PLaMo model (#3557) | 2 years ago
slaren | 5bf3953d7e | cuda : improve cuda pool efficiency using virtual memory (#4606) | 2 years ago
slaren | 708e179e85 | fallback to CPU buffer if host buffer alloc fails (#4610) | 2 years ago
slaren | 48b7ff193e | llama : fix platforms without mmap (#4578) | 2 years ago
crasm | c7e9701f86 | llama : add ability to cancel model loading (#4462) | 2 years ago
Georgi Gerganov | afefa319f1 | ggml : change ggml_scale to take a float instead of tensor (#4573) | 2 years ago
slaren | d232aca5a7 | llama : initial ggml-backend integration (#4520) | 2 years ago
Marcus Dunn | 31f27758fa | llama : allow getting n_batch from llama_context in c api (#4540) | 2 years ago
Johannes Gäßler | d3223afdad | llama : disable per-tensor info prints on model load (#4562) | 2 years ago
Ebey Abraham | b9e74f9bca | llama : add phi-2 + fix NeoX rope + ggml_mul_mat_set_prec (#4490) | 2 years ago
hankcs | 3c04bf6da8 | llama : fix try_override for bool_value which always return true (#4519) | 2 years ago
Jared Van Bortel | 2994f0c5a2 | decode : fix logits_valid for legacy API (#4516) | 2 years ago
Georgi Gerganov | 800a489e4a | llama.swiftui : add bench functionality (#4483) | 2 years ago
slaren | c6c4fc081c | lora : add support for non-llama models (#3333) | 2 years ago
Jared Van Bortel | 8a5be3bd58 | llama : sanity checks for access to logits (#4274) | 2 years ago
slaren | cafcd4f895 | ggml : remove n_dims from ggml_tensor (#4469) | 2 years ago
LostRuins | 20a68a7030 | ggml : add ggml_row_size() (fixes llama out of space) (#4461) | 2 years ago
slaren | 799a1cb13b | llama : add Mixtral support (#4406) | 2 years ago
Richard Kiss | 9494d7c477 | english : use `typos` to fix comments and logs (#4354) | 2 years ago
Xiang (Kevin) Li | e18f7345a3 | grammar : revert the replacement of llama_token_to_piece with id_to_token (#4396) | 2 years ago
Georgi Gerganov | bcc0eb4591 | llama : per-layer KV cache + quantum K cache (#4309) | 2 years ago
Marcus Dunn | 5f6e0c0dff | grammar : pre-computed pieces + reserve mem + less string copies (#4330) | 2 years ago
Kerfuffle | 5aa365d88f | llama : allow overriding GGUF metadata when loading model (#4092) | 2 years ago