cturan/llama.cpp

Author	SHA1 Message	Date
Michael Podvitskiy	3202361c5b ggml, ci : Windows ARM runner and build fixes (#5979)	1 year ago
Georgi Gerganov	ee35600b90 llama : fix F16/F32 downcast + improve names (#5980)	1 year ago
DAN™	bcebd7dbf6 llama : add support for GritLM (#5959)	1 year ago
slaren	d894f352bf perplexity : support using multiple sequences to allow larger batch sizes (#5946)	1 year ago
Georgi Gerganov	5b09797321 ggml : remove old quantization functions (#5942)	1 year ago
compilade	c2101a2e90 llama : support Mamba Selective State Space Models (#5328)	1 year ago
compilade	515f7d0d4f llama : fix quantization of shared token_embd (#5944)	1 year ago
Don Mahurin	e457fb3540 llama : assume tied weights if lm_head/output weights is missing (#5824)	1 year ago
Neo Zhang Jianyu	89fb735fcf Revert "[SYCL] fix error when set main gpu to non-zero (#5901)" (#5918)	1 year ago
Georgi Gerganov	2002bc96bf server : refactor (#5882)	1 year ago
Neo Zhang Jianyu	ceca1aef07 [SYCL] fix error when set main gpu to non-zero (#5901)	1 year ago
0cc4m	61d1c88e15 Vulkan Improvements (#5835)	1 year ago
Georgi Gerganov	29ae62d2ae llama : fix embeddings (#5796)	1 year ago
Xuan Son Nguyen	4ffcdce2ff add alias for chat template (#5858)	1 year ago
Douglas Hanley	475df1d6cf llama : allow for user specified embedding pooling type (#5849)	1 year ago
compilade	de9692a7d2 llama : fix llama_copy_state_data with fragmented KV cache (#5840)	1 year ago
Michael Podvitskiy	4a6e2d6142 llama : add abort_callback to interrupt computation (#5409)	1 year ago
Xuan Son Nguyen	6c32d8c7ad llama : refactor internal quantization functions (#5830)	1 year ago
compilade	802da0091b llama : fix segfault from unknown model arch name (#5820)	1 year ago
Neo Zhang Jianyu	715641391d Support multiple GPUs (split mode) on SYCL backend (#5806)	1 year ago
Sourab Mangrulkar	c29af7e225 llama : add StarCoder2 support (#5795)	1 year ago
Pierrick Hymbert	3ab8b3a92e llama : cleanup unused mmq flags (#5772)	1 year ago
Douglas Hanley	9600d59e01 unicode : switch to multimap based nfd_map (#5799)	1 year ago
Marcus Dunn	d5ab29757e llama : constified `llama_set_state_data`'s `src` (#5774)	1 year ago
Georgi Gerganov	08c5ee87e4 llama : remove deprecated API (#5770)	1 year ago
compilade	adcb12a9ba llama : fix non-quantization of expert gating tensors (#5754)	1 year ago
Douglas Hanley	177628bfd8 llama : improve BERT tokenization (#5740)	1 year ago
Kawrakow	0becb22ac0 IQ4_XS: a 4.25 bpw quantization (#5747)	1 year ago
Georgi Gerganov	9d533a77d0 llama : fix defrag bugs + add parameter (#5735)	1 year ago
Kawrakow	a33e6a0d2a Adding IQ2_S and IQ2_M to complete coverage of the 2-3 bit quantization range (#5721)	1 year ago

Commit History