Author | Commit | Message | Date
Georgi Gerganov | 56a00f0a2f | common : default --hf-file to --model (#6234) | 1 year ago
fraxy-v | 92397d87a4 | convert-llama2c-to-ggml : enable conversion of GQA models (#6237) | 1 year ago
Kawrakow | 1d0331c12a | quantize: options for output and token embedding tensors qtype (#6239) | 1 year ago
Pierrick Hymbert | dba1af6129 | llama_model_loader: support multiple split/shard GGUFs (#6187) | 1 year ago
Minsoo Cheong | ee804f6223 | ci: apply concurrency limit for github workflows (#6243) | 1 year ago
Georgi Gerganov | 80bd33bc2c | common : add HF arg helpers (#6234) | 1 year ago
Nexesenex | e80f06d2a1 | llama : correction of the attn.v.weight quantization for IQ3_XS (#6209) | 1 year ago
Olivier Chafik | f77a8ffd3b | tests : conditional python & node json schema tests (#6207) | 1 year ago
Olivier Chafik | 72114edf06 | json-schema-to-grammar : fix order of props + non-str const/enum (#6232) | 1 year ago
slaren | 2f0e81e053 | cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy (#6208) | 1 year ago
Xiaoyi Chen | 29ab270e65 | readme : add RecurseChat to the list of UIs (#6219) | 1 year ago
Jan Boon | 6b8bb3a31d | server : fix n_keep always showing as 0 in response (#6211) | 1 year ago
Georgi Gerganov | 68e210b354 | server : enable continuous batching by default (#6231) | 1 year ago
Georgi Gerganov | b3e94f26ba | metal : proper assert for mat-mat memory alignment (#6225) | 1 year ago
Vaibhav Srivastav | b2075fd6a5 | ci : add CURL flag for the mac builds (#6214) | 1 year ago
Georgi Gerganov | 95d576b48e | metal : pad n_ctx by 32 (#6177) | 1 year ago
Neo Zhang Jianyu | 59c17f02de | add blog link (#6222) | 1 year ago
DAN™ | fa046eafbc | Fix params underscore convert to dash. (#6203) | 1 year ago
Jan Boon | be07a03217 | server : update readme doc from `slot_id` to `id_slot` (#6213) | 1 year ago
slaren | d0a71233fb | cuda : disable host register by default (#6206) | 1 year ago
semidark | f372c49ccd | Corrected typo to wrong file (#6199) | 1 year ago
Georgi Gerganov | 924ce1dce7 | tests : disable system() calls (#6198) | 1 year ago
slaren | 03a8f8fafe | cuda : fix LLAMA_CUDA_F16 build (#6197) | 1 year ago
Kawrakow | cfd3be76e3 | ggml : same IQ4_NL quantization for CPU/CUDA/Metal (#6196) | 1 year ago
Olivier Chafik | 5b7b0ac8df | json-schema-to-grammar improvements (+ added to server) (#5978) | 1 year ago
Vaibhav Srivastav | 1943c01981 | ci : fix indentation error (#6195) | 1 year ago
Vaibhav Srivastav | 5e43ba8742 | build : add mac pre-build binaries (#6182) | 1 year ago
Kawrakow | 76aa30a263 | Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (#6183) | 1 year ago
AidanBeltonS | c5b8595e3f | Add nvidia and amd backends (#6157) | 1 year ago
slaren | 42e21c6882 | cuda : fix conflict with std::swap (#6186) | 1 year ago