cturan/llama.cpp

Author	SHA1 Message	Date
Xuan Son Nguyen	afbbfaa537 server : add more env vars, improve gen-docs (#9635)	1 year ago
Gabe Goodhart	3d6bf6919f llama : add IBM Granite MoE architecture (#9438)	1 year ago
Dou Xinpeng	904837e0cb cann: fix crash when llama-bench is running on multiple cann devices (#9627)	1 year ago
Eric Zhang	70392f1f81 ggml : add AVX512DQ requirement for AVX512 builds (#9622)	1 year ago
Georgi Gerganov	bb5f819975 sync : ggml	1 year ago
Georgi Gerganov	c038931615 examples : adapt to ggml.h changes (ggml/0)	1 year ago
Georgi Gerganov	31ac5834fe llama : keep track of all EOG tokens in the vocab (#9609)	1 year ago
Georgi Gerganov	cea1486ecf log : add CONT level for continuing previous log entry (#9610)	1 year ago
StrangeBytesDev	0aa15011e3 server : add newline after chat example (#9616)	1 year ago
Georgi Gerganov	b0f27361f3 sampling : avoid expensive softmax during greedy sampling (#9605)	1 year ago
Max Krasnyansky	c087b6f11d threads: fix msvc build without openmp (#9615)	1 year ago
Ivan	116efee0ee cuda: add q8_0->f32 cpy operation (#9571)	1 year ago
Xuan Son Nguyen	0b3bf966f4 server : add --no-context-shift option (#9607)	1 year ago
Max Krasnyansky	f0c7b5edf8 threads: improve ggml_barrier scaling with large number of threads (#9598)	1 year ago
Riceball LEE	1d48e98e4f readme : add programmable prompt engine language CLI (#9599)	1 year ago
Georgi Gerganov	f3979df762 flake.lock: Update (#9586)	1 year ago
Srihari-mcw	1e7b9299c6 ggml : AVX512 gemm for Q4_0_8_8 (#9532)	1 year ago
Georgi Gerganov	37f8c7b4c9 perplexity : remove extra new lines after chunks (#9596)	1 year ago
Georgi Gerganov	bf9c1013ac metal : use F32 prec for K*Q in vec FA (#9595)	1 year ago
Akarshan Biswas	e62e9789cd Revert "[SYCL] fallback mmvq (#9088)" (#9579)	1 year ago
R0CKSTAR	c35e586ea5 musa: enable building fat binaries, enable unified memory, and disable Flash Attention on QY1 (MTT S80) (#9526)	1 year ago
Molly Sophia	912c331d3d Fix merge error in #9454 (#9589)	1 year ago
Johannes Gäßler	a5b57b08ce CUDA: enable Gemma FA for HIP/Pascal (#9581)	1 year ago
Shankar	ecd5d6b65b llama: remove redundant loop when constructing ubatch (#9574)	1 year ago
Molly Sophia	2a63caaa69 RWKV v6: RWKV_WKV op CUDA implementation (#9454)	1 year ago
slaren	d09770cae7 ggml-alloc : fix list of allocated tensors with GGML_ALLOCATOR_DEBUG (#9573)	1 year ago
agray3	41f477879f Update CUDA graph on scale change plus clear nodes/params (#9550)	1 year ago
Huang Qi	e948a7da7a CI: Provide prebuilt windows binary for hip (#9467)	1 year ago
slaren	63351143b2 quantize : improve type name parsing (#9570)	1 year ago
Georgi Gerganov	d13edb17ed ggml : fix builds (#0)	1 year ago
