cturan/llama.cpp

Author	SHA1 Message	Date
Xuan Son Nguyen	7554aa4655 convert-lora : make `--base` optional (#10110)	1 year ago
Diego Devesa	a6744e43e8 llama : add simple-chat example (#10124)	1 year ago
Diego Devesa	e991e3127f llama : use smart pointers for ggml resources (#10117)	1 year ago
Shupei Fan	418f5eef26 vulkan : improve ggml_vk_create_buffer error handling (#9898)	1 year ago
Georgi Gerganov	ba6f62eb79 readme : update hot topics	1 year ago
sasha0552	d865d1478c server : fix smart selection of available slot (#10120)	1 year ago
Georgi Gerganov	1804adb0cf ggml : remove ggml_scratch (#10121)	1 year ago
Georgi Gerganov	815fe72adc sync : ggml	1 year ago
Georgi Gerganov	f221d56220 ggml : alloc ggml_contexts on the heap (whisper/2525)	1 year ago
Zhenwei Jin	e597e50794 build: fix build error in Windows env with OneAPI setup (#10107)	1 year ago
Diego Devesa	85679d37f3 llama : improve output buffer type selection (#10098)	1 year ago
Diego Devesa	1e9f94994e quantize : fix --keep-split (#10114)	1 year ago
Diego Devesa	c02e5ab2a6 llama : fix buffer checks for mamba and rwk (#10111)	1 year ago
Zhenwei Jin	ab3d71f97f loader: refactor tensor weights storage (#9935)	1 year ago
Kevin Gibbons	0a683e8088 server : include scheme when printing URL (#10106)	1 year ago
Diego Devesa	dea5e86051 ggml : check tensor name lengths in gguf files (#10100)	1 year ago
Sergio López	1329c0a75e kompute: add mul_mat_q4_k shader (#10097)	1 year ago
Sergio López	61408e7fad kompute: add backend registry / device interfaces (#10045)	1 year ago
Diego Devesa	b9e02e8184 ggml : fix memory leaks when loading invalid gguf files (#10094)	1 year ago
Rich Dougherty	6763f713bb readme : more lora detail in main example readme (#10064)	1 year ago
Rich Dougherty	79a2bc042d convert : more detailed convert lora usage docs (#10065)	1 year ago
xctan	fc83a9e584 ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (#10029)	1 year ago
Diego Devesa	c5b0f4b5d9 llama : refactor model loader with backend registry (#10026)	1 year ago
Changyeon Kim	8f275a7c45 ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. (#9763)	1 year ago
Georgi Gerganov	8d8ff71536 llama : remove Tail-Free sampling (#10071)	1 year ago
arch-btw	61715d5cc8 llama : Add IBM granite template (#10013)	1 year ago
Georgi Gerganov	07028f9d74 flake.lock: Update (#10063)	1 year ago
R0CKSTAR	524afeec9d musa: workaround for Guilty Lockup in cleaning src0 (#10042)	1 year ago
Georgi Gerganov	8125e6cbfc server : don't overfill the batch during infill (#10018)	1 year ago
Georgi Gerganov	8841ce3f43 llama : switch KQ multiplication to F32 precision by default (#10015)	1 year ago

Commit History