Xuan Son Nguyen | 7554aa4655 | convert-lora : make `--base` optional (#10110) | 1 year ago
Diego Devesa | a6744e43e8 | llama : add simple-chat example (#10124) | 1 year ago
Diego Devesa | e991e3127f | llama : use smart pointers for ggml resources (#10117) | 1 year ago
Shupei Fan | 418f5eef26 | vulkan : improve ggml_vk_create_buffer error handling (#9898) | 1 year ago
Georgi Gerganov | ba6f62eb79 | readme : update hot topics | 1 year ago
sasha0552 | d865d1478c | server : fix smart selection of available slot (#10120) | 1 year ago
Georgi Gerganov | 1804adb0cf | ggml : remove ggml_scratch (#10121) | 1 year ago
Georgi Gerganov | 815fe72adc | sync : ggml | 1 year ago
Georgi Gerganov | f221d56220 | ggml : alloc ggml_contexts on the heap (whisper/2525) | 1 year ago
Zhenwei Jin | e597e50794 | build: fix build error in Windows env with OneAPI setup (#10107) | 1 year ago
Diego Devesa | 85679d37f3 | llama : improve output buffer type selection (#10098) | 1 year ago
Diego Devesa | 1e9f94994e | quantize : fix --keep-split (#10114) | 1 year ago
Diego Devesa | c02e5ab2a6 | llama : fix buffer checks for mamba and rwk (#10111) | 1 year ago
Zhenwei Jin | ab3d71f97f | loader: refactor tensor weights storage (#9935) | 1 year ago
Kevin Gibbons | 0a683e8088 | server : include scheme when printing URL (#10106) | 1 year ago
Diego Devesa | dea5e86051 | ggml : check tensor name lengths in gguf files (#10100) | 1 year ago
Sergio López | 1329c0a75e | kompute: add mul_mat_q4_k shader (#10097) | 1 year ago
Sergio López | 61408e7fad | kompute: add backend registry / device interfaces (#10045) | 1 year ago
Diego Devesa | b9e02e8184 | ggml : fix memory leaks when loading invalid gguf files (#10094) | 1 year ago
Rich Dougherty | 6763f713bb | readme : more lora detail in main example readme (#10064) | 1 year ago
Rich Dougherty | 79a2bc042d | convert : more detailed convert lora usage docs (#10065) | 1 year ago
xctan | fc83a9e584 | ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (#10029) | 1 year ago
Diego Devesa | c5b0f4b5d9 | llama : refactor model loader with backend registry (#10026) | 1 year ago
Changyeon Kim | 8f275a7c45 | ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. (#9763) | 1 year ago
Georgi Gerganov | 8d8ff71536 | llama : remove Tail-Free sampling (#10071) | 1 year ago
arch-btw | 61715d5cc8 | llama : Add IBM granite template (#10013) | 1 year ago
Georgi Gerganov | 07028f9d74 | flake.lock: Update (#10063) | 1 year ago
R0CKSTAR | 524afeec9d | musa: workaround for Guilty Lockup in cleaning src0 (#10042) | 1 year ago
Georgi Gerganov | 8125e6cbfc | server : don't overfill the batch during infill (#10018) | 1 year ago
Georgi Gerganov | 8841ce3f43 | llama : switch KQ multiplication to F32 precision by default (#10015) | 1 year ago