Johannes Gäßler
|
7d1a378b8f
CUDA: refactor mmq, dmmv, mmvq (#7716)
|
1 year ago |
Georgi Gerganov
|
554c247caf
ggml : remove OpenCL (#7735)
|
1 year ago |
Georgi Gerganov
|
0cd6bd3483
llama : remove beam search (#7736)
|
1 year ago |
Radoslav Gerganov
|
bde7cd3cd9
llama : offload to RPC in addition to other backends (#7640)
|
1 year ago |
Masaya, Kato
|
a5735e4426
ggml : use OpenMP as a thread pool (#7606)
|
1 year ago |
Johannes Gäßler
|
0b832d53ba
make: fix debug options not being applied to NVCC (#7714)
|
1 year ago |
Yazan Agha-Schrader
|
2e666832e6
server : new UI (#7633)
|
1 year ago |
Johannes Gäßler
|
9b596417af
CUDA: quantized KV support for FA vec (#7527)
|
1 year ago |
Daniele
|
30e238b246
Improve HIP compatibility (#7672)
|
1 year ago |
Johannes Gäßler
|
10b1e45876
make: add --device-debug to NVCC debug flags (#7542)
|
1 year ago |
Georgi Gerganov
|
e84b71c2c6
ggml : drop support for QK_K=64 (#7473)
|
1 year ago |
junchao-loongson
|
65c58207ec
ggml : add loongarch lsx and lasx support (#6454)
|
1 year ago |
slaren
|
d359f30921
llama : remove MPI backend (#7395)
|
1 year ago |
Gavin Zhao
|
82ca83db3c
ROCm: use native CMake HIP support (#5966)
|
1 year ago |
agray3
|
bc4bba364f
Introduction of CUDA Graphs to LLama.cpp (#6766)
|
1 year ago |
Georgi Gerganov
|
92139b90af
tests : add test-tokenizer-0.sh + fix some tokenizers (#7036)
|
1 year ago |
Georgi Gerganov
|
f4ab2a4147
llama : fix BPE pre-tokenization (#6920)
|
1 year ago |
Przemysław Pawełczyk
|
577277ffd2
make : change GNU make default CXX from g++ to c++ (#6966)
|
1 year ago |
Pierrick Hymbert
|
0c4d489e29
quantize: add imatrix and dataset metadata in GGUF (#6658)
|
1 year ago |
Justine Tunney
|
192090bae4
llamafile : improve sgemm.cpp (#6796)
|
1 year ago |
Olivier Chafik
|
5cf5e7d490
`build`: generate hex dump of server assets during build (#6661)
|
1 year ago |
Georgi Gerganov
|
40f74e4d73
llama : add option to render special/control tokens (#6807)
|
1 year ago |
Georgi Gerganov
|
3b8f1ec4b1
llamafile : tmp disable + build sgemm.o when needed (#6716)
|
1 year ago |
Georgi Gerganov
|
666867b799
ggml : fix llamafile sgemm wdata offsets (#6710)
|
1 year ago |
Justine Tunney
|
8cc91dc63c
ggml : add llamafile sgemm (#6414)
|
1 year ago |
Olivier Chafik
|
7593639ce3
`main`: add --json-schema / -j flag (#6659)
|
1 year ago |
Nikolas
|
a474f50ebb
Refactor Error Handling for CUDA (#6575)
|
1 year ago |
Pierrick Hymbert
|
b804b1ef77
eval-callback: Example how to use eval callback for debugging (#6576)
|
1 year ago |
Clint Herron
|
57dd02c44b
Tests: Added integration tests for GBNF parser (#6472)
|
1 year ago |
Clint Herron
|
9b84ae1806
examples : add GBNF validator program (#5948)
|
1 year ago |