Author | Commit | Message | Date
Brian | f7cab35ef9 | gguf-hash: model wide and per tensor hashing using xxhash and sha1 (#8048) | 1 year ago
Clint Herron | 3e2618bc7b | Adding step to `clean` target to remove legacy binary names to reduce upgrade / migration confusion arising from #7809. (#8257) | 1 year ago
Xuan Son Nguyen | a27aa50ab7 | Add missing items in makefile (#8177) | 1 year ago
slaren | c7ab7b612c | make : fix missing -O3 (#8143) | 1 year ago
Georgi Gerganov | f3f65429c4 | llama : reorganize source code + improve CMake (#8006) | 1 year ago
Johannes Gäßler | a818f3028d | CUDA: use MMQ instead of cuBLAS by default (#8075) | 1 year ago
slaren | 95f57bb5d5 | ggml : remove ggml_task_type and GGML_PERF (#8017) | 1 year ago
Clint Herron | c5a8d4b749 | JSON Schema to GBNF integration tests (#7790) | 1 year ago
Ulrich Drepper | 61665277af | Allow compiling with CUDA without CUDA runtime installed (#7989) | 1 year ago
0cc4m | 7c7836d9d4 | Vulkan Shader Refactor, Memory Debugging Option (#7947) | 1 year ago
Xuan Son Nguyen | 0c7b3595b9 | Add `cvector-generator` example (#7514) | 1 year ago
slaren | f578b86b21 | move BLAS to a separate backend (#6210) | 1 year ago
Olivier Chafik | 1c641e6aac | `build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) | 1 year ago
Johannes Gäßler | 7d1a378b8f | CUDA: refactor mmq, dmmv, mmvq (#7716) | 1 year ago
Georgi Gerganov | 554c247caf | ggml : remove OpenCL (#7735) | 1 year ago
Georgi Gerganov | 0cd6bd3483 | llama : remove beam search (#7736) | 1 year ago
Radoslav Gerganov | bde7cd3cd9 | llama : offload to RPC in addition to other backends (#7640) | 1 year ago
Masaya, Kato | a5735e4426 | ggml : use OpenMP as a thread pool (#7606) | 1 year ago
Johannes Gäßler | 0b832d53ba | make: fix debug options not being applied to NVCC (#7714) | 1 year ago
Yazan Agha-Schrader | 2e666832e6 | server : new UI (#7633) | 1 year ago
Johannes Gäßler | 9b596417af | CUDA: quantized KV support for FA vec (#7527) | 1 year ago
Daniele | 30e238b246 | Improve HIP compatibility (#7672) | 1 year ago
Johannes Gäßler | 10b1e45876 | make: add --device-debug to NVCC debug flags (#7542) | 1 year ago
Georgi Gerganov | e84b71c2c6 | ggml : drop support for QK_K=64 (#7473) | 1 year ago
junchao-loongson | 65c58207ec | ggml : add loongarch lsx and lasx support (#6454) | 1 year ago
slaren | d359f30921 | llama : remove MPI backend (#7395) | 1 year ago
Gavin Zhao | 82ca83db3c | ROCm: use native CMake HIP support (#5966) | 1 year ago
agray3 | bc4bba364f | Introduction of CUDA Graphs to LLama.cpp (#6766) | 1 year ago
Georgi Gerganov | 92139b90af | tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) | 1 year ago
Georgi Gerganov | f4ab2a4147 | llama : fix BPE pre-tokenization (#6920) | 1 year ago