| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| slaren | 5bf3953d7e | cuda : improve cuda pool efficiency using virtual memory (#4606) | 2 years ago |
| slaren | 708e179e85 | fallback to CPU buffer if host buffer alloc fails (#4610) | 2 years ago |
| Samuel Maynard | 925e5584a0 | ci(docker): fix tags in "Build and push docker image (tagged)" (#4603) | 2 years ago |
| Alexey Parfenov | 6123979952 | server : allow to specify custom prompt for penalty calculation (#3727) | 2 years ago |
| kalomaze | b9ec82d262 | grammar : check the full vocab only if necessary (opt) (#4306) | 2 years ago |
| Johannes Gäßler | e0a4002273 | CUDA: fixed row rounding for 0 tensor splits (#4594) | 2 years ago |
| LeonEricsson | 7082d24cec | lookup : add prompt lookup decoding example (#4484) | 2 years ago |
| Georgi Gerganov | ba66175132 | sync : ggml (fix im2col) (#4591) | 2 years ago |
| FantasyGmm | a55876955b | cuda : fix jetson compile error (#4560) | 2 years ago |
| Henrik Forstén | 6724ef1657 | Fix CudaMemcpy direction (#4599) | 2 years ago |
| slaren | 48b7ff193e | llama : fix platforms without mmap (#4578) | 2 years ago |
| Herman Semenov | 48b24b170e | ggml : add comment about backward GGML_OP_DIAG_MASK_INF (#4203) | 2 years ago |
| Michael Kesper | 28cb35a0ec | make : add LLAMA_HIP_UMA option (#4587) | 2 years ago |
| rhuddleston | f31b984898 | ci : tag docker image with build number (#4584) | 2 years ago |
| Deins | 2bb98279c5 | readme : add zig bindings (#4581) | 2 years ago |
| bobqianic | 0137ef88ea | ggml : extend `enum ggml_log_level` with `GGML_LOG_LEVEL_DEBUG` (#4579) | 2 years ago |
| crasm | c7e9701f86 | llama : add ability to cancel model loading (#4462) | 2 years ago |
| Georgi Gerganov | afefa319f1 | ggml : change ggml_scale to take a float instead of tensor (#4573) | 2 years ago |
| Georgi Gerganov | 769a7bc85e | gguf-py : fix broken link | 2 years ago |
| Georgi Gerganov | 32259b2dad | gguf : simplify example dependencies | 2 years ago |
| Samuel Maynard | 4a5f9d629e | ci : add `jlumbroso/free-disk-space` to docker workflow (#4150) | 2 years ago |
| slaren | d232aca5a7 | llama : initial ggml-backend integration (#4520) | 2 years ago |
| Marcus Dunn | 31f27758fa | llama : allow getting n_batch from llama_context in c api (#4540) | 2 years ago |
| Finn Voorhees | 56fa50819f | metal : fix `ggml_metal_log` vargs (#4373) | 2 years ago |
| Erik Garrison | 0f630fbc92 | cuda : ROCm AMD Unified Memory Architecture (UMA) handling (#4449) | 2 years ago |
| arlo-phoenix | 562cf222b5 | ggml-cuda: Fix HIP build by adding define for __trap (#4569) | 2 years ago |
| Jared Van Bortel | 8fe03ffdda | common : remove incorrect --model-draft default (#4568) | 2 years ago |
| Johannes Gäßler | 9154494808 | CUDA: mul_mat_id always on GPU for batches >= 32 (#4553) | 2 years ago |
| Georgi Gerganov | c083718c89 | readme : update coding guidelines | 2 years ago |
| howlger | 880e352277 | py : open merges file as 'utf-8' (#4566) | 2 years ago |