Christian Demsar | e59fcb2bc1 | Add --n-predict -2 for stopping generation on full context (#2565) | 2 years ago
Martin Krasser | 1638757767 | Fix grammar-based sampling issue in server (#2566) | 2 years ago
Sam Spilsbury | 916a9acdd0 | ggml-alloc: Don't try to re-use buffers of external tensors (#2562) | 2 years ago
grahameth | ea04a4ca19 | add log_callback to llama_context_params for custom logging. (#2234) | 2 years ago
Johannes Gäßler | 25d43e0eb5 | CUDA: tuned mul_mat_q kernels (#2546) | 2 years ago
Martin Krasser | f5bfea0580 | Allow passing grammar to completion endpoint (#2532) | 2 years ago
Johannes Gäßler | acfc5478ff | CUDA: tighter VRAM scratch size for 65b/70b (#2551) | 2 years ago
chaihahaha | 7ed8d1fe7f | llm.vim : multiline autocompletion, get rid of "^@" (#2543) | 2 years ago
Georgi Gerganov | e7f94d6fdc | vim : bring back simple llm.vim example | 2 years ago
AustinMroz | 2d7baaf50f | vim : streaming and more (#2495) | 2 years ago
klosax | f3c3b4b167 | Add --rope-scale parameter (#2544) | 2 years ago
Georgi Gerganov | 93356bdb7a | ggml : mul mat tweaks (#2372) | 2 years ago
Georgi Gerganov | 60baff7c85 | ggml : pad result of ggml_nbytes() | 2 years ago
Georgi Gerganov | 9082b5dfbf | ggml : change params pointer (style change) (#2539) | 2 years ago
Georgi Gerganov | 99d29c0094 | ggml : sync (custom ops) (#2537) | 2 years ago
Johannes Gäßler | 3d9a551816 | Fixed mmap prefetch for GPU offloading (#2529) | 2 years ago
Georgi Gerganov | f6f9896ac3 | metal : fix out-of-bounds access + inc concurrency nodes (#2416) | 2 years ago
GiviMAD | 34a14b28ff | [Makefile] Move ARM CFLAGS before compilation (#2536) | 2 years ago
Henri Vasserman | 7297128db8 | [Zig] Rewrite build for Zig 0.11 (#2514) | 2 years ago
DannyDaemonic | 86c3219895 | console : fix issue related to Windows 11 PowerShell console mode persistence (#2521) | 2 years ago
Keiichi Tabata | 2e8265ae17 | convert.py : add missing abstract methods for quantized data (#2491) | 2 years ago
Johannes Gäßler | f514d1b306 | CUDA: faster k-quant mul_mat_q kernels (#2525) | 2 years ago
Jonas Wunderlich | 332311234a | fix firefox autoscroll (#2519) | 2 years ago
Cebtenzzre | 182af739c4 | server: regenerate completion.js.hpp (#2515) | 2 years ago
Cebtenzzre | 4329d1acb0 | CUDA: use min compute capability of GPUs actually used (#2506) | 2 years ago
Cebtenzzre | 02f9d96a86 | CUDA: check if event is NULL before cudaStreamWaitEvent (#2505) | 2 years ago
DannyDaemonic | 3498588e0f | Add --simple-io option for subprocesses and break out console.h and cpp (#1558) | 2 years ago
Stephen Nichols | 5f631c2679 | Fixing race condition in server and partial stream handling in frontend. (#2391) | 2 years ago
l3utterfly | 415e99fec2 | Stream save llama context data to file instead of allocating entire buffer upfront (#2488) | 2 years ago
Borislav Stanimirov | ff966e7ca6 | build : fix several cast and printf warnings (#2499) | 2 years ago