Georgi Gerganov | 9082b5dfbf | ggml : change params pointer (style change) (#2539) | 2 years ago
Georgi Gerganov | 99d29c0094 | ggml : sync (custom ops) (#2537) | 2 years ago
Johannes Gäßler | 3d9a551816 | Fixed mmap prefetch for GPU offloading (#2529) | 2 years ago
Georgi Gerganov | f6f9896ac3 | metal : fix out-of-bounds access + inc concurrency nodes (#2416) | 2 years ago
GiviMAD | 34a14b28ff | [Makefile] Move ARM CFLAGS before compilation (#2536) | 2 years ago
Henri Vasserman | 7297128db8 | [Zig] Rewrite build for Zig 0.11 (#2514) | 2 years ago
DannyDaemonic | 86c3219895 | console : fix issue related to Windows 11 PowerShell console mode persistence (#2521) | 2 years ago
Keiichi Tabata | 2e8265ae17 | convert.py : add missing abstract methods for quantized data (#2491) | 2 years ago
Johannes Gäßler | f514d1b306 | CUDA: faster k-quant mul_mat_q kernels (#2525) | 2 years ago
Jonas Wunderlich | 332311234a | fix firefox autoscroll (#2519) | 2 years ago
Cebtenzzre | 182af739c4 | server: regenerate completion.js.hpp (#2515) | 2 years ago
Cebtenzzre | 4329d1acb0 | CUDA: use min compute capability of GPUs actually used (#2506) | 2 years ago
Cebtenzzre | 02f9d96a86 | CUDA: check if event is NULL before cudaStreamWaitEvent (#2505) | 2 years ago
DannyDaemonic | 3498588e0f | Add --simple-io option for subprocesses and break out console.h and cpp (#1558) | 2 years ago
Stephen Nichols | 5f631c2679 | Fixing race condition in server and partial stream handling in frontend. (#2391) | 2 years ago
l3utterfly | 415e99fec2 | Stream save llama context data to file instead of allocating entire buffer upfront (#2488) | 2 years ago
Borislav Stanimirov | ff966e7ca6 | build : fix several cast and printf warnings (#2499) | 2 years ago
Evan Jones | 8183159cf3 | examples : generate JSON according to schema (#1887) | 2 years ago
Johannes Gäßler | 468ea24fb4 | CUDA: faster non k-quant mul_mat_q kernels (#2483) | 2 years ago
Johannes Gäßler | 4f6b60c776 | CUDA: Fix models with output size != 32000 (#2480) | 2 years ago
ldwang | 220d931864 | readme : add Aquila-7B model series to supported models (#2487) | 2 years ago
Eve | 81844fbcfd | tests : Fix compilation warnings (Linux/GCC) (#2451) | 2 years ago
Yiming Cui | a312193e18 | readme : Add Chinese LLaMA-2 / Alpaca-2 to supported models (#2475) | 2 years ago
Bono Lv | c574bddb36 | fix a typo in examples/server/README.md (#2478) | 2 years ago
ebraminio | 86aeb27734 | server : Support dark mode (#2414) | 2 years ago
Matteo Boschini | 1873ff586b | metal : add gqa8 kernel to allow llama-2-70B on metal (#2459) | 2 years ago
Johannes Gäßler | 49e7cb5bb1 | CUDA: fixed LLAMA_FAST compilation option (#2473) | 2 years ago
Johannes Gäßler | b772bba42e | CUDA: fixed cmake F16 option (#2471) | 2 years ago
Johannes Gäßler | 0728c5a8b9 | CUDA: mmq CLI option, fixed mmq build issues (#2453) | 2 years ago
Johannes Gäßler | 1215ed7d5c | CUDA: Implemented row flattening for non-glm RoPE (#2468) | 2 years ago