Georgi Gerganov | f6f9896ac3 | metal : fix out-of-bounds access + inc concurrency nodes (#2416) | 2 years ago
GiviMAD | 34a14b28ff | [Makefile] Move ARM CFLAGS before compilation (#2536) | 2 years ago
Henri Vasserman | 7297128db8 | [Zig] Rewrite build for Zig 0.11 (#2514) | 2 years ago
DannyDaemonic | 86c3219895 | console : fix issue related to Windows 11 PowerShell console mode persistence (#2521) | 2 years ago
Keiichi Tabata | 2e8265ae17 | convert.py : add missing abstract methods for quantized data (#2491) | 2 years ago
Johannes Gäßler | f514d1b306 | CUDA: faster k-quant mul_mat_q kernels (#2525) | 2 years ago
Jonas Wunderlich | 332311234a | fix firefox autoscroll (#2519) | 2 years ago
Cebtenzzre | 182af739c4 | server: regenerate completion.js.hpp (#2515) | 2 years ago
Cebtenzzre | 4329d1acb0 | CUDA: use min compute capability of GPUs actually used (#2506) | 2 years ago
Cebtenzzre | 02f9d96a86 | CUDA: check if event is NULL before cudaStreamWaitEvent (#2505) | 2 years ago
DannyDaemonic | 3498588e0f | Add --simple-io option for subprocesses and break out console.h and cpp (#1558) | 2 years ago
Stephen Nichols | 5f631c2679 | Fixing race condition in server and partial stream handling in frontend. (#2391) | 2 years ago
l3utterfly | 415e99fec2 | Stream save llama context data to file instead of allocating entire buffer upfront (#2488) | 2 years ago
Borislav Stanimirov | ff966e7ca6 | build : fix several cast and printf warnings (#2499) | 2 years ago
Evan Jones | 8183159cf3 | examples : generate JSON according to schema (#1887) | 2 years ago
Johannes Gäßler | 468ea24fb4 | CUDA: faster non k-quant mul_mat_q kernels (#2483) | 2 years ago
Johannes Gäßler | 4f6b60c776 | CUDA: Fix models with output size != 32000 (#2480) | 2 years ago
ldwang | 220d931864 | readme : add Aquila-7B model series to supported models (#2487) | 2 years ago
Eve | 81844fbcfd | tests : Fix compilation warnings (Linux/GCC) (#2451) | 2 years ago
Yiming Cui | a312193e18 | readme : Add Chinese LLaMA-2 / Alpaca-2 to supported models (#2475) | 2 years ago
Bono Lv | c574bddb36 | fix a typo in examples/server/README.md (#2478) | 2 years ago
ebraminio | 86aeb27734 | server : Support dark mode (#2414) | 2 years ago
Matteo Boschini | 1873ff586b | metal : add gqa8 kernel to allow llama-2-70B on metal (#2459) | 2 years ago
Johannes Gäßler | 49e7cb5bb1 | CUDA: fixed LLAMA_FAST compilation option (#2473) | 2 years ago
Johannes Gäßler | b772bba42e | CUDA: fixed cmake F16 option (#2471) | 2 years ago
Johannes Gäßler | 0728c5a8b9 | CUDA: mmq CLI option, fixed mmq build issues (#2453) | 2 years ago
Johannes Gäßler | 1215ed7d5c | CUDA: Implemented row flattening for non-glm RoPE (#2468) | 2 years ago
Johannes Gäßler | 2dbf518911 | CUDA: fewer memory bank conflicts for mul_mat_q (#2458) | 2 years ago
slaren | 9d2382b3e4 | Fix Metal backend broken from the allocator changes (#2455) | 2 years ago
slaren | a113689571 | ggml : add graph tensor allocator (#2411) | 2 years ago