Author | Commit | Message | Date
Georgi Gerganov | ff8238f71d | docs : add llama-star arch idea | 2 years ago
Galunid | 8e672efe63 | stablelm : simplify + speedup generation (#4153) | 2 years ago
Galunid | 0b871f1a04 | finetune - update readme to mention llama support only (#4148) | 2 years ago
Aaryaman Vasishta | dfc7cd48b1 | readme : update ROCm Windows instructions (#4122) | 2 years ago
Seb C | 881800d1f0 | main : Add ChatML functionality to main example (#4046) | 2 years ago
Galunid | f23c0359a3 | ci : add flake8 to github actions (python linting) (#4129) | 2 years ago
Branden Butler | 40a34fe8d0 | speculative : fix prompt tokenization in speculative example (#4025) | 2 years ago
Georgi Gerganov | dae06c06e5 | Revert "finetune : add --n-gpu-layers flag info to --help (#4128)" | 2 years ago
Clark Saben | 05e8301e45 | finetune : add --n-gpu-layers flag info to --help (#4128) | 2 years ago
SoftwareRenderer | 936c79b227 | server : relay error messages (#4131) | 2 years ago
kchro3 | 262005ad9d | common : comma should be semicolon (#4137) | 2 years ago
Georgi Gerganov | 35985acffa | gitignore : tokenize | 2 years ago
slaren | e937066420 | gguf-py : export chat templates (#4125) | 2 years ago
Kerfuffle | 28a2e6e7d4 | tokenize example: Respect normal add BOS token behavior (#4126) | 2 years ago
Galunid | 0b5c3b0457 | scripts : Remove missed baichuan convert script (#4127) | 2 years ago
Kerfuffle | 2923f17f6f | Clean up ggml-cuda.cu warnings when compiling with clang (for ROCM) (#4124) | 2 years ago
slaren | bbecf3f415 | llama : increase max nodes (#4115) | 2 years ago
Roger Meier | 8e9361089d | build : support ppc64le build for make and CMake (#3963) | 2 years ago
Georgi Gerganov | 5ad387e994 | tokenize : fix trailing whitespace | 2 years ago
zakkor | 2fa02b4b3d | examples : add tokenize (#4039) | 2 years ago
Don Mahurin | 2ab0707acb | convert : use 'model' value if it exists. This allows karpathy/tinyllamas to load (#4089) | 2 years ago
John | 11173c92d6 | py : Falcon HF compatibility (#4104) | 2 years ago
Jannis Schönleber | 9e87ef60e1 | common : improve yaml log escaping (#4080) | 2 years ago
Huawei Lin | c7cce1246e | llava : fix compilation warning that fread return value is not used (#4069) | 2 years ago
Jiří Podivín | f7d5e97542 | py : remove superfluous import statements (#4076) | 2 years ago
Jiří Podivín | ba4cf5c0bf | train : move number of gpu layers argument parsing to common/train.cpp (#4074) | 2 years ago
slaren | e85bb1a8e7 | llama : add functions to get the model's metadata (#4013) | 2 years ago
gwjr | 3e916a07ac | finetune : speed-up ggml_compute_forward_out_prod_f32 via BLAS (#4079) | 2 years ago
Andrew Godfrey | 947f64f163 | finetune : zero the loraB initial vectors (#4082) | 2 years ago
Andrew Godfrey | b83e149ec6 | cuda : get_row_rounding F32 (#4095) | 2 years ago