Michael Coppola
|
31e7903221
server : add `dynatemp_range` and `dynatemp_exponent` (#5352)
|
1 year ago |
Niall Coates
|
4ffc7a17d4
server : various fixes for the prompt field in /completion (#5300)
|
1 year ago |
Georgi Gerganov
|
906cff55c2
py : handle byte tokens in `get_token_type` (#5341)
|
1 year ago |
Johannes Gäßler
|
098f6d737b
make: Use ccache for faster compilation (#5318)
|
1 year ago |
Johannes Gäßler
|
78b00dda6c
README: updated introduction (#5343)
|
1 year ago |
Kawrakow
|
c6b395535a
ggml : make use of ggml-quants.h possible in C++ code (#5338)
|
1 year ago |
Dr. Tom Murphy VII Ph.D
|
abb61944a5
ggml : avoid duplicating function calls using MIN/MAX macros (#5325)
|
1 year ago |
Kawrakow
|
89503dcb5f
iq3_xxs: guards for the no-imatrix situation (#5334)
|
1 year ago |
Guoteng
|
7e1ae372f3
py : fix internlm2-hf convert to gguf (#5305)
|
1 year ago |
Kawrakow
|
6fdfa2ecc6
iq2_xxs: tune quantization (#5320)
|
1 year ago |
Alexey Parfenov
|
a2d60c9158
server : allow to get default generation settings for completion (#5307)
|
1 year ago |
l3utterfly
|
e6f8177532
common : add dynamic temperature parameters to main example cli (#5295)
|
1 year ago |
Georgi Gerganov
|
30679d438d
scripts : fix typos, cleanup (#5303)
|
1 year ago |
Нияз Гарифзянов
|
4be04c8965
scripts : add non-interactive server-llm.sh (#5303)
|
1 year ago |
chiranko
|
5d55b0cd82
readme : add CodeShell models to the supported models list (#5330)
|
1 year ago |
AidanBeltonS
|
4833ac209d
[SYCL] Fix cpy with dims of 3 (#5289)
|
1 year ago |
github-actions[bot]
|
9392ebd49e
flake.lock: Update
|
1 year ago |
Kawrakow
|
5ed26e1fc9
Adding some imatrix tools (#5302)
|
1 year ago |
Welby Seely
|
277fad30c6
cmake : use set() for LLAMA_WIN_VER (#5298)
|
1 year ago |
Johannes Gäßler
|
3c0d25c475
make: add nvcc info print (#5310)
|
1 year ago |
Johannes Gäßler
|
3cc5ed353c
make: fix nvcc optimization flags for host code (#5309)
|
1 year ago |
Martin Schwaighofer
|
60ecf099ed
add Vulkan support to Nix flake
|
2 years ago |
0cc4m
|
e920ed393d
Vulkan Intel Fixes, Optimizations and Debugging Flags (#5301)
|
1 year ago |
Michael Klimenko
|
52bb63c708
refactor : switch to emplace_back to avoid extra object (#5291)
|
1 year ago |
Jared Van Bortel
|
1ec3332ade
YaRN : store rope scaling type as int32_t in memory (#5285)
|
1 year ago |
BADR
|
6a66c5071a
readme : add tenere in the ui tools list (#5284)
|
1 year ago |
AidanBeltonS
|
a305dba8ff
Fix im2col with 32fp (#5286)
|
1 year ago |
kalomaze
|
191221178f
perplexity : fix KL divergence calculations on Windows (#5273)
|
1 year ago |
Georgi Gerganov
|
e437b37fd0
scripts : parse wtype in server-llm.sh (#5167)
|
1 year ago |
Mirror Azure
|
2d40085c26
py : add check for '.attn.masked_bias' layers to GPT2model (#5281)
|
1 year ago |