Aman Gupta
|
48e2fa9fb7
CUDA: add fp kernel for larger batch size MoE (#16512)
|
3 months ago |
Anav Prasad
|
5b6913c47b
cuda : remove legacy copy-op pointer indirection code (#16485)
|
3 months ago |
Georgi Gerganov
|
bc07349a7f
server : dynamic token limit for prompt cache (#16560)
|
3 months ago |
Georgi Gerganov
|
e60f241eac
metal : FA support F32 K and V and head size = 32 (#16531)
|
3 months ago |
Georgi Gerganov
|
e38b7c6e9e
graph : support cacheless embeddings with FA and iSWA (#16528)
|
3 months ago |
lhez
|
5016b72862
opencl: fix build targeting CL 2 (#16554)
|
3 months ago |
Johannes Gäßler
|
7049736b2d
CUDA: fix numerical issues in tile FA kernel (#16540)
|
3 months ago |
Jie Fu (傅杰)
|
01d2bdc2bc
ggml : fix build broken with -march=armv9-a on MacOS (#16520)
|
3 months ago |
Chenguang Li
|
56fc38b965
CANN: fix CPU memory leak in CANN backend (#16549)
|
3 months ago |
Pascal
|
1fb9504eb7
fix: add remark plugin to render raw HTML as literal text (#16505)
|
3 months ago |
Sam/Samuel
|
3f750f8d76
metal: add support for opt_step_sgd (#16539)
|
3 months ago |
Georgi Gerganov
|
c515fc5771
ggml : fix scalar path for computing norm (#16558)
|
3 months ago |
hipudding
|
f9bc66c3eb
CANN: Update several operators to support FP16 data format (#16251)
|
3 months ago |
Sam/Samuel
|
a31cf36ad9
metal : add opt_step_adamw and op_sum (#16529)
|
3 months ago |
Pascal
|
81d54bbfd5
webui: remove client-side context pre-check and rely on backend for limits (#16506)
|
3 months ago |
Neo Zhang Jianyu
|
c7be9febcb
[SYCL] fix UT fault cases: count-equal, argsort, pad OPs (#16521)
|
3 months ago |
Mathieu Baudier
|
8415f61e23
ci : add Vulkan on Ubuntu with default packages build (#16532)
|
3 months ago |
Aldehir Rojas
|
2c301e91ab
common : handle unicode during partial json parsing (#16526)
|
3 months ago |
Georgi Gerganov
|
4b2dae383d
common : update presets (#16504)
|
3 months ago |
sirus20x6
|
41aac5c69b
ggml : Fix FP16 ELU positive branch (#16519)
|
3 months ago |
Daniel Bevenius
|
a2fba89a42
hparams : add check for layer index in is_recurrent (#16511)
|
3 months ago |
sirus20x6
|
20cc625edc
ggml: Correct SVE implementation in ggml_vec_dot_f16_unroll (#16518)
|
3 months ago |
Johannes Gäßler
|
11f0af5504
CUDA: faster tile FA, add oob checks, more HSs (#16492)
|
3 months ago |
Georgi Gerganov
|
a3cb04744f
metal : fix mul-mm condition + fix mul-mv permuted kernels (#16494)
|
3 months ago |
Pascal
|
4a8fbe0a5e
feat: render user content as markdown option (#16358)
|
3 months ago |
Yann Follet
|
31d0ff1869
server / ranking : add sorting and management of top_n (#16403)
|
3 months ago |
Diego Devesa
|
97870e6497
cuda : avoid initializing unused devices (#16510)
|
3 months ago |
amirai21
|
477a66b035
convert : correctly handle LLaMA tokenizer for Jamba (#16470)
|
3 months ago |
Georgi Gerganov
|
e60f01d941
server : fix division by zero when reporting stats (#16501)
|
3 months ago |
Georgi Gerganov
|
81086cd6a3
vocab : mark EOT token for Granite models (#16499)
|
3 months ago |