Georgi Gerganov
|
644fd71b44
sampling : refactor + optimize penalties sampler (#10803)
|
1 year ago |
Bartowski
|
4ddd199f6f
llava : Allow locally downloaded models for QwenVL (#10833)
|
1 year ago |
Valentin Mamedov
|
a0974156f3
llama : add Deepseek MoE v1 & GigaChat models (#10827)
|
1 year ago |
Georgi Gerganov
|
87cf323cef
scripts : change build path to "build-bench" for compare-commits.sh (#10836)
|
1 year ago |
Vinesh Janarthanan
|
5478bbcd17
server: (UI) add syntax highlighting and latex math rendering (#10808)
|
1 year ago |
Georgi Gerganov
|
b5ae1ddff9
gguf-py : bump to v0.13.0
|
1 year ago |
Michelle Tan
|
89d604f2c8
server: Fix `has_next_line` in JSON response (#10818)
|
1 year ago |
Evgeny Kurnevsky
|
e52aba537a
nix: allow to override rocm gpu targets (#10794)
|
1 year ago |
HimariO
|
ba1cb19cdd
llama : add Qwen2VL support + multimodal RoPE (#10361)
|
1 year ago |
cduk
|
56eea0781c
Removes spurious \r in output that causes logging in journalctl to treat lines as binary and therefore hidden by default (#10771)
|
1 year ago |
lhez
|
a76c56fa1a
Introducing experimental OpenCL backend with support for Qualcomm Adreno GPUs (#10693)
|
1 year ago |
Eric Curtin
|
c27ac678dd
Opt class for positional argument handling (#10508)
|
1 year ago |
Corentin REGAL
|
11e07fd63b
fix: graceful shutdown for Docker images (#10815)
|
1 year ago |
Jett Janiak
|
4601a8bb67
gguf-py : numpy 2 newbyteorder fix (#9772)
|
1 year ago |
谢乃闻
|
9f35e44592
Fix crash caused by ggml_backend_load_all when launching on Android Activity (#10812)
|
1 year ago |
Eve
|
64ae065511
vulkan: small mul_mat_vec optimizations (#10665)
|
1 year ago |
Akarshan Biswas
|
83ed24a97b
SYCL: Reduce most of the compiler warnings (#10748)
|
1 year ago |
Karol Kontny
|
d583cd03f6
ggml : Fix compilation issues on ARM platform when building without fp16 (#10811)
|
1 year ago |
Xuan Son Nguyen
|
adffa6ffd5
common : improve -ctv -ctk CLI arguments (#10806)
|
1 year ago |
Xuan Son Nguyen
|
274ec65af6
contrib : add ngxson as codeowner (#10804)
|
1 year ago |
a3sh
|
8faa1d4dd4
CUDA: faster non-contiguous concat (#10760)
|
1 year ago |
Diego Devesa
|
cb13ef85a4
remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797)
|
1 year ago |
0cc4m
|
4064c0e3b6
Vulkan: Use improved q4_k and q5_k dequant code in dequant shaders (#10798)
|
1 year ago |
0cc4m
|
dc5301d565
Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats (#10721)
|
1 year ago |
Xuan Son Nguyen
|
9fdb124304
common : add missing env var for speculative (#10801)
|
1 year ago |
CentricStorm
|
5555c0c1f6
docs: update server streaming mode documentation (#9519)
|
1 year ago |
Georgi Gerganov
|
973f328b1e
Merge pull request #10788 from ggerganov/gg/gguf-py-0.11.0
|
1 year ago |
Georgi Gerganov
|
fb18934a97
gguf-py : bump version to 0.11.0
|
1 year ago |
Xuan Son Nguyen
|
235f6e14bf
server : (UI) add tok/s, get rid of completion.js (#10786)
|
1 year ago |
qingy1337
|
1a31d0dc00
Update README.md (#10772)
|
1 year ago |