Reese Levine | d304f459d8 | GGML WebGPU: Support for ADD, MUL, RMS_NORM, GET_ROWS operators (#16018) | 4 months ago
Georgi Gerganov | 0320ac5264 | metal : refactor + optimize v2 (#15995) | 4 months ago
Aleksander Grygier | a7a98e0fff | SvelteKit-based WebUI (#14839) | 4 months ago
Xuan-Son Nguyen | 8f8f2274ee | convert : add Llama4ForCausalLM (#16042) | 4 months ago
Johannes Gäßler | c959b676be | CUDA: fix FA occupancy, optimize tile kernel (#15982) | 4 months ago
David Ribeiro Alves | cd08fc3ecc | common : Fix corrupted memory error on json grammar initialization (#16038) | 4 months ago
Eve | cb5bb6cc05 | vulkan: automatically remove unsupported devices (#15976) | 4 months ago
Daniel Bevenius | a91d035b90 | ci : revert back to macos-13 for macOS-latest-cmake-x64 (#16040) | 4 months ago
Jie Fu (傅杰) | 745cbcf2fe | llama-quant : fix the verification of attention layers for encoder-decoder models (#16023) | 4 months ago
Jie Fu (傅杰) | 1cbd80f8cf | examples : support encoder-decoder models in the simple example (#16002) | 4 months ago
Shane A | 85286f3548 | model : add OLMo3 support (#16015) | 4 months ago
Chenguang Li | d5fabe3682 | CANN: Optimize ggml_cann_set_device (#15935) | 4 months ago
jacekpoplawski | 8ff206097c | llama-bench: add --n-cpu-moe support (#15952) | 4 months ago
Daniel Bevenius | 77475530b8 | ci : use macos-latest for arm64 webgpu build (#16029) | 4 months ago
Daniel Bevenius | 3913f8730e | ggml : fix padding in timestep embedding kernels (#15932) | 4 months ago
Daniel Bevenius | 76888d202e | ci : upload xcframework artifact from ios-xcode-build job (#16010) | 4 months ago
Bowen Han | f1fbffb5c0 | fix: apply clang-format to CUDA macros (#16017) | 4 months ago
Daniel Bevenius | 51abc96bdc | ci : update macos-latest* jobs to use macos-latest (#15938) | 4 months ago
Yuri Khrustalev | 07808ebb07 | cmake : Do not install tools on iOS targets (#15903) | 4 months ago
Aman Gupta | 6d758839ff | Add LLaDA-7b-MoE diffusion model (#16003) | 4 months ago
Jake Karnes | 3d4053f77f | CUDA: fix im2col_3d to respect non-contiguous inputs (views) (#15956) | 4 months ago
Diego Devesa | dc381aa9a6 | docker : enable rocWMMA in ROCm images, add gfx1151 (#15997) | 4 months ago
Diego Devesa | 10d197409b | releases : switch to rocWMMA develop branch, add gfx1151 (#15992) | 4 months ago
yael-works | b907255f4b | SYCL: Add COUNT_EQUAL operator support (#15991) | 4 months ago
Nikolay Popov | 28c39da7c6 | llama-run: Fix model download on Windows (#15988) | 4 months ago
Aman Gupta | 106220562a | CUDA: some micro-optimizations in mmf.cuh for mul_mat_id (#15926) | 4 months ago
ddh0 | a68f31edd7 | fix KLD percentile output (#15999) | 4 months ago
Sigbjørn Skjæret | b8e09f08b9 | model : add grok-2 support (#15539) | 4 months ago
Sigbjørn Skjæret | 6c019cb04e | server : only attempt to enable thinking if using jinja (#15967) | 4 months ago
Georgi Gerganov | 9dcd200d57 | metal : remove memory pools (#15966) | 4 months ago