Commit history

Author SHA1 Message Date
  Johannes Gäßler e789095502 llama: print memory breakdown on exit (#15860) 4 months ago
  Jeff Bolz c0b45097c3 rename optimize_graph to graph_optimize (#16082) 4 months ago
  Jeff Bolz e68aa10d8f vulkan: sort graph to allow more parallel execution (#15850) 4 months ago
  Johannes Gäßler 5d804a4938 ggml-backend: raise GGML_MAX_SPLIT_INPUTS (#15722) 4 months ago
  Diego Devesa 9777032dcc llama : separate compute buffer reserve from fattn check (#15696) 4 months ago
  Johannes Gäßler e81b8e4b7f llama: use FA + max. GPU layers by default (#15434) 4 months ago
  Diego Devesa 54a241f505 sched : fix possible use of wrong ids tensor when offloading moe prompt processing (#15488) 5 months ago
  Diego Devesa 5682a3745f sched : copy only the used experts when offloading prompt processing (#15346) 5 months ago
  Diego Devesa 0d8831543c ggml : fix fallback to CPU for ununsupported ops (#15118) 5 months ago
  Diego Devesa c12bbde372 sched : fix multiple evaluations of the same graph with pipeline parallelism (#14855) 6 months ago
  Georgi Gerganov bf9087f59a metal : fuse add, mul + add tests (#14596) 6 months ago
  Jeff Bolz bd9c981d72 vulkan: Add fusion support for RMS_NORM+MUL (#14366) 7 months ago
  Diego Devesa b47ab7b8e9 sched : avoid changing cur_copy when a graph is already allocated (#13922) 7 months ago
  Diego Devesa 952f3953c1 ggml : allow CUDA graphs when using pipeline parallelism (#13814) 8 months ago
  Johannes Gäßler 10d2af0eaa llama/ggml: add LLM training support (#10544) 8 months ago
  David Huang 7f323a589f Add `--no-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B (#13386) 8 months ago
  Johannes Gäßler 9070365020 CUDA: fix logic for clearing padding with -ngl 0 (#13320) 8 months ago
  mgroeber9110 5bbe6a9fe9 ggml : portability fixes for VS 2017 (#12150) 10 months ago
  William Tambellini 70680c48e5 ggml : upgrade init_tensor API to return a ggml_status (#11854) 11 months ago
  Diego Devesa 017cc5f446 ggml-backend : only offload from host buffers (fix) (#11124) 1 year ago
  Diego Devesa a3d50bc022 ggml-backend : only offload from host buffers (#11120) 1 year ago
  Daniel Bevenius db68c93b57 ggml : improve inputs log sched_print_assignments (ggml/1053) 1 year ago
  Diego Devesa 7cc2d2c889 ggml : move AMX to the CPU backend (#10570) 1 year ago
  slaren 59b9172822 ggml/sched : do not skip views in pre-assignments 1 year ago
  Johannes Gäßler 02e4eaf22f ggml-opt: fix data corruption (ggml/1022) 1 year ago
  Diego Devesa be5caccef9 llama : only use default buffer types for the KV cache (#10358) 1 year ago
  Diego Devesa eda7e1d4f5 ggml : fix possible buffer use after free in sched reserve (#9930) 1 year ago
  Johannes Gäßler 8a43e940ab ggml: new optimization interface (ggml/988) 1 year ago
  Diego Devesa ae8de6d50a ggml : build backends as libraries (#10256) 1 year ago
  Diego Devesa 9f40989351 ggml : move CPU backend to a separate file (#10144) 1 year ago