Commit History

| Author | SHA1 | Message | Date |
|---|---|---|---|
| slaren | 5bf3953d7e | cuda : improve cuda pool efficiency using virtual memory (#4606) | 2 years ago |
| slaren | 708e179e85 | fallback to CPU buffer if host buffer alloc fails (#4610) | 2 years ago |
| Samuel Maynard | 925e5584a0 | ci(docker): fix tags in "Build and push docker image (tagged)" (#4603) | 2 years ago |
| Alexey Parfenov | 6123979952 | server : allow to specify custom prompt for penalty calculation (#3727) | 2 years ago |
| kalomaze | b9ec82d262 | grammar : check the full vocab only if necessary (opt) (#4306) | 2 years ago |
| Johannes Gäßler | e0a4002273 | CUDA: fixed row rounding for 0 tensor splits (#4594) | 2 years ago |
| LeonEricsson | 7082d24cec | lookup : add prompt lookup decoding example (#4484) | 2 years ago |
| Georgi Gerganov | ba66175132 | sync : ggml (fix im2col) (#4591) | 2 years ago |
| FantasyGmm | a55876955b | cuda : fix jetson compile error (#4560) | 2 years ago |
| Henrik Forstén | 6724ef1657 | Fix CudaMemcpy direction (#4599) | 2 years ago |
| slaren | 48b7ff193e | llama : fix platforms without mmap (#4578) | 2 years ago |
| Herman Semenov | 48b24b170e | ggml : add comment about backward GGML_OP_DIAG_MASK_INF (#4203) | 2 years ago |
| Michael Kesper | 28cb35a0ec | make : add LLAMA_HIP_UMA option (#4587) | 2 years ago |
| rhuddleston | f31b984898 | ci : tag docker image with build number (#4584) | 2 years ago |
| Deins | 2bb98279c5 | readme : add zig bindings (#4581) | 2 years ago |
| bobqianic | 0137ef88ea | ggml : extend `enum ggml_log_level` with `GGML_LOG_LEVEL_DEBUG` (#4579) | 2 years ago |
| crasm | c7e9701f86 | llama : add ability to cancel model loading (#4462) | 2 years ago |
| Georgi Gerganov | afefa319f1 | ggml : change ggml_scale to take a float instead of tensor (#4573) | 2 years ago |
| Georgi Gerganov | 769a7bc85e | gguf-py : fix broken link | 2 years ago |
| Georgi Gerganov | 32259b2dad | gguf : simplify example dependencies | 2 years ago |
| Samuel Maynard | 4a5f9d629e | ci : add `jlumbroso/free-disk-space` to docker workflow (#4150) | 2 years ago |
| slaren | d232aca5a7 | llama : initial ggml-backend integration (#4520) | 2 years ago |
| Marcus Dunn | 31f27758fa | llama : allow getting n_batch from llama_context in c api (#4540) | 2 years ago |
| Finn Voorhees | 56fa50819f | metal : fix `ggml_metal_log` vargs (#4373) | 2 years ago |
| Erik Garrison | 0f630fbc92 | cuda : ROCm AMD Unified Memory Architecture (UMA) handling (#4449) | 2 years ago |
| arlo-phoenix | 562cf222b5 | ggml-cuda: Fix HIP build by adding define for __trap (#4569) | 2 years ago |
| Jared Van Bortel | 8fe03ffdda | common : remove incorrect --model-draft default (#4568) | 2 years ago |
| Johannes Gäßler | 9154494808 | CUDA: mul_mat_id always on GPU for batches >= 32 (#4553) | 2 years ago |
| Georgi Gerganov | c083718c89 | readme : update coding guidelines | 2 years ago |
| howlger | 880e352277 | py : open merges file as 'utf-8' (#4566) | 2 years ago |