
llama : fix not enough space in buffer with Qwen (#5086)

slaren, 2 years ago
parent
commit
011e8ec577
1 changed file with 1 addition and 1 deletion
llama.cpp (+1 -1)

@@ -4440,9 +4440,9 @@ static struct ggml_tensor * llm_build_kv(
 
     // these nodes are added to the graph together so that they are not reordered
     // by doing so, the number of splits in the graph is reduced
+    ggml_build_forward_expand(graph, q_cur);
     ggml_build_forward_expand(graph, k_cur);
     ggml_build_forward_expand(graph, v_cur);
-    ggml_build_forward_expand(graph, q_cur);
 
     llm_build_kv_store(ctx, hparams, kv, graph, k_cur, v_cur, n_ctx, n_tokens, kv_head, cb, il);
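The change only moves the `q_cur` expansion ahead of `k_cur` and `v_cur`. As a rough illustration of why call order matters at all, here is a toy model of a forward-expand step: each call appends the node and any dependencies not yet visited, in post-order, so the sequence of calls decides where each node lands in the evaluation order. `ToyNode`, `ToyGraph`, `toy_forward_expand` and `toy_order` are hypothetical names invented for this sketch; it is not ggml's implementation and it does not model buffer allocation.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical toy node: a name plus the nodes it reads from.
struct ToyNode {
    std::string name;
    std::vector<const ToyNode *> srcs;
};

// Hypothetical toy graph: nodes in the order they would be evaluated.
struct ToyGraph {
    std::vector<std::string> order;
    std::unordered_set<const ToyNode *> visited;
};

// Append `node` and its not-yet-visited dependencies in post-order.
void toy_forward_expand(ToyGraph & g, const ToyNode * node) {
    assert(node != nullptr);
    if (!g.visited.insert(node).second) {
        return; // already in the graph
    }
    for (const ToyNode * src : node->srcs) {
        toy_forward_expand(g, src);
    }
    g.order.push_back(node->name);
}

// Build the same three projections of a shared input and expand them
// either q-first (as after this commit) or q-last (as before it).
std::vector<std::string> toy_order(bool q_first) {
    ToyNode x{"x", {}};
    ToyNode q{"q_cur", {&x}};
    ToyNode k{"k_cur", {&x}};
    ToyNode v{"v_cur", {&x}};

    ToyGraph g;
    if (q_first) {
        toy_forward_expand(g, &q);
    }
    toy_forward_expand(g, &k);
    toy_forward_expand(g, &v);
    if (!q_first) {
        toy_forward_expand(g, &q);
    }
    return g.order;
}
```

In this toy, `toy_order(true)` places `q_cur` directly after the shared input `x`, while `toy_order(false)` places it after `v_cur`. The three nodes stay adjacent in both cases, and only their relative order changes.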