
server : fix draft context not being released (#11354)

Diego Devesa, 1 year ago
Parent
Commit
12c2bdf2de
1 file changed, 3 insertions, 0 deletions
  1. examples/server/server.cpp (+3, -0)

+ 3 - 0
examples/server/server.cpp

@@ -1772,6 +1772,9 @@ struct server_context {
             // force F16 KV cache for the draft model for extra performance
             cparams_dft.type_k = GGML_TYPE_F16;
             cparams_dft.type_v = GGML_TYPE_F16;
+
+            // the context is not needed - we will create one for each slot
+            llama_init_dft.context.reset();
         }
 
         chat_templates = common_chat_templates_from_model(model, params_base.chat_template);
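
The added `reset()` call suggests that `llama_init_dft.context` is an owning smart pointer, so resetting it destroys the draft context immediately while the draft model stays loaded. The following is a minimal sketch of that ownership pattern only. It assumes `std::unique_ptr`-style semantics, and `fake_model`, `fake_context`, `fake_init_result`, `fake_init`, and `g_live_contexts` are hypothetical stand-ins, not llama.cpp types or functions.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for illustration; these are NOT the llama.cpp types.
static int g_live_contexts = 0;  // counts contexts that are currently alive

struct fake_model {};

struct fake_context {
    fake_context()  { ++g_live_contexts; }
    ~fake_context() { --g_live_contexts; }
};

// Assumed shape: an init result that owns both a model and a context.
struct fake_init_result {
    std::unique_ptr<fake_model>   model;
    std::unique_ptr<fake_context> context;
};

fake_init_result fake_init() {
    fake_init_result res;
    res.model   = std::make_unique<fake_model>();
    res.context = std::make_unique<fake_context>();
    return res;
}

// Returns the number of live contexts after the unused one is released.
int live_contexts_after_reset() {
    fake_init_result init_dft = fake_init();
    assert(g_live_contexts == 1);        // the context created during init is alive

    init_dft.context.reset();            // destroy the context now
    assert(init_dft.context == nullptr); // the owner no longer holds a context
    assert(init_dft.model != nullptr);   // the model is still owned and usable

    return g_live_contexts;
}
```

Without the `reset()` call, the context in this sketch would stay alive for as long as `init_dft` does, which matches the commit title's description of a draft context that was not being released.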