I’m still new to this and I thought that 128 GB of CPU RAM would be enough to run a 70B model? I also have an RTX 4090. However, every time I try to run lzlv_Q4_K_M.gguf in Text Generation UI, I get “connection errored out”. Could there be a setting that I should tinker with?

  • brobruh211@alien.top · 1 year ago

    I haven’t tried to run a model that big on CPU RAM only, but running a Q4_0 GGUF of CausalLM 14B was already mind-numbingly slow on my rig.

    As a general rule of thumb, always utilize as much of your VRAM (GPU RAM) as possible, since CPU RAM has far lower memory bandwidth and CPU inference is dramatically slower. I’m guessing your connection timed out because the model simply took too long to load/run.
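    If you do want to keep using the GGUF file, the usual fix is to offload as many layers as possible to the GPU. Here’s a minimal sketch using llama-cpp-python (the library behind text-gen-ui’s llama.cpp loader); the model path is a placeholder for wherever you saved the file, and 40 layers is just a starting guess for what fits in 24 GB:

    ```python
    from llama_cpp import Llama

    # Placeholder path: point this at your local copy of the GGUF file.
    llm = Llama(
        model_path="models/lzlv_Q4_K_M.gguf",
        n_gpu_layers=40,  # offload as many layers as fit in VRAM; the full 70B Q4_K_M won't fit in 24 GB
        n_ctx=4096,       # context window
    )

    out = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
    print(out["choices"][0]["text"])
    ```

    In text-gen-ui itself, this corresponds to raising the n-gpu-layers slider on the model tab before loading.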

    With a 4090, you can actually run lzlv 70B fully in your 24 GB of VRAM: at 2.4 bits per weight, 70B parameters come to roughly 21 GB, leaving some headroom for the cache. Let’s not let your amazing GPU go to waste! Try these steps and let me know if it works out for you:

    1. Paste this into the Download box of text-gen-ui: waldie/lzlv-limarpv3-l2-70b-2.4bpw-h6-exl2
    2. Hit Download. This should download an ExLlamav2 quant of lzlv that fits in your VRAM.
    3. Select the model from the drop-down and just hit Load with the default settings. (Optional) You can tick “Use 8-bit cache to save VRAM”.
    4. Enjoy! The perplexity of the quant I suggested is higher than lzlv_Q4_K_M’s (so the quality takes a small hit), but at least you should be able to run it with no problems and still get decent outputs. If you’d rather do this from a script, see the sketch below.
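    For reference, here’s roughly what steps 1-3 do under the hood. This is a minimal sketch based on ExLlamaV2’s own example scripts, not text-gen-ui’s internals, and the model_dir path is just an assumption about where the download ends up:

    ```python
    from exllamav2 import (
        ExLlamaV2,
        ExLlamaV2Config,
        ExLlamaV2Cache_8bit,
        ExLlamaV2Tokenizer,
    )
    from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

    # Assumed download location; adjust to wherever text-gen-ui saved the quant.
    config = ExLlamaV2Config()
    config.model_dir = "models/waldie_lzlv-limarpv3-l2-70b-2.4bpw-h6-exl2"
    config.prepare()

    model = ExLlamaV2(config)
    tokenizer = ExLlamaV2Tokenizer(config)

    # 8-bit cache is the code-level equivalent of the "Use 8-bit cache" checkbox.
    cache = ExLlamaV2Cache_8bit(model, lazy=True)
    model.load_autosplit(cache)  # fills available VRAM layer by layer

    generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
    settings = ExLlamaV2Sampler.Settings()
    settings.temperature = 0.8
    settings.top_p = 0.9

    print(generator.generate_simple("Once upon a time,", settings, 128))
    ```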