I’m still new to this and I thought that 128GB of CPU RAM would be enough to run a 70B model? I also have an RTX 4090. However, every time I try to run lzlv_Q4_K_M.gguf in Text Generation UI, I get “connection errored out”. Could there be a setting that I should tinker with?
I haven’t tried to run a model that big on CPU RAM only, but even running a Q4_0 gguf of Causal 14B was mind-numbingly slow on my rig.
General rule of thumb: always use as much of your VRAM (GPU memory) as possible, since CPU RAM is far slower for inference. I’m guessing your connection timed out because it simply took too long to load/run.
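To make that concrete, here’s a rough back-of-the-envelope sketch (all numbers are assumptions, not measurements) of how many transformer layers of a GGUF model you can expect to offload to a given GPU:

```python
# Rough sketch: estimate how many transformer layers fit in VRAM.
# The file size, layer count, and overhead below are assumptions.

def layers_that_fit(vram_gb, model_file_gb, n_layers, overhead_gb=2.0):
    """Estimate GPU-offloadable layers for a quantized GGUF model.

    vram_gb: total GPU memory (e.g. 24 for an RTX 4090)
    model_file_gb: size of the quantized .gguf file on disk
    n_layers: transformer layer count (80 for a Llama-2-70B-based model)
    overhead_gb: rough budget for KV cache / CUDA context
    """
    per_layer_gb = model_file_gb / n_layers
    usable = max(vram_gb - overhead_gb, 0)
    return min(n_layers, int(usable / per_layer_gb))

# A 70B Q4_K_M gguf is roughly 41 GB; the base model has 80 layers.
print(layers_that_fit(24, 41.0, 80))  # → 42
```

So on a 24GB card you’d offload around half the layers of a Q4_K_M 70B and keep the rest on CPU, which is still far faster than CPU-only.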
With a 4090, you can run most of lzlv 70B on your 24GB of VRAM (a Q4_K_M gguf is around 41GB, so a few layers will still spill to CPU unless you switch to a lower-bit quant). Let’s not let your amazing GPU go to waste! Try these steps and let me know if it works out for you:
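In text-generation-webui the key knob is the `n-gpu-layers` slider on the Model tab (with the llama.cpp loader selected). As a sketch, the equivalent standalone llama.cpp invocation might look like this; the paths and layer count are placeholders for your setup, not exact values:

```shell
# Hypothetical paths; tune --n-gpu-layers until VRAM is nearly full.
# 42 is a ballpark for a ~41GB Q4_K_M file on a 24GB card.
./llama-server \
  -m models/lzlv_Q4_K_M.gguf \
  --n-gpu-layers 42 \
  --ctx-size 4096
```

Watch `nvidia-smi` while it loads: if you OOM, lower the layer count; if there’s VRAM to spare, raise it.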