Best way to upgrade my pc to improve t/s

DarthByron@alien.top · 1 year ago


Imaginary_Bench_7294@alien.top · 1 year ago

If your goal is to run the model locally, your best option is to increase your VRAM as much as you can. The main things to consider are the card’s memory bandwidth and its capacity. For a 70B 4-bit model you’re looking at needing somewhere around 35-40 GB of VRAM.

The model weights alone will take roughly 35 GB, the loader up to another 3 GB, and then the full context length of 4096 tokens could push it over 40 GB.
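To make that arithmetic explicit, here is a rough sketch of the estimate. The function name and the `context_gb` value are my own placeholders, not measured numbers; only the 35 GB and 3 GB figures come from the estimate above.

```python
def estimate_vram_gb(params_billion, bits_per_weight, loader_gb=3.0, context_gb=0.0):
    """Rough VRAM estimate in GB (1 GB = 1e9 bytes).

    weights = parameters * bits per weight / 8 bits per byte
    loader_gb and context_gb are flat allowances added on top.
    """
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + loader_gb + context_gb


# 70B parameters at 4 bits per weight, weights only
weights_only = estimate_vram_gb(70, 4, loader_gb=0.0)

# Add the "up to 3 GB" loader allowance
with_loader = estimate_vram_gb(70, 4)

# context_gb=2.5 is an illustrative guess for a full 4096 context, not a measurement
with_context = estimate_vram_gb(70, 4, context_gb=2.5)

print(weights_only, with_loader, with_context)  # 35.0 38.0 40.5
```

Real usage depends on the loader and the cache settings, so treat this as a ballpark only.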

I run LZLV 70B at 4.65 bits per weight on 2x 3090s and get 4.5+ T/s using ExLlamaV2 and the EXL2 format. That is at full context length in chat mode in Oobabooga.

In the default/notebook modes I can get 7+ T/s at full context length.

Now, your power supply may be on the low side for adding another card without setting power limits on things. I’ll use stock power settings as a reference.

The 4080 is rated to hit 320 W

The 13700 is rated at 65 W base power

Let’s add in another 100 watts for SSDs, HDDs, mobo, and cooling solutions.

So you’re looking at 485 W of draw. You should always shoot for a minimum of 10-15% headroom on the power supply, which cuts your max draw down to 722-765 watts.

That leaves you 237-280w of possible room to play with.
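Here is the same power budget as a quick sketch. The 850 W supply rating is an assumption on my part, worked back from the 722-765 W figures (85-90% of 850); swap in your real rating. The component wattages are the stock numbers used above.

```python
# ASSUMPTION: 850 W supply, inferred from the 722-765 W range above
PSU_WATTS = 850

# Stock power figures from the post
components = {
    "4080": 320,
    "13700": 65,
    "drives, mobo, cooling": 100,
}


def spare_watts(psu_watts, draw_watts, headroom_percent):
    """Watts left after reserving a percentage of the PSU rating as headroom."""
    usable = psu_watts * (100 - headroom_percent) / 100
    return usable - draw_watts


total_draw = sum(components.values())           # 485
low = spare_watts(PSU_WATTS, total_draw, 15)    # 237.5 with 15% headroom
high = spare_watts(PSU_WATTS, total_draw, 10)   # 280.0 with 10% headroom

print(f"Current draw: {total_draw} W, room for a second card: {low}-{high} W")
```

Any card whose stock board power sits above that range would need a power limit to fit.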

So it’s possible to add another video card to the computer, but you’ll have to use GGUF and llama.cpp to do mixed compute with the video card and CPU. That will probably get you up to 2, maybe 3 T/s at the start, but I don’t know about the full 4096 context.
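For the mixed GPU/CPU route, a rough way to guess how many layers to offload might look like the sketch below. Everything here is an assumption for illustration: 16 GB of VRAM on the 4080, 80 layers in a 70B Llama-style model, layers of equal size, and the 3 GB loader allowance from earlier reused as a reserve. The helper name is made up.

```python
def layers_that_fit(vram_gb, model_gb, n_layers, reserve_gb):
    """Estimate how many equal-sized layers fit in VRAM after a fixed reserve."""
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0)
    return min(n_layers, int(usable_gb // per_layer_gb))


# ASSUMPTIONS: 16 GB card, ~35 GB of weights, 80 layers, 3 GB kept free
gpu_layers = layers_that_fit(vram_gb=16, model_gb=35, n_layers=80, reserve_gb=3)

print(gpu_layers)  # 29
```

The result is only a starting value for llama.cpp's `--n-gpu-layers` (`-ngl`) option; the rest of the layers run on the CPU. If you add a second card, add its memory to `vram_gb`, then adjust up or down based on what actually loads.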