The title, pretty much.
I’m wondering whether a 70B model quantized to 4-bit would perform better than a 7B/13B/34B model at fp16. Would be great to get some insights from the community.
44GB of GPU VRAM? WTH GPU has 44GB other than the stupidly expensive ones? Are average folks running $25K GPUs at home? Or are the people running these working for companies with lots of money and building small GPU servers for it?
Dual 3090/4090s. Still pricey as hell, but not out of reach for some folks.
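For a rough sense of the numbers being thrown around here, this is a back-of-the-envelope sketch of weight memory only. It assumes memory is roughly parameters × bits per weight / 8, and it ignores the KV cache, activations, and quantization metadata, so real usage will be higher. The 48 GB budget assumes two 24 GB cards (e.g. dual 3090s or 4090s); the function name is just illustrative.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at the same bit width, with no
    overhead for quantization scales, KV cache, or activations.
    """
    return params_billions * bits_per_weight / 8


# Assumed budget: two 24 GB consumer cards.
VRAM_BUDGET_GB = 2 * 24

# (label, parameters in billions, bits per weight)
configs = [
    ("70B @ 4-bit", 70, 4),
    ("34B @ fp16", 34, 16),
    ("13B @ fp16", 13, 16),
    ("7B @ fp16", 7, 16),
]

for label, params, bits in configs:
    gb = weight_memory_gb(params, bits)
    verdict = "fits" if gb <= VRAM_BUDGET_GB else "does not fit"
    print(f"{label}: ~{gb:.0f} GB of weights, {verdict} in {VRAM_BUDGET_GB} GB")
```

By this estimate the 4-bit 70B weights alone come to about 35 GB, which is why a 48 GB dual-card setup comes up, while a 34B model at fp16 would need about 68 GB for weights alone.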
So anyone wanting to play around with this at home has to expect to drop about $4K or so on GPUs and a setup?
I can get two 3090s for 1200€ here on the second-hand market.