The title, pretty much.

I’m wondering whether a 70B model quantized to 4-bit would perform better than a 7B/13B/34B model at fp16. It would be great to get some insights from the community.
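
For a rough sense of the memory side of this comparison (it says nothing about output quality), here is a minimal back-of-the-envelope sketch. It assumes weights dominate memory and counts only parameter count times bits per weight. The helper name `weight_memory_gb` is made up for illustration, and the numbers ignore the KV cache, activations, and the per-group scale data that real 4-bit formats add, so actual usage will be somewhat higher.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at exactly `bits_per_weight` bits.
    Real quantized formats carry extra metadata, so treat this as a lower bound.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# The configurations mentioned in the question.
configs = [
    ("70B @ 4-bit", 70, 4),
    ("34B @ fp16", 34, 16),
    ("13B @ fp16", 13, 16),
    ("7B @ fp16", 7, 16),
]

for name, params_b, bits in configs:
    print(f"{name}: ~{weight_memory_gb(params_b, bits):.0f} GB for weights")
```

Under these assumptions, the 4-bit 70B model needs about 35 GB for weights, which is roughly half of what a 34B model needs at fp16 (about 68 GB) and more than a 13B model at fp16 (about 26 GB).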

  • Dry-Vermicelli-682@alien.topB
    1 year ago

    44GB of GPU VRAM? What GPU has 44GB other than the stupidly expensive ones? Are average folks running $25K GPUs at home? Or are the people running these working for companies with lots of money and building small GPU servers for them?