Fitting 70B models in a 4GB GPU: the whole model, no quants, distillation, or anything!

vatsadev@alien.top · 2 years ago

watkykjynaaier@alien.top · 2 years ago

Given my M1 Max’s 400GB/s memory bandwidth, what would be the bottleneck for this on Apple Silicon? Disk speed? Is it possible to get this running on Metal?

fallingdowndizzyvr@alien.top · 2 years ago

There’s no point to it. If the model is too big to fit in RAM, disk I/O becomes the limiter. At that point it wouldn’t matter whether you had 400GB/s of memory bandwidth or 40GB/s, because the disk would be the bottleneck.
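To put rough numbers on that argument, here is a back-of-envelope sketch. It assumes the weights are streamed through a serial disk → RAM → GPU path once per generated token, stored as fp16 (2 bytes per parameter), and read from a disk at a hypothetical 5 GB/s. It ignores compute time, caching and any overlap between stages, so treat it as an illustration of the bottleneck logic, not a benchmark.

```python
# Back-of-envelope estimate (assumptions: fp16 weights, every token needs one
# full pass over all weights, and a serial disk -> memory pipeline).

def bytes_for_model(n_params: float, bytes_per_param: float) -> float:
    """Total weight size in bytes (2 bytes per parameter for fp16)."""
    return n_params * bytes_per_param


def seconds_per_token(model_bytes: float, bandwidths_gb_s: dict) -> tuple:
    """Return (bottleneck stage, seconds to move all weights through it once).

    In a serial pipeline, throughput is capped by the slowest stage, so only
    the minimum bandwidth matters.
    """
    stage, slowest = min(bandwidths_gb_s.items(), key=lambda kv: kv[1])
    return stage, model_bytes / (slowest * 1e9)


if __name__ == "__main__":
    model = bytes_for_model(70e9, 2)  # 70B params in fp16 = 140 GB

    # The 5 GB/s disk figure is a hypothetical value, not a measurement.
    for mem_bw in (400.0, 40.0):
        stage, t = seconds_per_token(model, {"disk": 5.0, "memory": mem_bw})
        print(f"memory {mem_bw:.0f} GB/s -> bottleneck {stage}: {t:.1f} s/token")
```

With these assumed numbers, both cases print the same 28.0 s/token limited by the disk. Memory bandwidth only becomes the deciding factor once the whole model fits in RAM and the disk drops out of the per-token loop.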