Goliath-120B - quants and future plans

AlpinDale@alien.top · 2 years ago

those2badguys@alien.top · 2 years ago

I’m just a lowly end user and spectator, can someone ballpark how much it’d cost to shear Goliath-120B to 70B so I can wake up and sip my coffee then spray it on my monitor and say “good lord that’s rather expensive!”

Also, how much for a 7B to 1.3B? And has it been done before? How bad is the drop in quality? I mean, older 7B models are not so great to begin with, so the idea of seeing Mistral-7B downsized to 1.3B would be kind of fun and definitely something I want to play with.

AlpinDale@alien.top · 2 years ago

The shearing process would likely need close to 1 billion tokens of data, so I’d guess a few days on ~24x A100-80G/H100s. And if we get a ~50B model out of it, we’d need to train that on ~100B tokens, which would need at least 10x H100s for a few weeks. Overall, very expensive.

And yes, princeton-nlp did a few shears of Llama2 7B/13B. They’re up on their HuggingFace.

those2badguys@alien.top · 2 years ago

Thank you kindly for the response.

“a few days on ~24x A100-80G/H100s”

I looked at some pricing, did some two-handed, ten-finger math, and estimated it at 12-15 grand?

“10x H100s for a few weeks”

Again, just looking at some retail cloud GPU rental providers, 20-25 grand?

I’m sure you have better things to do with your time so without doing too much on your end, how far off am I on these guesses?
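For anyone who wants to redo the finger math, here is a minimal sketch of the arithmetic: cost is just GPUs × hours × hourly rate. The hourly rates are placeholders I made up for illustration, not quotes from any provider, and reading “a few days” as 3-5 days and “a few weeks” as 2-4 weeks is likewise my own assumption. Plug in real prices before trusting any number it prints.

```python
def rental_cost(num_gpus, days, usd_per_gpu_hour):
    """Total rental cost in USD for num_gpus running around the clock for `days`."""
    gpu_hours = num_gpus * days * 24
    return gpu_hours * usd_per_gpu_hour


def cost_range(num_gpus, day_range, rate_range):
    """Low/high cost bounds: shortest run at the cheapest rate, longest at the priciest."""
    low = rental_cost(num_gpus, day_range[0], rate_range[0])
    high = rental_cost(num_gpus, day_range[1], rate_range[1])
    return low, high


# ASSUMPTION: hypothetical on-demand price band in USD per GPU-hour.
# Replace with actual quotes from whichever provider you are looking at.
RATE_RANGE = (2.0, 4.0)

# GPU counts come from the thread; the day ranges are an assumed reading
# of "a few days" and "a few weeks".
scenarios = {
    "shearing, 24 GPUs, 3-5 days": (24, (3, 5)),
    "continued training, 10 GPUs, 2-4 weeks": (10, (14, 28)),
}

for name, (gpus, days) in scenarios.items():
    low, high = cost_range(gpus, days, RATE_RANGE)
    print(f"{name}: ${low:,.0f} to ${high:,.0f}")
```

The spread between the low and high bounds is wide because both the duration and the hourly rate are uncertain, and the sketch ignores storage, networking, failed runs, and idle time, all of which push the real bill upward.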