Is it possible to fine-tune a 33B model with 48GB VRAM?

tgredditfc@alien.top · 3 years ago

Updittyupup@alien.top · 3 years ago

I think you may need to shard the optimizer state and gradients. I’ve been using DeepSpeed and have had some good success. Here is a writeup that compares the different DeepSpeed stages: [RWKV-infctx] DeepSpeed 2 / 3 comparisons | RWKV-InfCtx-Validation – Weights & Biases (wandb.ai). Look at the bottom of the article for an accessible overview. I’m not the author, and I haven’t validated the findings. I think distributed training tools are becoming more and more necessary. I suppose the other option is quantization, but that may risk quality loss. Here is a discussion on that: https://www.reddit.com/r/LocalLLaMA/comments/153lfc2/quantization_how_much_quality_is_lost/
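To make the sharding point concrete, here is a rough back-of-the-envelope sketch of model-state memory per GPU. It assumes mixed-precision training with Adam, using the commonly cited accounting of about 16 bytes per parameter (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 master weights plus Adam momentum and variance). The function name `zero_memory_per_gpu_gb` is made up for illustration and is not part of DeepSpeed. Activations, buffers, and fragmentation are ignored, so real usage will be higher.

```python
GB = 1e9  # decimal gigabytes, good enough for a rough estimate


def zero_memory_per_gpu_gb(n_params, n_gpus, stage, k=12):
    """Approximate model-state memory per GPU in GB (activations excluded).

    stage 0: nothing sharded (plain data parallel)
    stage 1: optimizer states sharded across GPUs
    stage 2: optimizer states and gradients sharded
    stage 3: optimizer states, gradients, and weights sharded
    k is the assumed optimizer bytes per parameter (12 for Adam in mixed precision).
    """
    weights = 2 * n_params  # fp16 weights
    grads = 2 * n_params    # fp16 gradients
    optim = k * n_params    # fp32 master weights + Adam momentum + variance
    if stage >= 1:
        optim /= n_gpus
    if stage >= 2:
        grads /= n_gpus
    if stage >= 3:
        weights /= n_gpus
    return (weights + grads + optim) / GB


if __name__ == "__main__":
    n_params = 33e9  # the 33B model from the question
    for stage in (0, 1, 2, 3):
        for n_gpus in (1, 2, 8):
            mem = zero_memory_per_gpu_gb(n_params, n_gpus, stage)
            print(f"stage {stage}, {n_gpus} GPU(s): ~{mem:.1f} GB per GPU")
```

Under these assumptions, full fine-tuning of a 33B model needs about 528 GB of model state in total, so 48GB is far short without sharding across many GPUs, offloading to CPU, or reducing what is trained and at what precision (which is where quantization comes in).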

tgredditfc@alien.top · 3 years ago

Thank you! It looks very deep to me, but I will look into it.