https://www.amazon.se/-/en/NVIDIA-Tesla-V100-16GB-Express/dp/B076P84525 (price in my country: 81,000 SEK, about 7,758 USD)
My current setup:
NVIDIA GeForce RTX 4050 Laptop GPU
CUDA cores: 2560
Memory data rate: 16.00 Gbps
My laptop GPU works fine for most ML and DL tasks. I am currently finetuning a GPT-2 model on some data that I scraped, and it has worked surprisingly well on my current setup, so it's not like I am complaining.
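For context, this is roughly the shape of what I'm running (a minimal sketch assuming Hugging Face Transformers; my actual script and scraped dataset are more involved, and "scraped.txt" below is just a placeholder):

```python
# Rough shape of a GPT-2 finetuning run that fits on a small laptop GPU:
# fp16 plus a small batch size with gradient accumulation keeps memory in check.
# "scraped.txt" is a placeholder for the scraped text data.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

dataset = load_dataset("text", data_files={"train": "scraped.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="gpt2-finetuned",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,  # effective batch size of 16
    fp16=True,
    num_train_epochs=1,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```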
I do, however, own a desktop PC with an old GTX 980 in it, and I was thinking of replacing that with the V100.
So my question to this community is: for those of you who have bought your own super-duper GPU, was it worth it? And what were your experiences and realizations once you started tinkering with it?
Note: Please refrain from snarky comments about using cloud GPUs. I am not interested in that (and I am in fact already using one for another ML task that doesn't involve finetuning). I am interested in hearing hardware hobbyists' opinions on this matter.
No. The V100 is not Ampere architecture, and at that price it is simply not worth it. A 3090 is cheaper and has 24 GB.
If you want 16 GB, check out the A4000. They're usually not that expensive and have better cores.
I'd love a V100, but they go for stupid prices at which 3090s and a whole host of other cards make more sense. I think even the RTX 8000 is cheaper, and it has more VRAM and is newer.
Yeah, I'm with you on that. Multiple 3090s are the way to go unless you're working with massive models, I think.
A 16 GB V100 is like $700 on eBay. An RTX 3090 with 24 GB can be had for a similar amount.
Exactly, which has me wondering why the 3090 24 GB isn't mentioned more on this sub. Isn't that actually the best option? Multiple of those.
Don't buy the V100 at amazon.se; that price is crazy high.
I say first use services like Lambda when you need the extra processing power, and only buy the hardware when it would genuinely be cheaper to own it and train locally.
Also, consumer GPUs' VRAM and memory bandwidth are quickly exceeded as you move to larger and larger models. If you buy early, you may quickly find the hardware inadequate for your needs.
Why the hell would you get a two-generations-old 16 GB GPU for $7.7K when you can get 3-4 4090s? Each one will roflstomp it in ANY use case, let alone running three of them.
Get either an A6000 (48 GB Ampere card), an A6000 Ada, or three 4090s plus an AMD Threadripper system to put them in, or something like that. It will still run laps around the V100 and be cheaper.
This. I was so confused when I saw OP's post: why on earth buy an old card with only 16 GB of VRAM at the price of multiple newer cards with more VRAM?
- You want VRAM, like lots of folks have mentioned. There are some non-obvious things here: you can make smaller VRAM work with a reduced batch size or non-AdamW optimizers, but you trade off both speed and quality to do so.
- You can split training across multiple GPUs; I use 2x 3060 12 GB, though a real 24 GB card would be better.
- I don't recommend a V100: you'd miss out on the bfloat16 datatype (see the quick check below).
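To make the bfloat16 point concrete, here is a minimal sketch (assuming PyTorch) that reports per-GPU compute capability; the V100 is 7.0, and native bf16 arrives with Ampere at 8.0.

```python
# Quick check for native bfloat16 support: Volta (V100) is compute capability 7.0,
# while Ampere and newer (8.0+, e.g. 3090 / A4000 / 4090) are where bf16 shows up.
import torch

for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"{torch.cuda.get_device_name(i)}: "
          f"compute capability {major}.{minor}, native bf16: {major >= 8}")
```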
I dug into this a lot back when I was building two AI servers for home use, for both inference and training. Dual 4090s are the best you can get for speed at a reasonable price, but for the best bang for your buck you can't beat used 3090s. You can pick them up reliably for $750-800 each off of eBay.
I went with dual 3090s using this build: https://pcpartpicker.com/list/V276JM
I also went with NVLink, which was a waste of money. It doesn't really speed things up, as the board can already do x8 PCIe on dual cards (a quick way to sanity-check the GPU-to-GPU link is sketched below).
But a single 3090 is a great card you can do a lot with. If that's too much money, go with a 3060 12 GB card. The server-oriented stuff is a waste for home use; NVIDIA 30xx and 40xx series consumer cards will just blow it away in a home environment.
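If you're curious what your own board gives you, `nvidia-smi topo -m` shows the link topology, and a rough bandwidth check looks something like this (a minimal sketch, assuming PyTorch and two visible GPUs):

```python
# Check whether two cards can talk peer-to-peer and roughly how fast a
# device-to-device copy runs; PCIe x8 vs. NVLink shows up in the GB/s number.
import time
import torch

assert torch.cuda.device_count() >= 2, "needs two GPUs"
print("peer access 0 -> 1:", torch.cuda.can_device_access_peer(0, 1))

n_bytes = 1 << 30  # 1 GiB test buffer
src = torch.empty(n_bytes, dtype=torch.uint8, device="cuda:0")
torch.cuda.synchronize(0)

start = time.perf_counter()
dst = src.to("cuda:1")
torch.cuda.synchronize(0)
torch.cuda.synchronize(1)
elapsed = time.perf_counter() - start
print(f"device-to-device copy: {n_bytes / elapsed / 1e9:.1f} GB/s")
```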
I am going to create Jarvis: https://pcpartpicker.com/list/yjVbCd
Be careful with your motherboard choice if you're running two video cards. Many boards are only really designed to support one video card at x8 or x16 PCIe speeds.
I can't corroborate results for Pascal cards. They had very limited FP16 performance, usually 1:64 of FP32. Switching from a GTX 1080 to an RTX 3090 Ti got me around 10-20x gains in QLoRA training, keeping the exact same batch size and context length and changing only the compute dtype from fp16 to bf16.
I'm not sure where this chart is from, but I remember it was made before QLoRA even existed.
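For reference, this is roughly what I mean by a QLoRA run with bf16 compute (a minimal sketch assuming Hugging Face Transformers, peft, and bitsandbytes; the model name and LoRA hyperparameters are placeholders, not a tuned recipe):

```python
# Rough QLoRA-style setup: 4-bit base weights, LoRA adapters, bf16 compute.
# Model name and LoRA hyperparameters below are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# bf16 needs Ampere (compute capability 8.0) or newer; older cards fall back to fp16.
compute_dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,
)

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```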
Is there any such benchmark that includes both the 4090/A100 and a Mac with an M2 Ultra / M3 Max? I've searched quite a bit but didn't find anyone comparing them on similar setups; it seems very interesting due to the large unified memory.
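In the meantime, a rough apples-to-apples number could come from timing the same workload on both backends. A minimal sketch (assuming PyTorch; this only measures raw fp16 matmul throughput, not end-to-end LLM speed, and the matrix size is arbitrary):

```python
# Rough throughput check that runs on both CUDA (4090/A100) and Apple MPS (M2/M3).
import time
import torch

if torch.cuda.is_available():
    device, dtype = "cuda", torch.float16
elif torch.backends.mps.is_available():
    device, dtype = "mps", torch.float16
else:
    device, dtype = "cpu", torch.float32

n = 4096  # arbitrary size, large enough to keep the GPU busy
a = torch.randn(n, n, device=device, dtype=dtype)
b = torch.randn(n, n, device=device, dtype=dtype)

def sync():
    if device == "cuda":
        torch.cuda.synchronize()
    elif device == "mps":
        torch.mps.synchronize()

for _ in range(3):  # warm-up
    a @ b
sync()

iters = 50
start = time.perf_counter()
for _ in range(iters):
    a @ b
sync()
elapsed = time.perf_counter() - start

tflops = 2 * n**3 * iters / elapsed / 1e12
print(f"{device}: ~{tflops:.1f} TFLOPS ({dtype})")
```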
So basically either a 4090 or an H100.
Yeah, perhaps if I am crazy enough I could just buy three of those and call it a day.
An A6000 being worse than a 3090 doesn't make any sense.
Man, those H100s really are on another level. I shudder to think where we'll be in 5 years.