This is the reason why you can’t find them in your local Best Buy. They are paying a premium for them. But it would indeed be very helpful if I could get my hands on a few for my build.
It is not just the 4090s getting vacuumed up. A large amount of the post-crypto-crash stock has been snapped up for AI use. Only a year ago the used market was flooded with cards like MI25s and above that were being liquidated. Even the P40s have started to become more expensive: when I purchased mine a year ago it was just over $150 USD, and now you will struggle to find one for under $250. For some reason the AMD compute-capable cards coming out of China seem to be even more scarce than the Nvidia ones. I strongly suspect there is some AMD-specific pipeline in use that has not become public knowledge yet.
People are really starting to come to grips with the idea that ‘this time is different’ when it comes to the AI boom, and it is starting to seriously impact GPU pricing and availability. The only upside compared to the crypto boom, I guess, is that for AI use cases PCIe bus speeds matter, and that is stopping people from buying anything and everything and slapping 8 GPUs in an AI mining rig.
Things are only going to get worse from here, though. Nvidia and AMD are both too caught up in the server space right now to bother with consumer offerings that might compete with it. The average gamer is not going to demand more than the existing 24GB on their GPU, as games simply do not need more at current resolutions. That leaves the limited workstation market, and those cards have always come at a premium. The Pascal-based Quadro cards are still selling for twice as much as a P40 and show no sign of coming down. Nvidia is not going to rush out an “RTX AI Card” like they did with crypto, because the server market would snap them up to drive lower-speed training and inference farms.
China has a lot of used crypto GPU farms, where you had racks of GPUs chugging away at crypto crunching. How hard would it be to convert them for AI use?
That ‘depends’. Most crypto farms run on low-cost motherboard/CPU combos with 8+ GPUs each connected via a single PCIe lane. If you wanted to do training or even inference on that, you would need to relocate those GPUs to a more capable system and limit yourself to a maximum of 4 cards per system. At that point, if you are talking about cards with 8GB or less of VRAM, you have a system that is expensive to set up and run, with 32GB of VRAM and fairly low performance. That is why the 16GB+ cards are the ones disappearing.
It depends on what you do with it. I think they can be very useful. Check my post elsewhere in this thread.
https://www.reddit.com/r/LocalLLaMA/comments/183na9z/china_is_retrofitting_consumer_rtx4090s_with_2/kasawk5/
The MI25 is finally getting the love it deserves. I wish I had bought more when they were $65-$70 a few months ago, but I was hoping they would go lower. Even a month or so ago, I think I saw them at $90. Right now, and I just checked before posting, the seller with the most stock is selling them for $160. Crazy.
By the way, the one I got is in really good shape. As in really good. If the seller told me it was new, I would believe it. There’s not a speck of dust on it, nowhere, and I looked deep into the fins of the heatsink. Even the fingers on the slot looked basically new.
I don’t think that’s blanket true. It really depends what you do with it. I can think of a couple of uses off the top of my head where 8 GPUs sitting on janky PCIe 1x links would be fine.
Use them as a team. Nothing says you can only use them to infer one large model. You can run eight 7B-13B models, one model per card. The 1x link speed wouldn’t really matter in that case once the models are loaded. Having a team of small models instead of one large model is a valid way to go.
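A minimal sketch of the one-model-per-card idea, assuming a Python setup. The inference call here is a stub (a real worker would load a 7B-13B model with something like llama.cpp and generate from it); the point is the routing: each worker is pinned to one card via `CUDA_VISIBLE_DEVICES`, and prompts are fanned out round-robin, so only prompt/answer text ever crosses the slow PCIe link:

```python
# Team-of-small-models sketch: one worker per GPU, each card pinned
# via CUDA_VISIBLE_DEVICES. The model call is a stub; swap in a real
# loaded model (e.g. llama.cpp) per worker in practice.
from concurrent.futures import ThreadPoolExecutor

NUM_GPUS = 8  # one 7B-13B model loaded per card

def run_on_gpu(gpu_id: int, prompt: str) -> str:
    # In a real setup each worker process sets this BEFORE loading
    # its model, so the weights land on that card alone.
    env = {"CUDA_VISIBLE_DEVICES": str(gpu_id)}
    # Stub inference: a real worker would call its loaded model here.
    return f"[gpu{env['CUDA_VISIBLE_DEVICES']}] answer to: {prompt}"

prompts = [f"question {i}" for i in range(16)]

# Round-robin the prompts across the cards; after load time the
# PCIe x1 link only carries tokens, so it barely matters.
with ThreadPoolExecutor(max_workers=NUM_GPUS) as pool:
    results = list(pool.map(
        lambda iq: run_on_gpu(iq[0] % NUM_GPUS, iq[1]),
        enumerate(prompts),
    ))

for r in results[:2]:
    print(r)
```

Each worker owns its whole model, so there is no cross-GPU traffic at all during generation, which is exactly the workload a 1x riser can handle.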
Batch process 8 different prompts on a large model spread across the GPUs. Since inference is sequential, only 1 GPU is active at a time when processing a single prompt; the other 7 GPUs are idle. Don’t let them idle. Batch it: process 8 or more prompts at the same time. Once the batch is full, all 8 GPUs will be running. The t/s for any one prompt won’t be fast, but the overall throughput across all the prompts will be. It is best to keep the prompts coming, and thus the batch full, so all GPUs stay busy. A good application for this is a server inferring multiple prompts from multiple users. Or multiple prompts from the same user. Or the same prompt 8 different times: since you can ask the same model the same question 8 times and get 8 different answers, let it process the prompt 8 times and pick the best answer.
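A toy model of why batching fills the cards, assuming the large model is split layer-wise across the 8 GPUs so it behaves like an 8-stage pipeline. This is purely illustrative (no real inference, and real pipelines re-enter per token); it just counts how many stages are busy each step:

```python
# Toy pipeline-occupancy simulation: a model split layer-wise across
# 8 GPUs is an 8-stage pipeline. With 1 prompt in flight only 1 card
# works per step; with 8 in flight the pipe fills and all 8 do.
STAGES = 8  # one group of layers per GPU

def busy_stages_per_step(n_prompts: int, n_steps: int) -> list[int]:
    busy = []
    for step in range(n_steps):
        # prompt p enters the pipeline at time p, so at this step it
        # occupies stage (step - p) if that stage exists
        active = {step - p for p in range(n_prompts) if 0 <= step - p < STAGES}
        busy.append(len(active))
    return busy

print(busy_stages_per_step(1, 8))  # [1, 1, 1, 1, 1, 1, 1, 1]
print(busy_stages_per_step(8, 8))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

With a single prompt, 7 of 8 cards sit idle every step; with 8 prompts in flight, occupancy ramps to 100% once the pipeline fills, which is the throughput win described above.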
There are also techniques that allow inference itself to be parallelized, and those may run great on a mining rig with 8 GPUs.
So it’s far from useless to repurpose an old mining rig. You just have to be creative.