You will be lucky to find a supplier who does not have a long waiting list. The demand in the enterprise sector is real and I’m calling BS on any supplier having stock before Q2 2024.
You have done absolutely no research, or even begun to look into the architecture and capability of the hardware you are discussing. If you have seriously been given the task of choosing a hardware platform for your company then I worry for your company’s future. There is a reason system architects in large organisations get paid a lot.
If you are fine-tuning, you MAY get away with an NVLinked pair of H100s for smaller models; however, you will be massively nerfed for any ‘proper’ work, and you certainly have no chance of training your own model.
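To put rough numbers on that, here is a back-of-envelope sketch of the memory needed just for model states. The byte counts are common rules of thumb for mixed-precision Adam training and are my assumptions, not measurements; activations, KV caches and framework overhead are ignored, so real requirements are higher. The function names are made up for illustration, and 80 GB per H100 is assumed.

```python
# Back-of-envelope memory estimate for model states only (activations excluded).
# The per-parameter byte counts below are assumed rules of thumb, not measurements.
BYTES_PER_PARAM_FULL_TRAINING = 16  # fp16 weights (2) + fp16 grads (2) + fp32 Adam states (12)
BYTES_PER_PARAM_FROZEN_FP16 = 2     # fp16 weights only, e.g. a frozen base model

H100_MEMORY_GB = 80  # assumed per-GPU memory capacity


def model_state_gb(params_billions: float, bytes_per_param: int) -> float:
    """Memory in GB: (params_billions * 1e9 params) * bytes_per_param / 1e9."""
    return params_billions * bytes_per_param


def fits(params_billions: float, bytes_per_param: int, num_gpus: int = 2) -> bool:
    """True if the model states alone fit in the combined memory of num_gpus GPUs."""
    return model_state_gb(params_billions, bytes_per_param) <= num_gpus * H100_MEMORY_GB


if __name__ == "__main__":
    for size in (7, 13, 70):
        full = model_state_gb(size, BYTES_PER_PARAM_FULL_TRAINING)
        frozen = model_state_gb(size, BYTES_PER_PARAM_FROZEN_FP16)
        print(f"{size}B params: full training ~{full:.0f} GB "
              f"(fits on 2x H100: {fits(size, BYTES_PER_PARAM_FULL_TRAINING)}), "
              f"frozen fp16 weights ~{frozen:.0f} GB "
              f"(fits on 2x H100: {fits(size, BYTES_PER_PARAM_FROZEN_FP16)})")
```

Under these assumptions, full training of even a mid-sized model blows past two cards on model states alone, which is why the pair only makes sense for fine-tuning smaller models.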
NVLink gets a bad name. It shouldn’t. Think of it as PCIe on steroids: it connects the GPUs directly, so their traffic never has to touch PCIe bandwidth. Even more valuable, the GPUs can communicate with each other directly without spending CPU cycles, which saves a massive amount of latency.
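A quick sketch of why that matters, using nominal peak per-direction bandwidths as assumptions: about 64 GB/s for PCIe Gen5 x16 and about 450 GB/s for NVLink on an H100 SXM (900 GB/s bidirectional). Real throughput is lower on both, and the 14 GB payload is just an illustrative figure (fp16 gradients for a 7B-parameter model).

```python
# Idealised transfer-time comparison between PCIe and NVLink.
# Bandwidths are assumed nominal per-direction peaks; sustained rates are lower.
PCIE_GEN5_X16_GB_PER_S = 64.0
NVLINK_H100_SXM_GB_PER_S = 450.0


def transfer_ms(payload_gb: float, bandwidth_gb_per_s: float) -> float:
    """Time in milliseconds to move payload_gb at the given bandwidth, ignoring latency."""
    return payload_gb / bandwidth_gb_per_s * 1000.0


if __name__ == "__main__":
    payload_gb = 14.0  # illustrative: fp16 gradients for a 7B-parameter model
    pcie = transfer_ms(payload_gb, PCIE_GEN5_X16_GB_PER_S)
    nvlink = transfer_ms(payload_gb, NVLINK_H100_SXM_GB_PER_S)
    print(f"PCIe Gen5 x16: {pcie:.1f} ms per exchange")
    print(f"NVLink (H100 SXM): {nvlink:.1f} ms per exchange")
    print(f"Ratio: {pcie / nvlink:.1f}x")
```

That cost is paid on every training step that synchronises gradients, so the gap compounds quickly.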
The SXM options are the best bet for serious work due to their interconnect capabilities. The PCIe cards are essentially SXM modules on a PCB with a much lower power limit applied to minimise overheating and cooling issues. For the H100 that is roughly 350 W on PCIe versus up to 700 W on SXM.
And that’s not even touching on the use of InfiniBand or other RDMA-capable fabrics, which let devices on other nodes access GPU memory directly (again skipping CPU cycles and communication). RDMA ftw.
So again, I’m calling BS. Usually I just smile and move on when reading another fantasist’s BS story that never amounts to anything. However, they are becoming more and more common, especially on this sub.
If I am wrong, I apologise profusely. As stated previously, if you are honestly the member of staff who has been put in charge of a procurement decision like this, then I truly feel sorry for whoever you work for.