I’ve been closely following the recent developments from NVIDIA, and their latest announcement has really caught my attention: the H200, which will also be offered as part of the GH200 Grace Hopper superchip. This beast is said to pack a staggering 141 GB of HBM3e memory and offer a blazing 4.8 TB/s of memory bandwidth. The H200 is slated to launch in the second quarter of 2024, and I can’t help but ponder its potential impact.

The most exciting aspect for me, and probably for many of you, is its reported ability to run Llama 2 70B at twice the speed of the current H100. That’s a significant leap in performance!
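
For anyone who wants to sanity-check the 2x figure, here is a rough back-of-envelope sketch. It assumes single-stream decoding is memory-bandwidth bound, so tokens per second are capped at roughly bandwidth divided by the bytes of weights read per token. The FP16 weight size and the 3.35 TB/s figure for the H100 SXM are my assumptions (the latter from the H100 datasheet, not from the announcement), and the function names are just for illustration. It ignores KV cache traffic, batching, and compute limits.

```python
# Back-of-envelope estimate, not a benchmark.
# Assumption: decoding reads every weight once per generated token,
# so throughput is capped at bandwidth / weight_bytes.

PARAMS = 70e9          # Llama 2 70B parameter count
BYTES_PER_PARAM = 2    # assumption: FP16/BF16 weights, no quantization

H100_BW = 3.35e12      # bytes/s, H100 SXM datasheet figure (assumption, not from the post)
H200_BW = 4.8e12       # bytes/s, the 4.8 TB/s quoted for the H200


def weight_bytes(params=PARAMS, bytes_per_param=BYTES_PER_PARAM):
    """Total size of the model weights in bytes."""
    return params * bytes_per_param


def max_tokens_per_s(bandwidth):
    """Upper bound on single-stream tokens/s for one GPU's worth of bandwidth."""
    return bandwidth / weight_bytes()


def bandwidth_speedup():
    """Speedup explained by memory bandwidth alone."""
    return H200_BW / H100_BW


if __name__ == "__main__":
    print(f"weights:           {weight_bytes() / 1e9:.0f} GB")
    print(f"H100 upper bound:  {max_tokens_per_s(H100_BW):.1f} tokens/s")
    print(f"H200 upper bound:  {max_tokens_per_s(H200_BW):.1f} tokens/s")
    print(f"bandwidth speedup: {bandwidth_speedup():.2f}x")
```

Under these assumptions, bandwidth alone accounts for about 1.4x, so the rest of the claimed gain would have to come from somewhere else. My guess is the larger memory: 140 GB of FP16 weights nearly fills a single 141 GB H200 but does not fit on one 80 GB H100. That part is speculation on my side.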

So here’s the big question for the community: are any of you planning to upgrade to the H200, or are you planning to stick with the H100 for a while longer?

I’m currently using an 8x H100 rig and it’s been a workhorse, but the prospect of doubling my Llama 2 70B performance is very tempting. However, I’m also weighing the cost against the benefits. The H200 looks like a substantial investment, and I’m wondering whether the performance gain justifies the upgrade, especially since the H100 is still very capable.

I’d love to hear your thoughts, experiences, and plans.

  • artelligence_consult@alien.topB
    1 year ago

    I actually do not think so. Let’s look at it from various perspectives.

    • The H100 has under 100 GB of memory in the PCIe form factor, the only one relevant for inference, and the nearly 200 GB version actually uses 2 cards. That puts 5 of them into a 10x PCIe server.
    • The AMD MI300, arriving in around the same timeframe, has 8 cards of nearly 200 GB each in its SGC form factor.

    So AMD wins here, at the price of not using CUDA, which may not be an issue.

    Now, performance. The 4.8 TB/s of bandwidth is absolutely amazing. But AMD offers 5.2 TB/s, and by the end of the year totally new architectures with memory-integrated computing, like the d-Matrix Corsair C8, will make a joke out of that.

    Apart from their ecosystem, I am not sure how NVIDIA will justify the price. Anyone who buys it (pressure to deliver may be a reason) will get bitten soon.

    • 0xd00d@alien.topB
      1 year ago

      I suppose the real big thing factoring into scalability isn’t necessarily CUDA but TensorRT, which, yes, is built on top of CUDA. I haven’t been keeping up with the actual hardware capabilities of AMD’s stuff with respect to tensor cores, but what we’re seeing is that TensorRT is able to better utilize NVIDIA’s tensor cores and extract much more out of the available memory bandwidth. If AMD can get close (it seems we can only hope for them to get close), if they can produce significantly beefier hardware that sells for less, and if the software can actually come close (this is the crux of it), then we may have some real competition.