I plan to run inference on 33B models at full precision; 70B is a second priority but would be a nice touch. Would I be better off getting an AMD EPYC server CPU like this or an RTX 4090? With the EPYC, I can get 384GB of DDR4 RAM for ~400 USD on eBay, while the 4090 only has 24GB. Moreover, the 4090 and the EPYC setup + RAM cost about the same. Which would be the better buy?
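To put the 24GB vs 384GB question into numbers, here is a rough sketch of the memory needed for the weights alone. It assumes "full precision" means either fp32 (4 bytes per parameter) or fp16 (2 bytes per parameter), since the post does not say which, and it ignores KV cache and activation memory, so real usage will be higher. The function name is made up for illustration.

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate size of the model weights in GB (10^9 bytes).

    params_billion * 1e9 parameters * bytes_per_param / 1e9 bytes per GB
    simplifies to params_billion * bytes_per_param.
    """
    return params_billion * bytes_per_param


def fits(params_billion: float, bytes_per_param: float, capacity_gb: float) -> bool:
    """True if the weights alone fit in the given memory capacity."""
    return weight_memory_gb(params_billion, bytes_per_param) <= capacity_gb


# Capacities taken from the post: 24GB on the 4090, 384GB on the EPYC box.
for size in (33, 70):
    for label, bpp in (("fp32", 4), ("fp16", 2)):
        need = weight_memory_gb(size, bpp)
        print(f"{size}B {label}: ~{need:.0f} GB | "
              f"fits 24GB: {fits(size, bpp, 24)} | "
              f"fits 384GB: {fits(size, bpp, 384)}")
```

Under these assumptions, none of the four combinations fits in 24GB, while all of them fit in 384GB.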

  • XTJ7@alien.top
    1 year ago

    In my case it’s an Epyc 7642 with 8x64GB DDR4 2666, so that may be why my generation is significantly slower.

    I find anything below 5 tokens per second not really usable, so that’s why I stick with my M1 Ultra. It has plenty of really fast RAM, which most likely explains why it performs so well, if LLMs are that dependent on fast memory.

    I also have a 3090 in another machine, but that’s also just 24GB, and I don’t want to shell out more money right now for playing with LLMs if the M1 Ultra is doing well enough :)
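The point about memory speed can be put into rough numbers. This is a minimal sketch, assuming token generation is limited purely by memory bandwidth (every weight is read once per generated token) and assuming a 33B model at fp16 takes about 66 GB. The bandwidth figures are theoretical peaks (8 channels of DDR4-2666 for the Epyc 7642, Apple's quoted 800 GB/s for the M1 Ultra), so real throughput will be lower. The function names are made up for illustration.

```python
def ddr_bandwidth_gbs(mega_transfers_per_s: float, channels: int,
                      bus_bytes: int = 8) -> float:
    """Theoretical peak DDR bandwidth in GB/s.

    Each channel moves bus_bytes (64 bits = 8 bytes) per transfer.
    """
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


def max_tokens_per_s(bandwidth_gbs: float, model_gb: float) -> float:
    """Upper bound on tokens/s if all weights are read once per token."""
    return bandwidth_gbs / model_gb


MODEL_GB = 66.0  # assumption: 33B parameters at 2 bytes each

systems = {
    "Epyc 7642, 8ch DDR4-2666": ddr_bandwidth_gbs(2666, channels=8),
    "M1 Ultra (spec sheet)": 800.0,
}

for name, bw in systems.items():
    print(f"{name}: ~{bw:.0f} GB/s peak, "
          f"at most ~{max_tokens_per_s(bw, MODEL_GB):.1f} tokens/s")
```

With these assumptions the Epyc tops out at roughly 2.6 tokens per second and the M1 Ultra at about 12, which fits the gap described above. Quantized models are smaller, so the same bandwidth gives proportionally more tokens per second.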

    • runforpeace2021@alien.top
      1 year ago

      If you like the M1 Ultra, wait until the M3 Ultra… The M3 Max already smokes the M1 Max with 3x the inference speed.

      So expect the M3 Ultra to be in the 20 t/s range.

      • XTJ7@alien.top
        1 year ago

        For sure! But the M1 Ultra still holds up really well. I doubt I will replace it for another 3 years at the very least. CPUs are currently progressing at an impressive rate across the board. Would I like an M3 Ultra? Sure, but do I really need it? Sadly no :) The upgrade to an M5 Ultra will be insane though.