Continuing my quest to choose a rig with lots of memory, one possibility is a dual-socket motherboard. Gen 1 to 3 EPYC chips have 8 channels of DDR4 each, so two sockets give 16 memory channels in total. That is good bandwidth, even if it doesn’t beat GPUs, and the machine can hold far more memory (up to 1024GB). Builds with 64+ threads can be pretty cheap.
My questions are
- Does the dual CPU setup cause trouble with running LLM software?
- Is it reasonably possible to get Windows, drivers, etc. working on ‘server’ architecture?
- Is there anything else I should consider vs going for a single EPYC or Threadripper Pro?
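As a rough sanity check on the 16-channel figure in the post above, theoretical peak bandwidth is channels × transfer rate × 8 bytes per 64-bit channel. The sketch below assumes DDR4-3200 on every channel (the post does not state a speed, and Gen 1 parts run slower memory), and real sustained bandwidth is lower than this peak.

```python
def peak_bandwidth_gb_s(channels: int, mega_transfers_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak in GB/s: channels x transfers/s x bytes per transfer."""
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9


# Assumed configurations; DDR4-3200 is an assumption, not a figure from the post.
configs = {
    "1 socket, 8ch DDR4-3200": (8, 3200),
    "2 sockets, 16ch DDR4-3200": (16, 3200),
}

for name, (channels, mts) in configs.items():
    print(f"{name}: {peak_bandwidth_gb_s(channels, mts):.1f} GB/s peak")
```

The dual-socket number is only reachable if each socket mostly reads from its own memory, which is what the NUMA discussion in the replies is about.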
Dual CPUs would have terrible performance. This is because the processor reads the whole model every time it generates a token. If you spread half the model onto a second CPU’s memory, the cores in the first CPU have to read that part of the model through the slow inter-CPU link, and vice versa for the second CPU’s cores. For this to work, llama.cpp would have to spread the workload across multiple CPUs the way it does across multiple GPUs.
That’s why you have ‘numa’ option in llama.cpp.
From my experience, the number of memory channels matters a lot, so all memory slots had better be filled.
There is a NUMA aware option in llama.cpp
CPUs don’t run LLMs.
Not true from what I’ve read here.
Are you trying to run Falcon 180B or something? I think it will probably work but not very well? I’d love to see you give it a try though.
When running a two-socket setup, you get 2 NUMA nodes. I am not sure how llama.cpp handles NUMA, but if it handles it well, you might actually get 2x the performance thanks to the doubled total memory bandwidth. This is, however, quite unlikely.
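To put the two views in this thread side by side, here is a toy model of time per token when generation is limited by streaming the weights from memory. All numbers (100 GB model, 200 GB/s local bandwidth, 50 GB/s inter-socket link) are hypothetical placeholders, and the model ignores caches, compute, and contention. It is a sketch of the argument, not a description of how llama.cpp schedules work.

```python
def single_socket_s(model_gb: float, local_bw: float) -> float:
    """One socket streams the whole model from its own memory."""
    return model_gb / local_bw


def dual_naive_s(model_gb: float, local_bw: float, link_bw: float) -> float:
    """Model split across two sockets, but one socket's cores read all of it:
    half from local memory, half through the slower inter-socket link."""
    half = model_gb / 2
    return half / local_bw + half / link_bw


def dual_numa_aware_s(model_gb: float, local_bw: float) -> float:
    """Each socket reads only the half held in its own memory, in parallel."""
    return (model_gb / 2) / local_bw


# Hypothetical figures, chosen only to illustrate the shape of the trade-off.
MODEL_GB, LOCAL_BW, LINK_BW = 100.0, 200.0, 50.0

for label, seconds in [
    ("single socket", single_socket_s(MODEL_GB, LOCAL_BW)),
    ("dual, NUMA-unaware", dual_naive_s(MODEL_GB, LOCAL_BW, LINK_BW)),
    ("dual, NUMA-aware", dual_numa_aware_s(MODEL_GB, LOCAL_BW)),
]:
    print(f"{label}: {1 / seconds:.2f} tokens/s")
```

Under these assumptions the NUMA-unaware case is slower than a single socket, while the ideal NUMA-aware case is twice as fast, which is why both the pessimistic and the optimistic replies can be right depending on how the software places memory and threads.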
You can get OK performance out of just a single-socket setup. I have tried Falcon 180B Q4 GGML on my single 7773X with 512GB of 8-channel 3200 RDIMM. I think I was getting around 2 tokens/s. A Genoa platform has 12-channel DDR5 5200 and AVX-512 support, so it could be very usable with just 1 CPU.
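The ~2 tokens/s figure is about what a bandwidth-only estimate predicts. The sketch below assumes every token requires reading all weights once, and that a Q4 quantization averages about 4.5 bits per weight (an assumption; the exact figure depends on the quantization variant). The 12-channel DDR5 5200 configuration is taken from the comment as stated.

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB for a quantized model."""
    return params_billion * bits_per_weight / 8


def max_tokens_per_s(channels: int, mega_transfers_per_s: int, model_gb: float) -> float:
    """Upper bound: peak memory bandwidth divided by bytes read per token."""
    bandwidth_gb_s = channels * mega_transfers_per_s * 8 / 1000  # 8 bytes per 64-bit channel
    return bandwidth_gb_s / model_gb


# 180B parameters at an assumed ~4.5 bits per weight.
falcon_gb = model_size_gb(180, 4.5)

print(f"model size: ~{falcon_gb:.0f} GB")
print(f"8ch DDR4-3200 bound: {max_tokens_per_s(8, 3200, falcon_gb):.1f} tokens/s")
print(f"12ch DDR5-5200 bound: {max_tokens_per_s(12, 5200, falcon_gb):.1f} tokens/s")
```

These are ceilings, not predictions; measured throughput will sit below them.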
I want to keep my options open, and potentially have a large context, which can add up to 100GB to memory requirements.
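For a sense of where a figure like that can come from, the key/value cache grows linearly with context length. A minimal estimate is below. The model shape used (80 layers, 64 KV heads, head dimension 128, 16-bit cache) is a hypothetical example, not the configuration of any model named in this thread, and models using grouped-query attention need far less.

```python
def kv_cache_gib(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_tokens: int,
    bytes_per_value: int = 2,  # 16-bit cache entries (assumed)
) -> float:
    """Estimated KV cache size in GiB for one sequence."""
    # Factor 2: one key vector and one value vector per layer per token.
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value
    return total_bytes / 2**30


# Hypothetical model shape at a 32k-token context.
print(f"{kv_cache_gib(80, 64, 128, 32768):.0f} GiB")
```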
I’m considering 1x Genoa CPU with 12 channels. Something like the 9354 would have more than enough cores. I might start with a cheaper DDR4 machine first though.
How was it getting the EPYC machine set up? Are you using Windows? What about a GPU?
Setting things up was straightforward; the process is no different from building a commercial or workstation platform. The 7773X machine runs Windows 10, and I have another 7452 QS machine that runs Ubuntu. Both are mostly pain-free. I have EPYC boards from both Supermicro and ASRock. I find the ASRock board more “modern”, with a better BIOS, but Supermicro has slightly better community and official support. In the very early Naples era AMD’s BIOS had some GPU compatibility issues, but I think nowadays you can use any GPU you want.
You can get very cheap Genoa engineering samples or qualification samples off eBay, so you can skip the older DDR4 platforms. The sockets are very different between the two generations; you wouldn’t even be able to reuse the heatsink.
One thing to watch out for when buying EPYCs: definitely avoid vendor-locked CPUs. Any EPYC CPU, once installed in a DELL or Lenovo board, will be physically altered forever so that it cannot boot on any other board. I got one once and it was a debugging nightmare until I realized the CPU was intentionally bricked by DELL…
Thanks. I can’t find ‘qualification samples’ on eBay in the UK, unless you just find them through a serial number or something.
The DDR5 RAM is more expensive, but it should hold its value fairly well. I’ll look for a 12-channel board.