I want to buy a laptop or build a machine powerful enough to run these LLMs locally. I'm open to either a desktop or an MBP, though the MBP is appealing since it could run the models on the laptop itself. I tried researching, but there's so much information out there that I got overwhelmed. Any initial pointers would really help. Thank you!
I have the M3 Max with 128GB memory / 40 GPU cores.
You have to load a kernel extension to allocate more than 75% of the total SoC memory (128GB * 0.75 = 96GB) to the GPU. I increased it to 90% (115GB) and can run falcon-180b Q4_K_M at 2.5 tokens/s.
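For anyone wanting to replicate this: on newer macOS builds you can also bump the limit with the iogpu.wired_limit_mb sysctl instead of loading a kext. A rough sketch of what I mean, assuming a 128GB machine (verify the exact sysctl name on your OS version; the value needs root and resets on reboot):

    import subprocess

    # Raise the GPU wired-memory cap on a 128GB Apple Silicon Mac.
    # Assumes the iogpu.wired_limit_mb sysctl exists on your macOS version.
    TOTAL_MB = 128 * 1024   # 128GB of unified memory, in MiB
    TARGET = 0.90           # raise the default ~75% GPU cap to 90%

    limit_mb = int(TOTAL_MB * TARGET)  # 117964 MiB, roughly 115GB

    # Requires root; the setting does not survive a reboot.
    subprocess.run(
        ["sudo", "sysctl", f"iogpu.wired_limit_mb={limit_mb}"],
        check=True,
    )

Whatever you set, leave some headroom for the OS itself or the machine gets unstable under load.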
I ordered the same config. Would you mind telling me what you’ve loved using it for (AI/LLM-wise)? My current laptop can’t handle any of this, so I haven’t been able to jump into this stuff despite strong interest. It’d be helpful to have a jumping-off point. TIA!
I run a code completion server that works like GitHub Copilot. I’m also working on a Mail labeling system using llama.cpp and AppleScript, but it’s very much a work in progress.
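If it helps as a jumping-off point: llama.cpp ships a built-in HTTP server with a /completion endpoint, so the core of a completion client is just a small POST. A minimal sketch, assuming a server already running locally on port 8080 (the port, prompt, and n_predict value are placeholders):

    import json
    import urllib.request

    # Ask a local llama.cpp server (e.g. started with
    # `llama-server -m model.gguf --port 8080`) to complete a code snippet.
    def complete(prompt: str, n_predict: int = 64) -> str:
        req = urllib.request.Request(
            "http://127.0.0.1:8080/completion",
            data=json.dumps({"prompt": prompt, "n_predict": n_predict}).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            # The server returns JSON with the generated text in "content".
            return json.load(resp)["content"]

    print(complete("def fibonacci(n):\n    "))

Point an editor plugin at something like that and you have the skeleton of a Copilot-style setup.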