I want to buy a laptop or build a machine powerful enough to run these LLMs locally. I'm open to either a desktop or an MBP, though the MBP is appealing since it could run the models on the laptop itself. I tried researching, but there's so much information out there that I got overwhelmed. Any initial pointers would really help. Thank you!
I have the M3 Max with 128GB memory / 40 GPU cores.
You have to load a kernel extension to allocate more than 75% of the total SoC memory (128GB * 0.75 = 96GB) to the GPU. I increased it to 90% (115GB) and can run falcon-180b Q4_K_M at 2.5 tokens/s.
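For anyone wanting to replicate this: on newer macOS builds you can also bump the limit with the iogpu.wired_limit_mb sysctl instead of loading a kext. A rough sketch of what I mean, assuming a 128GB machine (verify the exact sysctl name on your OS version; the value needs root and resets on reboot):

    import subprocess

    # Raise the GPU wired-memory cap on a 128GB Apple Silicon Mac.
    # Assumes the iogpu.wired_limit_mb sysctl exists on your macOS version.
    TOTAL_MB = 128 * 1024   # 128GB of unified memory, in MiB
    TARGET = 0.90           # raise the default ~75% GPU cap to 90%

    limit_mb = int(TOTAL_MB * TARGET)  # 117964 MiB, roughly 115GB

    # Requires root; the setting does not survive a reboot.
    subprocess.run(
        ["sudo", "sysctl", f"iogpu.wired_limit_mb={limit_mb}"],
        check=True,
    )

Whatever you set, leave some headroom for the OS itself or the machine gets unstable under load.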
I ordered the same config. Would you mind telling me what you’ve loved using it for (AI/LLM-wise)? My current laptop can’t handle any of this, so I haven’t been able to jump into this stuff despite strong interest. It’d be helpful to have a jumping-off point. TIA!
I run a code completion server that works like GitHub Copilot. I’m also working on a Mail labeling system using llama.cpp and AppleScript, but it’s very much a work in progress.
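If it helps as a jumping-off point: llama.cpp ships a built-in HTTP server with a /completion endpoint, so the core of a completion client is just a small POST. A minimal sketch, assuming a server already running locally on port 8080 (the port, prompt, and n_predict value are placeholders):

    import json
    import urllib.request

    # Ask a local llama.cpp server (e.g. started with
    # `llama-server -m model.gguf --port 8080`) to complete a code snippet.
    def complete(prompt: str, n_predict: int = 64) -> str:
        req = urllib.request.Request(
            "http://127.0.0.1:8080/completion",
            data=json.dumps({"prompt": prompt, "n_predict": n_predict}).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            # The server returns JSON with the generated text in "content".
            return json.load(resp)["content"]

    print(complete("def fibonacci(n):\n    "))

Point an editor plugin at something like that and you have the skeleton of a Copilot-style setup.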