I want to buy a laptop or build a machine powerful enough to run these LLMs locally. I'm open to either a desktop or a MacBook Pro, though the MBP is appealing because it could run these models on the laptop itself. Any pointers would be helpful. I tried researching, but there's so much information out there that I got overwhelmed. Even initial pointers would really help. Thank you!

  • WhereIsYourMind@alien.top · 1 year ago

    I have the M3 Max with 128GB memory / 40 GPU cores.

    You have to load a kernel extension to allocate more than 75% of the total SoC memory (128GB * 0.75 = 96GB) to the GPU. I increased it to 90% (115GB) and can run falcon-180b Q4_K_M at 2.5 tokens/s.
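
    Roughly, raising the limit looks like the sketch below. This assumes the `iogpu.wired_limit_mb` sysctl that newer macOS versions expose (value in MB, needs root, resets on reboot); treat the exact knob and numbers as illustrative rather than my exact steps.

    ```python
    # Sketch: raise the GPU wired-memory cap on Apple Silicon.
    # Assumes macOS Sonoma's iogpu.wired_limit_mb sysctl; the value is
    # in MB, requires root, and reverts to the default after a reboot.
    import subprocess

    TOTAL_GB = 128
    limit_mb = int(TOTAL_GB * 0.90 * 1024)  # ~90% of unified memory, ~115 GiB

    # Equivalent to running: sudo sysctl iogpu.wired_limit_mb=117964
    subprocess.run(["sudo", "sysctl", f"iogpu.wired_limit_mb={limit_mb}"],
                   check=True)

    # Confirm the new limit took effect.
    subprocess.run(["sysctl", "iogpu.wired_limit_mb"], check=True)
    ```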

    • Hinged31@alien.top · 1 year ago

      I ordered the same config. Would you mind telling me what you’ve loved using it for (AI/LLM-wise)? My current laptop can’t handle any of this, so I haven’t been able to jump in despite strong interest. It’d be helpful to have a jumping-off point. TIA!

      • WhereIsYourMind@alien.top · 1 year ago

        I run a code completion server that works like GitHub Copilot. I’m also working on a Mail labeling system using llama.cpp and AppleScript, but it is very much a work in progress.
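
        The completion side is less exotic than it sounds. Here’s a minimal sketch of the idea using llama-cpp-python; the model file, prompt, and parameters are placeholders rather than my actual setup:

        ```python
        # Minimal local code-completion sketch with llama-cpp-python.
        # Model path and generation settings are illustrative placeholders.
        from llama_cpp import Llama

        llm = Llama(
            model_path="codellama-13b.Q4_K_M.gguf",  # any local GGUF code model
            n_gpu_layers=-1,  # offload every layer to the Metal GPU
            n_ctx=4096,
        )

        def complete(prefix: str, max_tokens: int = 64) -> str:
            """Return a short continuation of the code under the cursor."""
            out = llm(prefix, max_tokens=max_tokens, temperature=0.2,
                      stop=["\n\n"])
            return out["choices"][0]["text"]

        print(complete("def fibonacci(n: int) -> int:\n"))
        ```

        An editor plugin just sends the buffer up to the cursor as `prefix` and splices the returned text back in; the Mail labeler is roughly the same loop, with an AppleScript hook feeding in message bodies instead of code.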