M1/M2/M3: increase VRAM allocation with `sudo sysctl iogpu.wired_limit_mb=12345` (where the value is the amount to allocate, in MB)

farkinga@alien.top · 2 years ago

CheatCodesOfLife@alien.top · 2 years ago

64GB M1 Max here. Before running the command, trying to load goliath-120b failed: (47536.00 / 49152.00)

And after `sudo sysctl iogpu.wired_limit_mb=57344`: (47536.00 / 57344.00)

So I guess the default is: 49152
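For anyone wanting to pick their own value, here is a rough sketch of the arithmetic behind the numbers above. It assumes 1 GB is counted as 1024 MB, which is what the figures in this thread suggest; the helper names are made up for illustration, and the 8 GB left for macOS is simply what 57344 works out to on a 64GB machine, not a recommendation.

```python
MB_PER_GB = 1024  # assumption: the thread's figures are multiples of 1024


def wired_limit_mb(total_ram_gb: int, reserve_gb: int) -> int:
    """Return a wired-limit value in MB that leaves reserve_gb for the OS."""
    if not 0 < reserve_gb < total_ram_gb:
        raise ValueError("reserve_gb must be between 0 and total_ram_gb")
    return (total_ram_gb - reserve_gb) * MB_PER_GB


def share_of_ram(limit_mb: int, total_ram_gb: int) -> float:
    """Return the limit as a fraction of total RAM."""
    return limit_mb / (total_ram_gb * MB_PER_GB)


default_mb = 49152                 # value reported before the change
raised_mb = wired_limit_mb(64, 8)  # 64GB machine, 8GB left for the OS

print(share_of_ram(default_mb, 64))  # 0.75
print(raised_mb)                     # 57344
print(f"sudo sysctl iogpu.wired_limit_mb={raised_mb}")
```

So on this machine the reported default works out to 75% of RAM, and the raised value to 56GB.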

fallingdowndizzyvr@alien.top · 2 years ago

> So I guess the default is: 49152

It is. To be clearer, llama.cpp tells you what the recommendedMaxWorkingSetSize is, which should match that number.

bebopkim1372@alien.top · 2 years ago

Maybe 47536MB is just the net model size. LLM inference also needs memory for the context, plus optional context cache memory, on top of that.
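To put numbers on that, here is a small sketch of how much room is left under each limit once the model itself is loaded. The 47536 figure and both limits come from the posts above; the 4096MB "extra" for context is a made-up placeholder purely for illustration, since the real cost depends on context length and model architecture.

```python
def headroom_mb(model_mb: float, limit_mb: float) -> float:
    """Memory left under the limit after the model weights alone."""
    return limit_mb - model_mb


def fits(model_mb: float, extra_mb: float, limit_mb: float) -> bool:
    """True if the model plus extra (context, cache) stays within the limit."""
    return model_mb + extra_mb <= limit_mb


MODEL_MB = 47536.0        # figure reported for goliath-120b in the thread
ASSUMED_EXTRA_MB = 4096.0  # hypothetical context cost, not a measured value

for limit_mb in (49152.0, 57344.0):
    print(
        f"limit {limit_mb:.0f}MB: "
        f"headroom {headroom_mb(MODEL_MB, limit_mb):.0f}MB, "
        f"fits with assumed extra: {fits(MODEL_MB, ASSUMED_EXTRA_MB, limit_mb)}"
    )
```

With the original limit there are only 1616MB to spare after the weights, against 9808MB with the raised one, which would be consistent with the load failing before the change.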
