• 8 Posts
  • 61 Comments
Joined 1 year ago
Cake day: October 30th, 2023

  • Yes, that M1 Max should run LLMs really well, including 70B models with decent context. An M2 won't be much better. An M3, other than the 400GB/s model, won't be as good, since every M3 except the 400GB/s one has had its memory bandwidth cut compared with the M1/M2 models.

    Are you seeing that $2400 at B&H? It was $200 cheaper there a couple of weeks ago, so it might be worth waiting to see if the price goes back down.
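Why memory bandwidth matters so much: token generation is usually memory-bound, so a rough upper bound on decode speed is bandwidth divided by the size of the weights that must be streamed per token. The sketch below is a back-of-the-envelope estimate only, assuming a ~4.5 bits/weight quantization and ignoring KV-cache reads, compute limits, and other overheads; the 300GB/s figure is an assumed lower-bandwidth variant for comparison, not a quoted spec.

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def max_tokens_per_second(bandwidth_gb_s: float, size_gb: float) -> float:
    """Upper bound on decode speed if every token streams all weights once."""
    if size_gb <= 0:
        raise ValueError("size_gb must be positive")
    return bandwidth_gb_s / size_gb


if __name__ == "__main__":
    # Assumed: a 70B model quantized to ~4.5 bits per weight.
    size = model_size_gb(70, 4.5)
    # 400 GB/s is from the comment; 300 GB/s is an assumed comparison point.
    for bw in (400, 300):
        print(f"{bw} GB/s -> at most ~{max_tokens_per_second(bw, size):.1f} tok/s")
```

Real-world speeds land below this bound, but the ratio between two machines running the same model tracks their bandwidth ratio fairly closely, which is why a bandwidth cut hurts.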

  • The easiest thing to do is to get a Mac Studio. It also happens to be the best value. 3x4090s at $1600 each is $4800, and that's just for the cards. A machine to put those cards into will cost another few hundred dollars. The cost of the 3x4090s alone puts you in Mac Ultra 128GB range; adding the machine puts you in Mac Ultra 192GB range. And with those 3x4090s you only have 72GB of VRAM (3 × 24GB). Both of those Mac options give you much more memory.
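The arithmetic above as a small helper, so the host cost can be varied. This is a sketch: the $1600 card price and 24GB per card come from the comment, while `HOST_COST = 500` is a hypothetical stand-in for "another few hundred dollars".

```python
from typing import Tuple


def gpu_build(num_gpus: int, price_per_gpu: float, vram_per_gpu_gb: float,
              host_cost: float = 0.0) -> Tuple[float, float, float]:
    """Return (total cost in USD, total VRAM in GB, USD per GB of VRAM)."""
    total_cost = num_gpus * price_per_gpu + host_cost
    total_vram = num_gpus * vram_per_gpu_gb
    return total_cost, total_vram, total_cost / total_vram


if __name__ == "__main__":
    # Cards only: 3 x $1600, 3 x 24GB.
    cost, vram, per_gb = gpu_build(3, 1600, 24)
    print(f"cards only: ${cost:,.0f} for {vram:.0f}GB (${per_gb:.2f}/GB)")

    # HOST_COST is a hypothetical placeholder for the rest of the machine.
    HOST_COST = 500
    cost, vram, per_gb = gpu_build(3, 1600, 24, HOST_COST)
    print(f"with host:  ${cost:,.0f} for {vram:.0f}GB (${per_gb:.2f}/GB)")
```

To compare against a Mac Studio, divide its price by its unified memory size the same way; no Mac prices are assumed here.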

  • Yes. I've done that before on my other machines; llama.cpp in fact defaults to mmap. My hope was that since the models are sparse, the OS would cache the relevant parts of the model in RAM. The first run through would be slow, but subsequent runs would be fast since those pages would already be cached in RAM. How well that works depends on how much RAM the OS is willing to use to cache mmapped files and how smartly it does it. My hope was that if it handled the sparse access smartly, it would be pretty fast. So far, my hopes haven't been realized.
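A minimal illustration of the mechanism, using Python's standard `mmap` module rather than llama.cpp's own code. Mapping a file doesn't read it; pages are faulted in only when touched (plus whatever the kernel reads ahead), and they stay in the OS page cache after the mapping is closed, until memory pressure evicts them. The temporary file is a hypothetical stand-in for a model file.

```python
import mmap
import os
import tempfile


def read_slice(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes at `offset` through a read-only memory map.

    Only the pages covering the requested range (plus kernel read-ahead) are
    loaded from disk. They remain in the page cache afterwards, so a repeat
    read of the same range can be served from RAM if not yet evicted.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


if __name__ == "__main__":
    # Hypothetical stand-in for a model file: 4 MiB of random bytes.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(4 * 1024 * 1024))
        path = tmp.name
    try:
        first = read_slice(path, 1_000_000, 16)   # may have to hit the disk
        second = read_slice(path, 1_000_000, 16)  # likely from the page cache
        print(first == second)
    finally:
        os.remove(path)
```

Whether the second access is actually fast depends on the OS's caching policy, which is exactly the part that didn't work out as hoped.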

  • I just don't log in using the GUI. There indeed doesn't seem to be a way to turn it off like you can in Linux, so it still uses tens of MB waiting for you to log in. But that's a far cry from the hundreds of MB it uses if you do log in. I have thought about killing those login processes, but the Mac is so GUI-centric that if something really goes wrong and I can't ssh in, I want that as a backup. I think a few tens of MB is worth it for that, instead of trying to fix things in the terminal in recovery mode.

  • Although I don’t doubt you, the rendering looks as fake as it gets.

    Here's the attribution for that image: "Social media is abuzz with a screengrab of a regional webpage of the NVIDIA website purporting a “GeForce RTX 3090 CEO Edition” graphics card."

    So tell NVIDIA to up their rendering game. They should know a little something about graphics, or at least know someone who does.

    But is there a way to frankenstein more RAM onto an existing 3090? Are there shops I could send mine to?

    Supposedly those exact frankensteins are available in China; a poster here on this sub has reported buying some. If you were in China, you could take your 3090 to any of the endless tech-center booths staffed by dudes with the skills and equipment to try it. I would ask if they've done it before, though. You don't want to be the one they learn on.