• 0 Posts
  • 3 Comments
Joined 1 year ago
Cake day: October 17th, 2023

  • Totally feasible to run LLMs at useful speeds. I’m running a 64 GB M1 Max (10/32, i.e. 10-core CPU / 32-core GPU). With LM Studio, I typically get (see the sketch after this list for a quick way to measure this yourself):

    • 3-4 T/s using q5_k_m quants of ~70B models
    • 6-9 T/s from q5_* and q6_k quants of ~30B models
    • 25-30 T/s from q6_k and q8 quants of 7B models
    • around 20 T/s from unquantized fp16 7B models
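
    LM Studio can serve the loaded model over an OpenAI-compatible local API (http://localhost:1234/v1 by default), so a rough timing loop against it gives you T/s numbers comparable to the ones above. This is just a sketch; the base URL, api_key, and model name are defaults/placeholders, so adjust them to your setup:

    ```python
    import time
    from openai import OpenAI

    # LM Studio's local server speaks the OpenAI API; 1234 is its default port.
    # The api_key value is ignored by the local server but required by the client.
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    start = time.time()
    resp = client.chat.completions.create(
        model="local-model",  # placeholder; LM Studio serves whichever model is loaded
        messages=[{"role": "user", "content": "Write a 200-word summary of how tides work."}],
        max_tokens=256,
    )
    elapsed = time.time() - start

    # Timing includes prompt processing, so this slightly understates
    # pure generation speed (assumes the server reports token usage).
    tokens = resp.usage.completion_tokens
    print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} T/s")
    ```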

    And this is my daily work and play machine, so I usually have all sorts of browser tabs and applications open while running the models. From a fresh boot, it’s cool to be able to load an entire model into memory and still do “normal” work without touching swap at all.
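
    For anyone wondering how a ~70B q5 model fits in 64 GB alongside everything else, the back-of-envelope math works out. The ~5.5 bits/weight figure for q5_k_m is an approximation (exact GGUF sizes vary a bit by model, and KV cache / context adds a few more GB on top):

    ```python
    # Rough weight-memory estimate for quantized vs. unquantized models.
    def weight_gb(params_billions: float, bits_per_weight: float) -> float:
        return params_billions * 1e9 * bits_per_weight / 8 / 1e9

    print(f"70B @ q5_k_m (~5.5 bpw): ~{weight_gb(70, 5.5):.0f} GB")  # ~48 GB -> fits in 64 GB
    print(f"7B  @ fp16   (16 bpw):   ~{weight_gb(7, 16):.0f} GB")    # ~14 GB
    ```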



  • I must be incredibly lucky, or I’m unknowingly some kind of prompting savant, because Claude et al. usually just do what I ask them to.

    The only time Claude outright refused a request was when I was looking for criticism of a recent public figure as a starting point for some research. But even that was straightforward to work around with the “I’m writing a novel based on this person” stratagem.