I’ll go out on a limb and say that no one has compiled a glossary or encyclopedia of the various fine-tunes that seem to get published every day (if I’m wrong, I’m sure someone will correct me). If you’re not connected to “the scene”, or working with these models academically or professionally, it can be hard to get initiated, and stay initiated, into the “secret” jargon that’s developed around local LLMs. You can pick up a lot just by hanging out here, but you’ll still run into quite a few things that make you ask “wtf does that mean?”.
Totally feasible to run LLMs at useful speeds. I’m running a 64GB M1 Max (10-core CPU / 32-core GPU). With LM Studio, I typically get
And this is my daily work and play machine, so I usually have all sorts of browser tabs and applications open while running the models. From a fresh boot, it’s cool to be able to load an entire model into memory and still do “normal” work without touching swap space at all.
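If you want to verify that claim on your own machine, macOS exposes swap usage via `sysctl`. This is just a quick sketch (the `2>/dev/null || echo …` fallback is mine, so it degrades gracefully on non-macOS systems where the `vm.swapusage` key doesn’t exist):

```shell
# On macOS, prints something like: total = 0.00M  used = 0.00M  free = 0.00M
# A "used" value of 0.00M while a model is loaded means no swap is being touched.
sysctl -n vm.swapusage 2>/dev/null || echo "vm.swapusage unavailable (not macOS)"
```

Activity Monitor’s Memory tab shows the same “Swap Used” figure if you’d rather watch it live while the model is loaded.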