My guess, pulled from deep within my ass, is that it is a cluster of models, many possibly in the 20b range. The results we get aren't from a single 20b model, but from one of many models that have each been "optimized" (whatever that means) for particular areas. Some router function matches each incoming prompt to the model best suited for it and dispatches the prompt there.
Totally making things up, here, but I can see benefits to doing it this way.
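To make the speculation concrete, the router idea could look something like the sketch below. Everything here is invented for illustration — the model names, the keyword heuristic, all of it — since nothing about the real setup is public:

```python
# Hypothetical "router + specialist models" dispatch. Model names and the
# keyword-based routing heuristic are made up purely for illustration.

SPECIALISTS = {
    "code": "hypothetical-20b-code",
    "math": "hypothetical-20b-math",
    "general": "hypothetical-20b-general",
}

def route(prompt: str) -> str:
    """Pick a specialist model for a prompt via a crude keyword heuristic."""
    lowered = prompt.lower()
    if any(kw in lowered for kw in ("def ", "function", "compile", "bug")):
        return SPECIALISTS["code"]
    if any(kw in lowered for kw in ("integral", "prove", "equation")):
        return SPECIALISTS["math"]
    return SPECIALISTS["general"]

print(route("Fix this bug in my function"))  # routes to the code specialist
print(route("Solve this equation for x"))    # routes to the math specialist
print(route("Tell me a story"))              # falls through to general
```

In a real deployment the router would presumably be a learned classifier rather than keyword matching, but the shape of the idea — one cheap dispatch step in front of many cheaper specialist models — is the same.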
I keep trying new models, and I keep going back to Dolphin-Mistral-2.2.1. There is something about the quality of the interactions that is different from the other models — it is, I don't know, inexplicably better. I cannot identify why this remains, in my mind, the best of all the models I've tested, clear up to 33b (the largest my pitiful machine will load), but I continue to think so. Now, I haven't tested every model, so my opinion is completely anecdotal. Dolphin just kicks it, though. It does such a good job at almost everything I throw at it. I won't say it doesn't foul up here and there, but it still blows the other small models out of the water as far as I'm concerned.