My guess, pulled from deep within my ass, is that it is a cluster of models, many possibly in the 20b range. The results we get aren't from a single 20b model, but from one of many models that have each been "optimized" (whatever that means) for particular areas. Some router function matches each incoming prompt to the model best suited for it and dispatches the prompt there.
Totally making things up, here, but I can see benefits to doing it this way.
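To make the speculation concrete, the router idea could look something like the sketch below. Everything here is invented for illustration — the model names, the keyword heuristic, all of it — since nothing about the real setup is public:

```python
# Hypothetical "router + specialist models" dispatch. Model names and the
# keyword-based routing heuristic are made up purely for illustration.

SPECIALISTS = {
    "code": "hypothetical-20b-code",
    "math": "hypothetical-20b-math",
    "general": "hypothetical-20b-general",
}

def route(prompt: str) -> str:
    """Pick a specialist model for a prompt via a crude keyword heuristic."""
    lowered = prompt.lower()
    if any(kw in lowered for kw in ("def ", "function", "compile", "bug")):
        return SPECIALISTS["code"]
    if any(kw in lowered for kw in ("integral", "prove", "equation")):
        return SPECIALISTS["math"]
    return SPECIALISTS["general"]

print(route("Fix this bug in my function"))  # routes to the code specialist
print(route("Solve this equation for x"))    # routes to the math specialist
print(route("Tell me a story"))              # falls through to general
```

In a real deployment the router would presumably be a learned classifier rather than keyword matching, but the shape of the idea — one cheap dispatch step in front of many cheaper specialist models — is the same.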
I keep trying new models, and I keep going back to Dolphin-Mistral-2.2.1. There is something about the quality of the interactions that is different from the other models — it is, I don't know, inexplicably better. I cannot identify why this remains, in my mind, the best of all the models I've tested, clear up to 33b (the largest my pitiful machine will load), but I continue to think so. Now, I haven't tested every model, so my opinion is completely anecdotal. Dolphin just kicks it, though. It does such a good job at almost everything I throw at it. I won't say it doesn't foul up here and there, but it still blows the other small models out of the water as far as I'm concerned.