So Mistral-7b is a pretty impressive 7B param model … but why is it so capable? Do we have any insights into its dataset? Was it trained very far beyond the scaling limit? Any attempts at open reproductions or merges to scale up # of params?
Are there notable finetunes to your knowledge? I started using LLMs today, beginning with OpenOrca Mistral 7B, and it seems pretty good.
On HuggingFace you can find many fine-tuned and quantized variants; look for models uploaded by TheBloke.
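If you want to try one of those finetunes locally, here's a minimal sketch using the transformers library. The repo id "Open-Orca/Mistral-7B-OpenOrca" is just an example of the OpenOrca finetune mentioned above; any other Mistral-based finetune on HuggingFace should load the same way.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example repo id (assumed); swap in whichever Mistral finetune you want to try.
model_id = "Open-Orca/Mistral-7B-OpenOrca"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision so a 7B model fits on a single consumer GPU
    device_map="auto",          # let accelerate place layers on available devices
)

prompt = "Explain why Mistral-7B performs so well for its size."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that TheBloke's uploads are mostly quantized formats (GGUF/GPTQ), which you'd load with llama.cpp or a similar runtime rather than plain transformers; the sketch above assumes the full-precision weights.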