We’re proud to introduce Rocket-3B 🦝, a state-of-the-art 3 billion parameter model!
🌌 Size vs. Performance: Rocket-3B may be smaller with its 3 billion parameters, but it punches way above its weight. In head-to-head benchmarks like MT-Bench and AlpacaEval, it consistently outperforms models up to 20 times larger.
🔍 Benchmark Breakdown: In MT-Bench, Rocket-3B achieved an average score of 6.56, excelling in various conversation scenarios. In AlpacaEval, it notched a near 80% win rate, showcasing its ability to produce detailed and relevant responses.
🛠️ Training: The model is fine-tuned from Stability AI’s StableLM-3B-4e1t using Direct Preference Optimization (DPO) for enhanced performance (a rough sketch of such a setup follows at the end of this post).
📚 Training Data: We’ve amalgamated multiple public datasets to ensure a comprehensive and diverse training base. This approach equips Rocket-3B with a wide-ranging understanding and response capability.
👩💻 Chat format: Rocket-3B follows the ChatML format.
For an in-depth look at Rocket-3B, visit Rocket-3B’s Hugging Face page.
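For the curious, here is a minimal sketch of what a DPO fine-tune on top of StableLM-3B-4e1t can look like with Hugging Face TRL. Everything below is illustrative: the toy preference data and hyperparameters are placeholders, not Rocket-3B’s actual recipe, and the call uses TRL’s older DPOTrainer signature (newer releases move beta into DPOConfig).

```python
# Minimal DPO fine-tuning sketch with Hugging Face TRL.
# Assumptions: toy preference data and hyperparameters are placeholders,
# NOT Rocket-3B's actual training recipe.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "stabilityai/stablelm-3b-4e1t"
model = AutoModelForCausalLM.from_pretrained(base, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)

# DPOTrainer expects string "prompt", "chosen", and "rejected" columns.
# In practice you would amalgamate several public preference datasets
# (e.g., via datasets.concatenate_datasets) instead of this toy example.
train_dataset = Dataset.from_dict({
    "prompt":   ["What is 2 + 2?"],
    "chosen":   ["2 + 2 = 4."],
    "rejected": ["2 + 2 = 5."],
})

trainer = DPOTrainer(
    model=model,
    ref_model=None,   # TRL keeps a frozen copy of the policy as the reference
    beta=0.1,         # strength of the KL penalty toward the reference model
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    args=TrainingArguments(
        output_dir="rocket-3b-dpo",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        learning_rate=5e-7,
        num_train_epochs=1,
    ),
)
trainer.train()
```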
> 👩💻 Chat format: Rocket-3B follows the ChatML format.
From the README and the tokenizer.json it looks like it’s using a textual representation of ChatML on top of StableLM’s format. Just in case this trips anyone up.
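For reference, standard ChatML turns look like the sketch below. Per the observation above, Rocket-3B seems to encode these markers as plain text rather than dedicated special tokens, so confirm against its tokenizer.json before relying on this exact layout.

```python
# Generic ChatML turn layout (a sketch of the standard format; verify the
# exact tokens against Rocket-3B's tokenizer.json, since they appear to be
# plain text rather than special tokens).
prompt = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Why is the sky blue?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
# Generation should stop when the model emits "<|im_end|>".
```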
I think “The Bloke” takes requests for GGUF conversions. Might want to check Hugging Face.
!RemindMe 7 days
Woooooooow!
Looking forward to trying this when some GGUFs are available.
Seems this model has a problem and isn’t loading.
It must have been fixed recently, then.
I think I need to remind people about the benchmarks used: MT-Bench and AlpacaEval are terrible benchmarks.
Oh wow, this seems almost too good to be true
As a fan of the character, I approve 👍
Any details on what max context sizes are usable?
> 📚 Training Data: We’ve amalgamated multiple public datasets to ensure a comprehensive and diverse training base. This approach equips Rocket-3B with a wide-ranging understanding and response capability.
We’ve amalgamated multiple public benchmark answers to ensure a contaminated and diverse training base.
This smells like leftovers…
We’ve been having “pretraining on the test set” for weeks and I’m craving something else.
Tried the GGUF format of this model from Hugging Face and it just won’t load.
Same, even the model from The Bloke that was released hours ago wouldn’t work :-(
I tried both GGUF models currently on HF. Same result.
Curious to try this out when it’s working!
The latest version of KoboldCpp, v1.50.1, now loads this model properly.
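If anyone wants to script against it rather than use a UI, a minimal llama-cpp-python sketch like the one below should also work once your llama.cpp build supports the StableLM architecture. The GGUF filename here is a hypothetical placeholder, and 4096 is the base StableLM-3B-4e1t context length.

```python
# Minimal sketch: loading a Rocket-3B GGUF with llama-cpp-python.
# Assumptions: the filename is a placeholder, and your llama.cpp build is
# recent enough to support the StableLM architecture.
from llama_cpp import Llama

llm = Llama(
    model_path="rocket-3b.Q4_K_M.gguf",  # hypothetical quant filename
    n_ctx=4096,                          # base model's context length
)

# ChatML-style prompt; stop generation at the end-of-turn marker.
prompt = (
    "<|im_start|>user\n"
    "Write a one-line greeting.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
out = llm(prompt, max_tokens=128, stop=["<|im_end|>"])
print(out["choices"][0]["text"])
```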
Finally, I can integrate AI into my Arduino project and build my own version of BB-8