NeuralHermes-2.5: Boosting SFT models' performance with DPO

mlabonne@alien.top · 3 years ago

actualopenai@alien.top · 3 years ago

Works really well. Could we get it on the 16k version? https://huggingface.co/NurtureAI/OpenHermes-2.5-Mistral-7B-16k
Would it have to be a different dataset?

Creative_Bottle_3225@alien.top · 3 years ago

What is the difference between the normal version and the 16k version?

mlabonne@alien.top · 3 years ago

It’s a good question, and I can give it a try. Ideally, you’d want a 16k version of the preference dataset to make sure that DPO doesn’t degrade the model’s long-context performance. But considering the low number of training samples, it probably works fine with the existing dataset.