Training on the rephrased test set is all you need: 13B models can reach GPT-4 performance in benchmarks with no contamination detectable by traditional methods

Covid-Plannedemic_@alien.top · 2 years ago

LienniTa@alien.top · 2 years ago

Yeah, people keep praising 7B and 13B models here and there, but… they just hallucinate! Meanwhile the 120B Goliath, however terrible the original idea behind it was, is just really good in normal conversations. I'm trying to love the much-praised OpenHermes 2.5 and other Mistral finetunes, but they are just better next-token predictors, unlike larger models, which are actually able to reason.

Catch me if you can! How to beat GPT-4 with a 13B model | LMSYS Org