Training on the rephrased test set is all you need: 13B models can reach GPT-4 performance in benchmarks with no contamination detectable by traditional methods (lmsys.org)
Posted by Covid-Plannedemic_@alien.top to LocalLLaMA · English · 1 year ago · 10 comments
LosingID_583@alien.top · English · 1 year ago
Benchmark test questions can’t be made public. It’s too easy to cheat.