Hello after a long time :)
I am TokenBender.
Some of you may remember my previous model - codeCherryPop.
It was very kindly received, so I am hoping I won't be killed this time either.
Releasing EvolvedSeeker-1.3B v0.0.1
A 1.3B model with 68.29% on HumanEval.
The base model is quite cracked, I just did with it what I usually try to do with every coding model.
Here is the model - https://huggingface.co/TokenBender/evolvedSeeker_1_3
I will post this in TheBloke's server for GGUF, but I find that Deepseek Coder's GGUF quants are off for some reason, so let's see.
EvolvedSeeker v0.0.1 (First phase)
This model is a fine-tuned version of deepseek-ai/deepseek-coder-1.3b-base on 50k instructions for 3 epochs.
I mostly curated instructions from evolInstruct datasets, plus some portions of Glaive Coder.
Around 3k answers were modified via self-instruct.
Recommended prompt format is ChatML; Alpaca will also work, but take care with the EOT token.
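For anyone unfamiliar with ChatML, here is a minimal sketch of assembling a single-turn prompt by hand. The `<|im_start|>`/`<|im_end|>` tokens follow the standard ChatML convention; check the model's tokenizer config for the exact special tokens it expects, as this is an illustration rather than the model's verified template.

```python
def chatml_prompt(system: str, user: str) -> str:
    """Assemble a ChatML-formatted prompt for a single-turn request.

    Leaves the assistant turn open so the model completes it.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = chatml_prompt(
    "You are a helpful coding assistant.",
    "Write a Python function that reverses a string.",
)
print(prompt)
```

If you load the model with Hugging Face transformers, `tokenizer.apply_chat_template` can produce the equivalent string from a list of role/content messages, provided the repo ships a chat template.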
This is a very early version of the 1.3B-sized model in my larger project, PIC (Partner-in-Crime).
Going to teach this model JSON/Markdown adherence next.
I will just focus on simple things that I can do for now, but anything you guys say will be taken into consideration for fixes.
Do you plan to release the dataset? Have you checked for data contamination with the benchmarks? I am overall pretty confused by this model's HumanEval scores, not just your finetune. DeepSeek AI got very weird scaling in benchmarks, since their 6.7B model scores really close to the 33B one, which usually doesn't work this way: 6.7B instruct scores 78.6% while 33B instruct scores 79.3%. I am now using the 33B model daily at work and it's really good. I have no evidence to support my claim, but I totally wouldn't be surprised if they were pre-training on a contaminated dataset.