• 0 Posts
  • 4 Comments
Joined 1 year ago
Cake day: October 27th, 2023

  • It’s important that we not disclose all our test questions, or models will continue to overfit and underlearn. Now, to answer your question:

    When evaluating a code model, I look for questions with easy answers, then tweak them slightly to see if the model gives the easy answer or figures out that I need something else. I’ll give one example out of tens*:

    “Write a program that removes the first 1 KiB of a file.”

    Most of the models I’ve tested will give a correct answer to the wrong question: seek(1024) and truncate(). That removes everything after the first 1 KiB of the file, so it keeps exactly the part I asked to remove.
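
    To make the difference concrete, here is a minimal Python sketch of both behaviors. The language choice and the function names (keep_first_kib, remove_first_kib) are hypothetical and only for illustration. The second function is just one reasonable way to do it, assuming the file fits comfortably on disk twice.

```python
import os
import shutil
import tempfile

KIB = 1024  # 1 KiB


def keep_first_kib(path, n=KIB):
    """The common wrong answer: keeps the first n bytes, discards the rest."""
    with open(path, "r+b") as f:
        f.seek(n)
        # Truncates at the current position, so bytes n.. are dropped.
        # (If the file is shorter than n, this may extend it instead.)
        f.truncate()


def remove_first_kib(path, n=KIB):
    """What was asked: discards the first n bytes, keeps the rest."""
    dir_name = os.path.dirname(os.path.abspath(path))
    with open(path, "rb") as src, tempfile.NamedTemporaryFile(
        "wb", dir=dir_name, delete=False
    ) as dst:
        src.seek(n)                    # skip the first n bytes
        shutil.copyfileobj(src, dst)   # copy everything after them
        tmp_name = dst.name
    # Swap the shortened copy into place of the original.
    os.replace(tmp_name, path)
```

    Note that this sketch does not preserve the original file's permissions or ownership, and a file shorter than 1 KiB ends up empty.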

    (*I’m being deliberately vague about how many questions I have for the same reason I don’t share them. Also it’s a moving target.)