Look at this: apart from Llama 1, all the other “base” models will likely answer “language” after “As an AI” — the telltale ChatGPT boilerplate “As an AI language model…”. That means Meta, Mistral AI and 01-ai (the company behind Yi) likely trained their “base” models on GPT instruct datasets to inflate benchmark scores and make it look like the “base” models had a lot of potential. We got duped hard on that one.
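A minimal sketch of this kind of contamination probe, assuming the Hugging Face `transformers` library is installed. It just checks which tokens a base model ranks highest after the prompt “As an AI”; `gpt2` is used here only as a small stand-in, since the actual models discussed above are large downloads:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def top_next_tokens(model_name: str, prompt: str, k: int = 5):
    """Return the k most likely next tokens after `prompt` with their probabilities."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        # Logits for the token position right after the end of the prompt.
        logits = model(**inputs).logits[0, -1]

    probs = torch.softmax(logits, dim=-1)
    top = torch.topk(probs, k)
    return [(tok.decode(idx), p.item()) for idx, p in zip(top.indices, top.values)]

if __name__ == "__main__":
    # For a truly "clean" base model you'd expect a spread of plausible continuations;
    # a model trained on GPT output tends to put "language" near the top.
    for token, prob in top_next_tokens("gpt2", "As an AI"):
        print(f"{token!r}: {prob:.3f}")
```

Swapping in a suspect model name (e.g. a Mistral or Yi base checkpoint) and comparing the rank of “ language” across models is the whole test.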
It’s almost a shame ChatGPT blew up the way it did. “AI” became a buzzword and every company found a way to shove it into their business model. Now the future of NLP is cloudy because the web has become an ouroboros of data: models training on the output of other models. I think dataset selection and cleaning will become a more important area of research. I’d be surprised if “shoving terabytes of raw web-scraped data into a model” continues to be feasible in the future.