All the cases where it is better than GPT-4 are benchmarks involving the Chinese language. OpenAI is going to have a hard time getting access to extensive Chinese-language datasets, so it’s not surprising that a 72B model can beat GPT-4 there, though it’s still impressive in its own right.
What do these tests mean for an LLM? There are many values, and I see that in most cases Qwen is better than GPT-4. In others it is worse or much worse.