Instead of making all these models, the effort would be way more valuable if focused on making things more efficient: methods to run models on lower-spec machines. The barrier to entry is way too big for larger models; not everyone lives in a place where a 4090 is remotely an option.
It feels like a lazy copout that relies on just throwing more power at the problem rather than careful, optimized design, much like the video game industry today.
Who pays for all this training on the models we see knocking about, and I don't mean the ones released by the big companies? Who has the resources to train a 70B model? Like one of the guys below said, 1.7 million GPU hours, for example; that's pretty friggin expensive, no?
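For a sense of scale, here's a rough back-of-envelope sketch. The hourly rate is an assumption (cloud A100-class rates vary, roughly $1-4/hr, and big labs running their own hardware pay less), so treat the total as ballpark only:

```python
# Rough cost estimate for 1.7 million GPU hours of training.
gpu_hours = 1_700_000
rate_per_hour = 2.00  # assumed cloud rate in USD, not a quoted price

total_cost = gpu_hours * rate_per_hour
print(f"${total_cost:,.0f}")  # → $3,400,000
```

Even at a generous $1/hr that's still well over a million dollars for a single training run, before you count failed runs and experiments.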