So one thing that had really bothered me was that recent arXiv paper claiming that despite GPT-3 being 175B parameters and GPT-4 being around 1.7T, somehow GPT-3.5 Turbo was only 20B.

This had been on my mind for the past couple of days because it just made no sense to me, so this evening I went to check the paper again and noticed that I could no longer download the PDF or PostScript. Then I saw this update comment on the arXiv page, added yesterday:

Contains inappropriately sourced conjecture of OpenAI’s ChatGPT parameter count from this http URL, a citation which was omitted. The authors do not have direct knowledge or verification of this information, and relied solely on this article, which may lead to public confusion

That link leads to a Forbes article, from before GPT-4 was even released, that claims ChatGPT in general is 20B parameters:

It seems like the chatbot application was one of the most popular ones, so ChatGPT came out first. ChatGPT is not just smaller (20 billion vs. 175 billion parameters) and therefore faster than GPT-3, but it is also more accurate than GPT-3 when solving conversational tasks—a perfect business case for a lower cost/better quality AI product.

So it would appear that they sourced that figure from the Forbes article, and once everyone got really confused they realized it might not actually be correct, and the paper was modified.

So, before some wild urban legend forms that GPT-3.5 is 20B, just thought I’d mention that lol.

  • Auto_Luke@alien.topB

    Try the latest and best models under 20 billion parameters (Mistral, Qwen), then keep in mind that their training sets are much smaller and less optimized than 3.5-turbo’s (I assume the current version of 3.5-turbo was trained on over 10 trillion tokens of partially synthetic data). Also, I don’t feel like 3.5-turbo is all that good, to be honest. It’s realistic for it to be in this size range; I think that, with a maximally optimized latent space, it’s possible to achieve similar results with around 10 billion parameters. (A rough back-of-envelope sketch of this scaling arithmetic follows at the end of the thread.)

    • Monkey_1505@alien.topB

      I tend to disagree that it’s less optimized. Generally, more data and more compute reduce the need for heavy data refinement, whereas smaller models trained with less available compute benefit more from it.

      • Auto_Luke@alien.topB

        It’s very true that a small amount of high-quality data is better than a lot of garbage, but even better would be a large amount of high-quality data, optimized in a way we haven’t figured out yet. However, OpenAI could well be a year ahead of everyone else. Unfortunately, it’s ClosedAI now.

        • Monkey_1505@alien.topB

          That’s true, but they still have less impetus to do that. They’re being fairly heavily subsidized by Microsoft, so running costs and compute aren’t much of a concern. It’s only when more data and more compute hit a wall that they’ll have to worry much about data refinement.
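
For reference, here is the minimal back-of-envelope sketch of the scaling arithmetic referenced in Auto_Luke’s comment above. It assumes the Chinchilla rule of thumb of roughly 20 training tokens per parameter and the standard training-compute estimate of about 6·N·D FLOPs; the `scaling_sketch` helper and its inputs are illustrative only, and the 10-trillion-token, 20B-parameter configuration is the commenter’s guess, not a confirmed figure.

```python
# Rough Chinchilla-style arithmetic for the "3.5-turbo could be ~20B" argument.
# Assumptions (not confirmed anywhere): ~20 tokens per parameter as the
# compute-optimal ratio, training cost ~= 6 * params * tokens FLOPs, and the
# commenter's guess of ~10T partially synthetic training tokens.

CHINCHILLA_TOKENS_PER_PARAM = 20  # rough compute-optimal ratio from the Chinchilla paper
FLOPS_PER_PARAM_TOKEN = 6         # standard estimate: training FLOPs ~= 6 * N * D

def scaling_sketch(name: str, params: float, tokens: float) -> None:
    """Print the tokens-per-parameter ratio, how it compares to the ~20:1
    compute-optimal ratio, and the rough training FLOPs."""
    tokens_per_param = tokens / params
    over_optimal = tokens_per_param / CHINCHILLA_TOKENS_PER_PARAM
    train_flops = FLOPS_PER_PARAM_TOKEN * params * tokens
    print(f"{name}: {params / 1e9:.0f}B params, {tokens / 1e12:.1f}T tokens -> "
          f"{tokens_per_param:.0f} tokens/param "
          f"({over_optimal:.1f}x the ~20:1 compute-optimal ratio), "
          f"~{train_flops:.1e} training FLOPs")

# The 20B / 10T-token configuration is the commenter's assumption, not a known fact.
scaling_sketch("hypothetical 20B '3.5-turbo'", 20e9, 10e12)
# GPT-3's published training run, for comparison (175B params, ~300B tokens).
scaling_sketch("GPT-3 (published figures)", 175e9, 0.3e12)
```

The only point is that a ~20B model trained far past the compute-optimal token count isn’t an absurd configuration for a cheap, fast API model; it says nothing about what 3.5-turbo actually is.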