If I have multiple 7B models, each trained on one specific topic (e.g. roleplay, math, coding, history, politics…), and an interface that decides, based on the context, which model to use, could this outperform bigger models while being faster?
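For what it's worth, here's a rough Python sketch of what that routing interface could look like. Everything in it is made up for illustration: the model names, the keyword lists, and the fallback are placeholders, and a real router would more likely be a small classifier or an embedding lookup rather than keyword matching.

```python
from collections import Counter

# Hypothetical registry: topic -> identifier of a fine-tuned 7B specialist.
SPECIALISTS = {
    "code":     "my-org/7b-coder",
    "math":     "my-org/7b-math",
    "roleplay": "my-org/7b-roleplay",
    "history":  "my-org/7b-history",
}

# Crude keyword router; swap in a proper classifier for anything serious.
TOPIC_KEYWORDS = {
    "code":     {"python", "function", "bug", "compile", "code"},
    "math":     {"integral", "prove", "equation", "solve", "derivative"},
    "roleplay": {"character", "pretend", "story", "act as"},
    "history":  {"war", "empire", "century", "revolution"},
}

def route(prompt: str) -> str:
    """Pick the specialist whose keywords overlap the prompt the most."""
    text = prompt.lower()
    scores = Counter()
    for topic, keywords in TOPIC_KEYWORDS.items():
        scores[topic] = sum(kw in text for kw in keywords)
    topic, score = scores.most_common(1)[0]
    # Fall back to a general-purpose model if nothing matches.
    return SPECIALISTS[topic] if score > 0 else "my-org/7b-general"

if __name__ == "__main__":
    print(route("Can you help me fix this Python function?"))  # my-org/7b-coder
    print(route("Prove that the integral converges."))         # my-org/7b-math
    print(route("What's the weather like?"))                   # my-org/7b-general
```

Whatever ID `route()` returns would then be passed to your actual inference code; only one 7B model has to be resident at a time, which is where the speed/VRAM advantage over a single big model would come from.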
No, several sources, including Microsoft, have said GPT-3.5 Turbo is 20B. GPT-3 was 175B, and GPT-3.5 Turbo was about 10x cheaper on the API than GPT-3 when it came out, so 175B / 10 ≈ 17.5B puts 20B right in the expected ballpark.
Yeah, if that’s the case, I can see GPT-4 requiring about 220-250B of loaded parameters to do token decoding.