If i have multiple 7b models where each model is trained on one specific topic (e.g. roleplay, math, coding, history, politic…) and i have an interface which decides depending on the context which model to use. Could this outperform bigger models while being faster?

  • FullOf_Bad_Ideas@alien.topB
    link
    fedilink
    English
    arrow-up
    1
    ·
    2 years ago

    Yeah if that’s the case, I can see gpt-4 requiring about 220-250B of loaded parameters to do token decoding