If I have multiple 7B models, where each model is trained on one specific topic (e.g. roleplay, math, coding, history, politics…), and I have an interface which decides, depending on the context, which model to use, could this outperform bigger models while being faster?
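For example, the routing interface could start out as simple as this (just a sketch; the checkpoint names and keyword lists are made up, and a real router would probably be a small classifier or embedding model instead):

```python
# Minimal sketch of the "interface" idea: pick a topic-specialist checkpoint
# per prompt. Keyword matching is just a placeholder for a real router.

SPECIALISTS = {
    "math": "math-7b",          # hypothetical checkpoint names
    "coding": "code-7b",
    "roleplay": "rp-7b",
    "general": "general-7b",    # fallback when no topic matches
}

KEYWORDS = {
    "math": ["integral", "derivative", "solve", "equation", "proof"],
    "coding": ["python", "function", "bug", "compile", "stack trace"],
    "roleplay": ["character", "scene", "act as", "persona"],
}

def route(prompt: str) -> str:
    """Return the specialist checkpoint to use for this prompt."""
    lowered = prompt.lower()
    for topic, words in KEYWORDS.items():
        if any(w in lowered for w in words):
            return SPECIALISTS[topic]
    return SPECIALISTS["general"]

if __name__ == "__main__":
    print(route("Can you solve this equation for x?"))        # -> math-7b
    print(route("Why does my python function throw here?"))   # -> code-7b
    print(route("Tell me about the Roman Empire."))           # -> general-7b
```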
Hmm, not sure I follow what an encoding layer is here. The encoding (prefill) phase involves filling the KV cache across the full depth of the model, so I don't think there's a single activation you could just pass across without model surgery plus additional fine-tuning.
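To make that concrete, here's a tiny snippet (using GPT-2 via Hugging Face transformers only because it's small; the same holds for a 7B model) showing that the cache produced by prefill spans every layer rather than being one hand-off tensor:

```python
# After a prefill pass, the KV cache holds keys/values for *every* layer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, use_cache=True)

# One (key, value) entry per transformer layer -- the cache covers the full depth.
print(len(out.past_key_values), "layers of KV cache")      # 12 for GPT-2
print(model.config.num_hidden_layers, "layers in the model")
```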