• 0 Posts
  • 2 Comments
Joined 1 year ago
Cake day: October 27th, 2023

  • You can use a general LLM to “classify” a prompt and then route the entire prompt to a downstream LLM.

    Why can’t you just train the “router” LLM to decide which downstream LLM to use and pass the activations directly to the downstream LLMs? Can’t you have “headless” downstream LLMs (without an encoding layer)? Then inference could use a (6.5B + 6.5B)-parameter model with the generalizability of a 70B model.
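
    The classify-and-route idea from the parent comment can be sketched in a few lines. Everything here is a toy stand-in: the “router” is keyword matching and the “experts” are canned handlers rather than real LLMs, and all the names (`classify`, `route`, `EXPERTS`) are hypothetical. The point is the shape of the approach: because the full *prompt* (not activations) is forwarded, the experts need no shared weights or compatible hidden dimensions.

    ```python
    # Toy sketch of prompt routing: a cheap classifier in front of
    # several specialist models. Classifier and experts are stand-ins.

    def classify(prompt: str) -> str:
        """Toy 'router': pick a downstream expert by keyword."""
        p = prompt.lower()
        if any(w in p for w in ("integral", "derivative", "solve")):
            return "math"
        if any(w in p for w in ("def ", "function", "compile")):
            return "code"
        return "general"

    # Hypothetical specialist models keyed by label.
    EXPERTS = {
        "math": lambda p: f"[math-expert] {p}",
        "code": lambda p: f"[code-expert] {p}",
        "general": lambda p: f"[general-expert] {p}",
    }

    def route(prompt: str) -> str:
        # Forwarding raw text is what keeps the experts independent;
        # passing activations instead would require the router and every
        # expert to share (or be trained into) a common hidden space.
        return EXPERTS[classify(prompt)](prompt)

    print(route("Solve the integral of x^2"))
    ```

    That last comment is the catch with the activation-passing variant asked about above: a “headless” expert would have to accept the router’s hidden states as input, so router and experts would need to be trained jointly, which is essentially how mixture-of-experts models work.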


  • It’s not like we need models that are almost as good at things computers are excellent at, while using orders of magnitude more resources.

    One of the arguments for learning math (calculus, linear algebra, etc.) in school is that it supposedly helps you with critical thinking, logical reasoning, etc.

    If this can be tested in LLMs, it would give weight to that argument, because let’s face it: 99% of the population doesn’t use anything more complicated than exponential equations in their everyday lives.