Dear Model Mergers, Have You Solved Merger of Different Model Families?

BayesMind@alien.top · 2 years ago

Dear Model Mergers, Have You Solved Merger of Different Model Families?

BayesMind@alien.top · 2 years ago

This doesn’t seem cost-effective for what you’d get.

I agree, which is why I’m bearish on model merges, unless you’re mixing model families (IE mistral + Llama).

These franken-merges are just interweaving finetunes of the same base model in a way that, it’d make more sense to me if they just collapsed all params into a same-sized model via element-wise interpolation. So, merging weights makes sense, but running params in parallel like these X-120B, there’s no payout I can see in doing that beyond collapsing the weights.

llama_in_sunglasses@alien.top · 2 years ago

If I prompt a frankenmerge with the usual instruct dreck I use, they fail to answer numerous questions in a useful manner. However, it’s a different story using them in chat mode or probably anything creative - the outputs can be coherent but feel way less AI-like.