Dear Model Mergers, Have You Solved Merger of Different Model Families?

BayesMind@alien.top · 3 years ago

Dear Model Mergers, Have You Solved Merger of Different Model Families?

FullOf_Bad_Ideas@alien.top · 3 years ago

“I’ve only seen merging of same-upstream-pretrained-model-at-same-size.”

Not anymore.

Here’s a merge of Llama 2 13B and LLaMA 1 33B: https://huggingface.co/chargoddard/llama2-22b

30299578815310@alien.top · 3 years ago

How does this work? I’m really confused at a conceptual level about how you merge models that have different numbers of differently sized layers.

llama_in_sunglasses@alien.top · 3 years ago

https://huggingface.co/chargoddard/llama2-22b/blob/main/frankenllama_22b.py shows how the tensors are padded up to fit
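For intuition only, here is a rough sketch of what "padding a tensor up to fit" can mean. It is not taken from frankenllama_22b.py: `pad_matrix`, `matvec`, and the toy shapes are hypothetical, and plain Python lists stand in for real tensors. The point is that zero-padding a weight matrix leaves the original outputs unchanged while making room for extra rows and columns that transplanted weights could later fill.

```python
def pad_matrix(weight, rows, cols, fill=0.0):
    """Return a rows x cols matrix with `weight` copied into the top-left corner."""
    if rows < len(weight) or any(cols < len(row) for row in weight):
        raise ValueError("target shape must be at least as large as the source")
    padded = [[fill] * cols for _ in range(rows)]
    for i, row in enumerate(weight):
        padded[i][: len(row)] = row  # overwrite the leading entries in place
    return padded


def matvec(weight, x):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


# Toy example: grow a 2x2 weight to 3x4 (hypothetical sizes).
small = [[1.0, 2.0], [3.0, 4.0]]
big = pad_matrix(small, rows=3, cols=4)

# The extra input entries hit zero columns, so the first two outputs match
# what the small matrix produces; the new third output starts at zero.
x = [1.0, 1.0, 5.0, 7.0]
out = matvec(big, x)
```

In a real merge the zero region would be overwritten with donor weights rather than left empty, and the linked script is the place to check how that is actually done.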

BayesMind@alien.top · 3 years ago

Reading the README, it sounds like they’re adding some attention heads that either already had the same dimensions in both models or were fitted with a linear projection layer to make them match. They then say they trained on 10M tokens to “settle in the transplant”, which doesn’t sound like enough to me, and they agree the model isn’t useful without further training.
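On the "same dimensions" guess, a quick arithmetic check helps. The sizes below are the published ones as I recall them, so treat them as assumptions and verify against each model's config.json; `head_dim` is just a hypothetical helper.

```python
# Sizes as recalled from the published configs; verify before relying on them.
configs = {
    "llama-2-13b": {"hidden_size": 5120, "num_attention_heads": 40},
    "llama-1-33b": {"hidden_size": 6656, "num_attention_heads": 52},
}


def head_dim(cfg):
    """Per-head width: the hidden size split evenly across attention heads."""
    hidden, heads = cfg["hidden_size"], cfg["num_attention_heads"]
    if hidden % heads:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    return hidden // heads


dims = {name: head_dim(cfg) for name, cfg in configs.items()}
same_head_dim = len(set(dims.values())) == 1
```

If those numbers hold, both models use 128-wide heads, so the heads themselves line up. What differs is the hidden size each head reads from and writes to (5120 vs 6656), which is presumably where padding or a projection has to come in.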