I’m excited to share what I’ve been working on, which builds on this model. That model was creative but struggled to follow instructions. I was able to correct for that shortcoming with a few additional low-weight merges that seem to have preserved its creativity. The results really impressed me during last night’s testing.
There are several popular merge methods, all supported by the lovely mergekit project at https://github.com/cg123/mergekit.
The TIES merge method is the newest and most advanced of them. It works well because it implements some logic to minimize how much the models step on each other’s toes when you merge them: it drops each model’s smallest changes and resolves cases where the models pull the same weight in opposite directions. Mergekit also makes it easy to do “frankenmerges” using the passthrough method, where you interleave layers from different models so that the resulting model ends up with more layers, and so more parameters, than any of its sources. For example, that’s how goliath-120b was made from two 70b models.
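
To make those two ideas a bit more concrete, here is a rough toy sketch in plain Python. It is not mergekit's code or its API: the function names (`ties_merge`, `passthrough_stack`), the `density` and `weight` parameters, and the flat lists standing in for model weights are all my own assumptions for illustration. The first function follows the general TIES recipe (trim small changes, pick a sign per parameter, average only the changes that agree with it); the second just shows how stacking overlapping layer slices yields a taller model.

```python
def ties_merge(base, finetuned, density=0.5, weight=1.0):
    """Toy TIES-style merge over flat lists of floats (illustrative only)."""
    n = len(base)
    # 1. Task vectors: what each fine-tune changed relative to the base.
    deltas = [[ft[i] - base[i] for i in range(n)] for ft in finetuned]

    # 2. Trim: keep only the largest `density` fraction of each delta by magnitude.
    keep = max(1, round(density * n))
    trimmed = []
    for delta in deltas:
        order = sorted(range(n), key=lambda i: abs(delta[i]), reverse=True)
        top = set(order[:keep])
        trimmed.append([delta[i] if i in top else 0.0 for i in range(n)])

    merged = []
    for i in range(n):
        column = [t[i] for t in trimmed]
        # 3. Elect a sign: the direction with the larger total magnitude wins.
        positive = sum(column) >= 0
        # 4. Average only the surviving changes that agree with the elected sign.
        agreeing = [v for v in column if v != 0.0 and (v > 0) == positive]
        update = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(base[i] + weight * update)
    return merged


def passthrough_stack(slices):
    """Toy frankenmerge: stack (model_name, start, end) layer slices, end exclusive."""
    return [(name, layer) for name, start, end in slices for layer in range(start, end)]


if __name__ == "__main__":
    base = [0.0, 0.0, 0.0, 0.0]
    model_a = [1.0, 0.1, -2.0, 0.0]
    model_b = [-0.5, 0.2, -1.0, 3.0]
    print(ties_merge(base, [model_a, model_b], density=0.5))

    # Two hypothetical 4-layer models; overlapping slices give a 6-layer stack.
    print(passthrough_stack([("model_a", 0, 3), ("model_b", 1, 4)]))
```

In the toy example, the two models disagree about the first weight (+1.0 vs. -0.5). A plain average would water that down to 0.25, whereas the TIES-style merge trims the smaller change and keeps the larger one intact. In practice you would express all of this as a mergekit config rather than writing it by hand.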