dqUu3QlS@alien.topBtoLocalLLaMA•An Alternative Approach to Building Generative AI ModelsEnglish
1·
1 year agoIf I understand this correctly, you’re using a smaller NN to predict the weights of a larger one? Have you tested to make sure this approach preserves the performance of the larger model? What advantage does your approach have compared to existing approaches - distillation, quantization, pruning, just training smaller models directly?
I can think of some clear disadvantages for performance.
Why do you think we need this? To me, it just indicates that the structure of Stable Diffusion is designed for real-world photos, artwork, and diagrams, and ill-suited for predicting the weights of an LLM.
Are you sure the output isn’t dramatically horrible? To me the predicted weight images look nothing like the original weight images. The fine detail is completely different.
But it doesn’t even matter how it looks to human eyes. What matters is, when a new model is constructed from the predicted weights, whether that model makes mostly-correct predictions.