I've been thinking about this for a while. Does anyone know how feasible this is? Basically, just applying some sort of “LoRA” on top of models to give them vision capabilities, making them multimodal.
https://www.reddit.com/r/LocalLLaMA/search/?q=LLaVA
https://www.reddit.com/r/LocalLLaMA/search/?q=vision
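
For a rough picture of what "a LoRA for vision" could mean, here is a toy sketch in plain Python. It is not how any particular project does it. It assumes a LLaVA-style setup where image patch features from a separate vision encoder are projected into the LLM's embedding space, and the frozen LLM weights get a low-rank adapter on top. All sizes and names (`projector`, `lora_down`, `lora_up`, `D_VISION`, and so on) are made up for illustration, and the "features" are random numbers.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two same-shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[x * s for x in row] for row in a]

def rand_matrix(rows, cols, rng):
    """Small random matrix, standing in for learned or encoded values."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)

# Toy sizes, purely hypothetical.
D_VISION, D_MODEL, RANK, ALPHA = 6, 8, 2, 4

# Frozen: one weight matrix of the language model. It is never updated.
W_frozen = rand_matrix(D_MODEL, D_MODEL, rng)

# Trainable: a projector from vision-feature space into the LLM embedding space.
projector = rand_matrix(D_VISION, D_MODEL, rng)

# Trainable: the low-rank adapter. lora_up starts at zero, so the adapted
# layer initially behaves exactly like the frozen one.
lora_down = rand_matrix(D_MODEL, RANK, rng)
lora_up = [[0.0] * D_MODEL for _ in range(RANK)]

def lora_forward(x):
    """y = x W + (ALPHA / RANK) * (x lora_down) lora_up"""
    base = matmul(x, W_frozen)
    delta = scale(matmul(matmul(x, lora_down), lora_up), ALPHA / RANK)
    return add(base, delta)

# Stand-ins for real inputs: 3 image patch features and 2 text token embeddings.
patch_features = rand_matrix(3, D_VISION, rng)
text_embeddings = rand_matrix(2, D_MODEL, rng)

# Project the patches into "image tokens" and prepend them to the text tokens.
image_tokens = matmul(patch_features, projector)
sequence = image_tokens + text_embeddings

out = lora_forward(sequence)
print(len(sequence), "tokens in,", len(out), "x", len(out[0]), "out")
```

The point of the sketch is that the adapter alone does not see pixels. Something still has to turn an image into vectors and map them into the model's token space, which is the job of the vision encoder plus projector here. The LoRA part only nudges the frozen model to make use of those extra tokens.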