I have been thinking about this for a while -- does anyone know how feasible this is? Basically just applying some sort of "LoRA" on top of models to give them vision capabilities -- making them multimodal.
There's more than one image-ingestion model already; several exist for llama/mistral. Rough sketch of how those usually work below.
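Not any particular project's code, just a minimal sketch of the LLaVA-style recipe that most of these follow: a frozen vision encoder's patch features get projected into the LLM's embedding space by a small trained projector, and LoRA adapters go on the language model. Model names and hyperparameters here are illustrative picks, and it assumes `transformers` + `peft`:

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM, CLIPVisionModel
from peft import LoraConfig, get_peft_model

# Frozen vision tower -- only the projector and LoRA weights train.
vision = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
vision.requires_grad_(False)

llm = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
llm_hidden = llm.config.hidden_size

# LoRA on the attention projections; r/alpha are illustrative values.
llm = get_peft_model(llm, LoraConfig(r=16, lora_alpha=32,
                                     target_modules=["q_proj", "v_proj"]))

# Small trainable projector: CLIP patch features -> LLM embedding width.
projector = nn.Sequential(
    nn.Linear(vision.config.hidden_size, llm_hidden),
    nn.GELU(),
    nn.Linear(llm_hidden, llm_hidden),
)

def embed_image(pixel_values):
    """Turn an image into a sequence of pseudo-token embeddings."""
    # ViT-L/14 at 224px gives (batch, 257, 1024): 256 patches + CLS.
    patches = vision(pixel_values=pixel_values).last_hidden_state
    return projector(patches)  # (batch, 257, llm_hidden)

# Training then prepends embed_image(...) to the text token embeddings
# and runs the LLM forward pass on the concatenated sequence.
```

So the answer to the feasibility question is basically yes: the LLM weights barely move (LoRA only), and the vision side is a frozen off-the-shelf encoder plus a small projector.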
If you are talking about generating images, I dunno about that. Some people hook up LLMs to prompt Stable Diffusion, but that's not really the same thing -- roughly like this:
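For completeness, a toy sketch of that glue setup: the LLM only writes text, and a separate diffusion model renders it, so nothing here makes the LLM itself multimodal. Assumes `transformers` + `diffusers` and a GPU; model names are illustrative:

```python
import torch
from diffusers import StableDiffusionPipeline
from transformers import pipeline

# Step 1: a text-only LLM expands a rough idea into an image prompt.
llm = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")
out = llm("Write a one-sentence image prompt about a lighthouse at dusk.",
          max_new_tokens=40, return_full_text=False)
image_prompt = out[0]["generated_text"].strip()

# Step 2: a completely separate image model turns that text into pixels.
sd = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
sd(image_prompt).images[0].save("out.png")
```

The two models never share weights or embeddings, which is why it's not the same thing as a multimodal model.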