Cradawx@alien.top to LocalLLaMA, English · 1 year ago
ShareGPT4V - New multi-modal model, improves on LLaVA
sharegpt4v.github.io
M0ULINIER@alien.top · 1 year ago
https://preview.redd.it/vnony8f0ax1c1.png?width=1080&format=pjpg&auto=webp&s=dc261252751a0a1e209d9049854895688de25fa4
Benchmarks are in their GitHub, even if it's hard to be sure in current times.
justletmefuckinggo@alien.top · 1 year ago
I'm new here, but is this true multimodality, or is it the LLM communicating with a vision model? And what are those 4 models being benchmarked here for, exactly?
lakolda@alien.top · 1 year ago
This isn't comparing with the 13B version of LLaVA. I'd be curious to see that.