Video-LLaVA can describe both image and video input.

chibop1@alien.top · 1 year ago

Video-LLaVA can describe both image and video input.

Ok-Recognition-3177@alien.top · 1 year ago

I wonder if this could be used to identify security cam events and notify me.

fallingdowndizzyvr@alien.top · 1 year ago

This is awesome. I’ve been processing individual frames of video at a time.

paryska99@alien.top · 1 year ago

Yes! I’ve been waiting for progress in video for a while! Imagine DIY automated classification for compilations and edits. This is going to be sick! Can’t wait to see an implementation in llama.cpp.

toothpastespiders@alien.top · 1 year ago

Holy shit. I’ve been holding off on looking too deeply into LLaVA given how many things are always popping up. But that’s just too cool to pass up. The number of potential applications, if it works as well as I’m hoping, is wild.

secunder73@alien.top · 1 year ago

If only we could use custom LLMs to write the descriptions.

fetballe@alien.top · 1 year ago

Amazing!
Does it work with GGML or 4-bit quantized GPTQ?

GitHub - PKU-YuanGroup/Video-LLaVA: Video-LLaVA: Learning United Visual Representation by Alignment Before Projection