Hello! I am wondering about this because it would be a very interesting use case, and there is more than enough training material out there: pretty much every MD file could be rendered, and then the image plus the markdown source could be used for training/fine-tuning.
However, I have pretty much no idea about LLaVA. Do you think this would be feasible to do?
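For context, here is a rough sketch of how the (rendered image, markdown source) pairs could be generated. This is an assumption, not a tested pipeline: it uses the `markdown` and `imgkit` packages (which need the wkhtmltoimage binary installed), and the conversation-style JSON layout is only my guess at what a LLaVA fine-tuning script expects, so the exact format may differ between versions.

```python
# Sketch: render .md files to PNGs and emit (image, markdown) pairs
# in a LLaVA-style conversation JSON.
import json
from pathlib import Path

import markdown   # pip install markdown
import imgkit     # pip install imgkit (needs wkhtmltoimage on PATH)

def build_pairs(md_dir: str, out_dir: str) -> None:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    records = []
    for i, md_path in enumerate(sorted(Path(md_dir).glob("*.md"))):
        md_text = md_path.read_text(encoding="utf-8")
        html = markdown.markdown(md_text)            # MD -> HTML
        image_path = out / f"{md_path.stem}.png"
        imgkit.from_string(html, str(image_path))    # HTML -> rendered PNG
        records.append({
            "id": str(i),
            "image": image_path.name,
            "conversations": [  # assumed LLaVA fine-tuning layout
                {"from": "human", "value": "<image>\nTranscribe this page as Markdown."},
                {"from": "gpt", "value": md_text},
            ],
        })
    (out / "train.json").write_text(json.dumps(records, indent=2))

if __name__ == "__main__":
    build_pairs("docs/", "llava_md_dataset/")
```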
What you are looking for is OCR. Then feed the OCR output to an LLM to turn it into Markdown.
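Something like this minimal sketch of that OCR -> LLM route: pull raw text out of a page image with Tesseract, then ask an LLM to reformat it as Markdown. The OpenAI client and the model name here are just placeholders for whatever LLM you actually use.

```python
# Sketch: OCR a page image, then have an LLM rewrite the raw text as Markdown.
from PIL import Image
import pytesseract              # pip install pytesseract (needs the tesseract binary)
from openai import OpenAI       # pip install openai

def page_to_markdown(image_path: str) -> str:
    raw_text = pytesseract.image_to_string(Image.open(image_path))  # OCR step
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",    # placeholder model name
        messages=[
            {"role": "system", "content": "Reformat the user's OCR text as clean Markdown."},
            {"role": "user", "content": raw_text},
        ],
    )
    return resp.choices[0].message.content

print(page_to_markdown("page.png"))
```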
Facebook's Nougat OCR model does PDF to Markdown. There are also fine-tuned versions of it that do PDF to LaTeX. I plan on making a fine-tuned version that does PDF to XML over winter break too!
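If anyone wants to try it, here is a quick sketch of driving Nougat from Python via its CLI. The flags are from memory of the facebookresearch/nougat README (`nougat <pdf> -o <out_dir>`, writing a `.mmd` Mathpix Markdown file), so double-check them against the version you install.

```python
# Sketch: run the Nougat CLI on a PDF and read back the generated .mmd file.
import subprocess
from pathlib import Path

def pdf_to_markdown(pdf_path: str, out_dir: str = "nougat_out") -> str:
    Path(out_dir).mkdir(exist_ok=True)
    subprocess.run(["nougat", pdf_path, "-o", out_dir], check=True)
    mmd = Path(out_dir) / (Path(pdf_path).stem + ".mmd")   # Nougat's output name
    return mmd.read_text(encoding="utf-8")

print(pdf_to_markdown("paper.pdf")[:500])
```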
I’d really like a version of LLaVA that can process comic/manga pages: read the text, say which character is saying what and doing what, and in what order, and pretty much turn the manga into a novel or something like that.
Does anyone know of a project that is going in that direction or working on that?