Does anyone know a robust open-source library for extracting tables from PDFs? Even an OCR library is fine.
P.S. I have already tried tabula, camelot, img2table, unstructured.io, and most of the document loaders in langchain; none of them is even 95% robust.
Are your PDFs random documents from users? If so, that will be a problem, since PDFs can be structured in many different ways depending on the tool that produced them. If all the PDFs are alike, e.g. created by the same tool, then you may have a chance: I would inspect the PDF layout to see whether it is consistent, and then you could probably get the data out with a PDF library (maybe you could use parts of pdf.js from Mozilla).
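To make the "inspect the layout" idea concrete, here is a rough Python sketch. It assumes you already have word boxes from some PDF library as dicts with `x0`, `top`, and `text` keys (pdfplumber's `extract_words()` returns that shape, but any extractor that gives coordinates would do). The function names and tolerances are made up for illustration, and it assumes left-aligned columns with one word box per cell, so treat it as a starting point rather than a finished extractor.

```python
def group_rows(words, y_tol=3.0):
    """Group word boxes into rows when their vertical positions are close."""
    rows = []
    for w in sorted(words, key=lambda w: (w["top"], w["x0"])):
        # Compare against the first word of the current row.
        if rows and abs(rows[-1][0]["top"] - w["top"]) <= y_tol:
            rows[-1].append(w)
        else:
            rows.append([w])
    return [sorted(r, key=lambda w: w["x0"]) for r in rows]


def column_starts(rows, x_tol=5.0):
    """Cluster the left edges of all words into column start positions."""
    cols = []
    for x in sorted(w["x0"] for r in rows for w in r):
        if not cols or x - cols[-1] > x_tol:
            cols.append(x)
    return cols


def to_table(words, y_tol=3.0, x_tol=5.0):
    """Place each word in the column whose start is nearest to its left edge."""
    rows = group_rows(words, y_tol)
    cols = column_starts(rows, x_tol)
    table = []
    for r in rows:
        cells = [""] * len(cols)
        for w in r:
            idx = min(range(len(cols)), key=lambda i: abs(cols[i] - w["x0"]))
            cells[idx] = (cells[idx] + " " + w["text"]).strip()
        table.append(cells)
    return table


def same_layout(cols_a, cols_b, x_tol=5.0):
    """Rough check that two documents share the same column positions."""
    return len(cols_a) == len(cols_b) and all(
        abs(a - b) <= x_tol for a, b in zip(cols_a, cols_b)
    )


# Hypothetical word boxes standing in for real extractor output.
words = [
    {"x0": 50, "top": 100, "text": "Name"},
    {"x0": 200, "top": 100.5, "text": "Qty"},
    {"x0": 51, "top": 120, "text": "Apple"},
    {"x0": 201, "top": 121, "text": "3"},
]
print(to_table(words))
```

If `same_layout` returns true for the column starts of several sample PDFs, the documents probably come from the same generator and a fixed-position approach like this has a chance. If it does not, you are back in the "random documents" case.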