Which is the best model (fine-tuned or base) to extract structured data from a bunch of text?

sandys1@alien.top · 2 years ago

Which is the best model (fine-tuned or base) to extract structured data from a bunch of text?

georgejrjrjr@alien.top · 2 years ago

I’ve wondered this, and hope you get better answers.

One thing you could do, if it fits your use case: align GDELT entries with the news stories in the realnews dataset on Hugging Face, then train a model to output the extracted info from the article.
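Roughly, the alignment step could look like the sketch below. It assumes each GDELT record carries the URL of the story it was extracted from and each news article carries its own URL; the field names `source_url`, `url`, and `text` are placeholders I made up, so map them to whatever your copies of the datasets actually use. Matching on a normalized URL is just one plausible join key, not something I've verified gives good coverage.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to host + path so trivial variants compare equal."""
    parts = urlsplit(url.strip())
    # Drop scheme, query, and fragment; lowercase the host and strip "www.".
    host = parts.netloc.lower().removeprefix("www.")
    return host + parts.path.rstrip("/")


def align(events, articles):
    """Pair each article with the event records that point at its URL.

    `events` and `articles` are lists of dicts; the keys used here are
    hypothetical and need to match your real data.
    """
    by_url = {}
    for ev in events:
        by_url.setdefault(normalize_url(ev["source_url"]), []).append(ev)

    pairs = []
    for art in articles:
        matched = by_url.get(normalize_url(art["url"]))
        if matched:
            # Article text is the model input, structured records the target.
            pairs.append({"input": art["text"], "target": matched})
    return pairs
```

Articles with no matching record are simply dropped, so it's worth checking how many pairs survive before training on them.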

Another is to have GPT-4 do some examples on lightly faked / anonymized data, and then distill that into a model that does well on information extraction evals (which are a thing, iirc).
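For the distillation idea, the data-prep side might look something like this. It's only a sketch: `teacher` stands in for whatever call you'd make to GPT-4 (I'm deliberately not showing a real API call), and the `prompt` / `completion` record layout is an assumption, so adjust it to whatever your fine-tuning tooling expects. The point is to keep only teacher outputs that parse as JSON and contain the fields you care about.

```python
import json


def build_distillation_set(texts, teacher, required_keys):
    """Collect (prompt, completion) pairs from a teacher model.

    `teacher` is any callable mapping a text to a JSON string; here it is a
    placeholder for the larger model. Outputs that are not valid JSON objects
    or that miss a required key are skipped.
    """
    records = []
    for text in texts:
        raw = teacher(text)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # teacher produced something that is not JSON
        if not isinstance(parsed, dict):
            continue  # we want a JSON object, not a list or scalar
        if not set(required_keys).issubset(parsed):
            continue  # a required field is missing
        records.append(
            {"prompt": text, "completion": json.dumps(parsed, sort_keys=True)}
        )
    return records


def to_jsonl(records):
    """Serialize records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(r) for r in records)
```

Filtering like this won't catch wrong values, only malformed ones, so you'd still want to spot-check a sample by hand.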

sandys1@alien.top · 2 years ago

What are the information extraction evals? Do you have a link?