Which is the best model (fine-tuned or base) to extract structured data from a bunch of text?

sandys1@alien.top · 2 years ago

Which is the best model (fine-tuned or base) to extract structured data from a bunch of text?

Iamisseibelial@alien.top · 2 years ago

If the data is sensitive, why not use Claude to get a baseline of what you want, i.e. examples? Since they are SOC2 / HIPAA compliant, you should be good to go there unless you’re dealing with national security stuff. Then get enough examples done to train a specialized model.

sandys1@alien.top · 2 years ago

This has nothing to do with national security. It has to do with audit and compliance. SOC2 and HIPAA are not the only compliance artifacts out there. There are multiple others (including cross-national ones like Singapore’s PDPA, etc).

This is why OpenAI was FORCED to offer custom models as a service.

Again, I don’t want this thread to devolve into a regulatory debate… but I have fought large, extended battles in court on these topics: these things are not possible.

Iamisseibelial@alien.top · 2 years ago

Ohh, that’s absolutely fair, especially when dealing with Singapore, SK, or Japan. APPI and PIPA are a pain in the ass to deal with. That said, making fake versions of the data for examples is then likely the best route to being able to train your own model.
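
For what it's worth, here is a rough sketch of what "fake versions of the data" could look like in practice. Everything in it is an assumption for illustration: the field names, the value pools, the sentence templates, and the `input`/`output` JSONL layout are all made up, and you would replace them with shapes that resemble your real documents. The idea is simply to generate text from records you already know, so every training example comes with a correct structured label and no real data ever leaves your environment.

```python
import json
import random

# Hypothetical value pools; swap in whatever resembles your real (sensitive) data.
NAMES = ["Alice Tan", "Kenji Sato", "Min-jun Park", "Priya Nair"]
CITIES = ["Singapore", "Tokyo", "Seoul", "Osaka"]

# Hypothetical sentence templates; more variety here gives a more robust model.
TEMPLATES = [
    "Customer {name} from {city} opened account {account_id} on {date}.",
    "On {date}, account {account_id} was registered to {name} ({city}).",
]


def make_example(rng):
    """Build one fake text plus the structured record a model should extract."""
    record = {
        "name": rng.choice(NAMES),
        "city": rng.choice(CITIES),
        "account_id": f"AC-{rng.randint(0, 999999):06d}",
        "date": f"2023-{rng.randint(1, 12):02d}-{rng.randint(1, 28):02d}",
    }
    text = rng.choice(TEMPLATES).format(**record)
    return {"input": text, "output": record}


def make_dataset(n, seed=0):
    """Generate n examples; a fixed seed keeps the dataset reproducible."""
    rng = random.Random(seed)
    return [make_example(rng) for _ in range(n)]


def to_jsonl(examples):
    """Serialize examples as JSON Lines, one training pair per line."""
    return "\n".join(json.dumps(ex, ensure_ascii=False) for ex in examples)


if __name__ == "__main__":
    print(to_jsonl(make_dataset(3)))
```

Templates this simple will not capture how messy real text is, so treat the output as a starting point and add noise, typos, and layout variations that match what you actually see.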