Are there any data cleaning focused LLMs? [also, rant]

AnomalyNexus@alien.top · 3 years ago

andrewlapp@alien.top · 3 years ago

Say I’ve got a paragraph about something and the text block contains some other unrelated comment

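Have you considered creating text embeddings for each sentence, calculating their pairwise distance matrix, and applying PageRank? Sentences that are far from everything else should end up with the lowest rank, which flags them as likely unrelated.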
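A rough sketch of what that could look like, in case it helps. Everything here is an assumption on my part rather than a tested recipe: `embed` is a toy bag-of-words stand-in for a real embedding model, `pagerank_scores` is a hypothetical helper name, the distance matrix is expressed as cosine similarity (PageRank wants edge weights where higher means more related), and the damping factor of 0.85 is just the conventional default.

```python
import re
import numpy as np


def embed(sentences):
    """Toy bag-of-words vectors. Assumption: swap in a real embedding model."""
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = np.zeros((len(sentences), len(vocab)))
    for row, toks in enumerate(tokenized):
        for tok in toks:
            vectors[row, index[tok]] += 1.0
    return vectors


def pagerank_scores(vectors, damping=0.85, iterations=100):
    """Run PageRank by power iteration over a cosine-similarity graph."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    sim = np.clip(sim, 0.0, None)   # real embeddings can give negative cosines
    np.fill_diagonal(sim, 0.0)      # no self-links

    n = len(sim)
    row_sums = sim.sum(axis=1, keepdims=True)
    # Row-normalise; a sentence similar to nothing links uniformly to all.
    transition = np.where(
        row_sums > 0, sim / np.clip(row_sums, 1e-12, None), 1.0 / n
    )

    rank = np.full(n, 1.0 / n)
    for _ in range(iterations):
        rank = (1 - damping) / n + damping * (transition.T @ rank)
    return rank


sentences = [
    "The cat sat on the mat.",
    "The cat chased the mouse on the mat.",
    "The mouse hid from the cat.",
    "Subscribe to our newsletter today!",
]
scores = pagerank_scores(embed(sentences))
# Lowest score is the candidate for removal.
outlier = sentences[int(np.argmin(scores))]
print(outlier)
```

In this toy case the newsletter line shares no words with the rest, so nothing links to it and it gets the lowest score. With real embeddings the separation will be fuzzier, so you would probably want a threshold on the score rather than always dropping the minimum.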

AnomalyNexus@alien.top · 3 years ago

That’s a sharp comment.

Potentially beyond my technical ability but I can vaguely see where you’re going with it.

My next step was embeddings anyway (hence the attempt to clean the data and get it ready for that).

I’ve not heard of PageRank being applied to this before, though. Thanks!