Point me towards some basic dataset preparation tips for LLM's?

ArtifartX@alien.top · 2 years ago

Point me towards some basic dataset preparation tips for LLM's?

Tiny_Arugula_5648@alien.top · 2 years ago

Go to huggingface and look at the multitude of datsets that have already been prepped and read whatever documentation and papers that have been published. Go through the data and get a sense of what the data looks like and how it’s structured.

ArtifartX@alien.top · 2 years ago

Yea, doing this is part of what spurred the question, because I began to notice some datasets that were very clean and ordered into data pairs, and others that seemed formatted differently, and others still that seemed like they were fed a massive chunk of unstructured text. It made me confused on if there were some sort of standards or not that I was not aware of.