It’s no secret that many language models and fine-tunes are trained on datasets, many of which are generated with GPT models. The problem arises when “GPT-isms” end up in the dataset. I am referring not only to typical expressions like “however, it’s important to…” and “I understand your desire to…”, but also to the structure of the model’s responses. ChatGPT (and GPT models in general) tends to have a very predictable structure when in its “soulless assistant” mode, which makes it very easy to say “this is very GPT-like”.

What do you think about this? Oh, and by the way, forgive my English.

  • BackwardsPuzzleBox@alien.topB
    1 year ago

    The very idea of using GPT models to create datasets is such a mind-numbing, dumb, incestuous decision to begin with. Essentially, it’s the 21st-century version of making a xerox of a xerox.

    In a lot of ways, it’s kind of heralding the future enshittification of AI, as dabblers think every problem can be automated away without human judgement or editorialisation.