It’s no secret that many language models and fine-tunes are trained on datasets, many of which were generated with GPT models. The problem is that a lot of “GPT-isms” end up in those datasets. I’m not only referring to the typical expressions like “however, it’s important to…” and “I understand your desire to…”, but also to the structure of the model’s responses. ChatGPT (and GPT models in general) tends to follow a very predictable structure when in its “soulless assistant” mode, which makes it very easy to say “this is very GPT-like”.

What do you think about this? Oh, and by the way, forgive my English.

  • Robot1me@alien.topB · 1 year ago

    What do you think about this?

    I think an interesting experiment is when you edit an AI output message to start with “As an AI language model” and then let it continue the rest. If it completely loses character and just sounds like ChatGPT, it’s then quite telling.
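    The probe above can be sketched in a few lines. This is a minimal illustration, not a real implementation: `generate` is a hypothetical callable wrapping whatever backend you use (llama.cpp, transformers, an API client), and the prompt format is a stand-in for your model's actual chat template.

```python
def forced_prefix_probe(generate, history, prefix="As an AI language model"):
    """Continue a chat turn that has been edited to begin with `prefix`.

    `generate` is a hypothetical callable: it takes a prompt string and
    returns the model's raw text continuation.
    """
    # Rebuild the transcript, then force the assistant's reply to open
    # with the tell-tale phrase and let the model write the rest.
    prompt = "\n".join(history) + "\nAssistant: " + prefix
    return prefix + generate(prompt)

# Stub backend for illustration; swap in a real model call and see
# whether the continuation drops character and turns into boilerplate.
demo = forced_prefix_probe(
    lambda prompt: ", I cannot continue this roleplay.",
    ["User: Stay in character as a grumpy pirate."],
)
print(demo)
```

    If the continuation reads like stock assistant boilerplate rather than the persona you set up, the fine-tune has likely absorbed the GPT-ism.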

  • arekku255@alien.topB · 1 year ago

    As an AI language model I do not have an opinion on GPT-isms polluting datasets. However it is important to remember to respect other people and work together to achieve the optimal outcome.

  • BackwardsPuzzleBox@alien.topB · 1 year ago

    The very idea of using GPT models to create datasets is such a mind-numbing, dumb, incestuous decision to begin with. Essentially the 21st-century version of making a xerox of a xerox.

    In a lot of ways, it’s kind of heralding the future enshittification of AI, as dabblers think every problem can be automated away without human judgement or editorialisation.

  • noeda@alien.topB · 1 year ago

    I think the GPT-isms may be why my AI storywriting attempts tend to be overly positive and clichéd. Not exactly a world-shattering problem, but it is annoying *shakes fist*.

    If I had to name a potentially serious problem, it’s that the biases OpenAI originally inserted into ChatGPT and their GPT models are now spreading into local models as well.

    It’s annoying because it feels like all models respond to questions in a similar way. Some are just a bit smarter than others or tuned to respond a bit differently.

    If GPT-like data spreads around the Internet as well, it might become difficult to keep it out of training data unless you only include old data in your training.

  • stereoplegic@alien.topB · 1 year ago

    I’m more concerned with the community’s outsized reliance on/promotion of OAI-generated datasets and models trained on them. But then, commercial viability isn’t generally a concern when you want a spicy waifu.