Like a snake eating its own tail: What happens when AI consumes its own data?

By Hannah Chinn, Regina G. Barber, Rebecca Ramirez
Published February 18, 2025 at 9:01 AM EST
In large language model collapse, there are generally three sources of errors: the model itself, the way the model is trained, and the data (or lack thereof) that the model is trained on.
Andriy Onufriyenko / Getty Images
Asked ChatGPT anything lately? Talked with a customer service chatbot? Read the results of Google's "AI Overviews" summary feature?

If you've used the Internet lately, chances are, you've been consuming content created by a large language model.

Large language models, like DeepSeek-R1 or OpenAI's ChatGPT, are kind of like the predictive text feature in your phone on steroids. In order for them to "learn" how to write, these models are trained on millions of examples of human-written text.

In the past, this training usually involved having the models read the whole Internet. But nowadays — thanks in part to these large language models themselves — a lot of content on the Internet is written by generative AI.

That means that AI models trained now may consume their own synthetic content — and suffer the consequences.

This episode was produced by Hannah Chinn. It was edited by our showrunner, Rebecca Ramirez. The audio engineer was Jimmy Keeley.

Hannah Chinn
Hannah Chinn (they/them) is a producer on NPR's science podcast Short Wave. Prior to joining Short Wave, they produced Good Luck Media's inaugural "climate thriller" podcast. Before that, they worked on Spotify & Gimlet Media shows such as Conviction, How to Save a Planet and Reply All. Previous pit stops also include WHYY, as well as Willamette Week and The Philadelphia Inquirer. In between, they've worked a number of non-journalism gigs at various vintage stores, coffee shops and haunted houses.
Regina G. Barber
Regina G. Barber is Short Wave's Scientist in Residence. She contributes original reporting on STEM and guest hosts the show.
Rebecca Ramirez
Rebecca Ramirez (she/her) is the founding producer of NPR's daily science podcast, Short Wave. It's a meditation in how to be a Swiss Army Knife, in that it involves a little of everything — background research, finding and booking sources, interviewing guests, writing, cutting the tape, editing, scoring ... you get the idea.
