Like a snake eating its own tail: What happens when AI consumes its own data?

In large language model collapse, there are generally three sources of errors: the model itself, the way the model is trained and the data — or lack thereof — that the model is trained on. (Image: Andriy Onufriyenko / Getty Images)

Asked ChatGPT anything lately? Talked with a customer service chatbot? Read the results of Google's "AI Overviews" summary feature?

If you've used the Internet lately, chances are, you've been consuming content created by a large language model.

Large language models, like DeepSeek-R1 or OpenAI's ChatGPT, are kind of like the predictive text feature in your phone on steroids. In order for them to "learn" how to write, these models are trained on millions of examples of human-written text.
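
To make that "predictive text" idea concrete, here's a minimal sketch (our illustration, not anything from the episode): a bigram model that tallies which word follows which in a tiny human-written corpus, then generates new text by repeatedly predicting a next word. Real large language models do this with neural networks trained on billions of examples, but the core predict-the-next-token loop is the same.

```python
import random
from collections import defaultdict

# Tiny "human-written" corpus; a real model trains on vastly more text.
corpus = ("the model reads human text . the model learns which word "
          "comes next . the model writes new text").split()

# "Training": tally which words follow each word (a bigram model).
next_words = defaultdict(list)
for current, following in zip(corpus, corpus[1:]):
    next_words[current].append(following)

# "Generation": start with a word and keep predicting the next one.
random.seed(0)
word = "the"
output = [word]
for _ in range(8):
    word = random.choice(next_words[word])
    output.append(word)
print(" ".join(output))
```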

In the past, this training usually involved having the models read the whole Internet. But nowadays — thanks in part to these large language models themselves — a lot of content on the Internet is written by generative AI.

That means that AI models trained now may consume their own synthetic content — and suffer the consequences.
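
What those consequences can look like is easy to sketch numerically. The toy simulation below (our own illustration with assumed parameters, not code from the episode or any published study) fits a simple Gaussian "model" to data, then trains each new generation only on synthetic samples drawn from the previous fit. Averaged over many runs, the fitted spread shrinks generation after generation, so rare "tail" events gradually vanish, loosely analogous to a language model retrained on its own output.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n_runs, n_samples = 200, 50  # 200 independent runs, 50 training examples each

# Generation 0: every run trains on real "human" data (a standard normal).
data = rng.normal(loc=0.0, scale=1.0, size=(n_runs, n_samples))

for generation in range(41):
    # "Train" the model: estimate each run's mean and spread from its data.
    mu = data.mean(axis=1, keepdims=True)
    sigma = data.std(axis=1, keepdims=True)
    if generation % 10 == 0:
        print(f"generation {generation:2d}: "
              f"average fitted spread = {sigma.mean():.3f}")
    # The next generation's training data is purely synthetic:
    # samples drawn from the model that was just fit.
    data = rng.normal(loc=mu, scale=sigma, size=(n_runs, n_samples))
```

The shrinkage comes from two compounding effects: each finite sample slightly underestimates the true spread, and each generation inherits the previous generation's estimation errors rather than learning from fresh real data.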

View the AI-generated images mentioned in this episode.

Have another topic in artificial intelligence you want us to cover? Let us know by emailing shortwave@npr.org!

Listen to Short Wave on Spotify and Apple Podcasts.

Listen to every episode of Short Wave sponsor-free and support our work at NPR by signing up for Short Wave+ at plus.npr.org/shortwave.

This episode was produced by Hannah Chinn. It was edited by our showrunner, Rebecca Ramirez. The audio engineer was Jimmy Keeley.

Copyright 2025 NPR

Hannah Chinn
Hannah Chinn (they/them) is a producer on NPR's science podcast Short Wave. Prior to joining Short Wave, they produced Good Luck Media's inaugural "climate thriller" podcast. Before that, they worked on Spotify & Gimlet Media shows such as Conviction, How to Save a Planet and Reply All. Previous pit stops also include WHYY, as well as Willamette Week and The Philadelphia Inquirer. In between, they've worked a number of non-journalism gigs at various vintage stores, coffee shops and haunted houses.
Regina G. Barber
Regina G. Barber is Short Wave's Scientist in Residence. She contributes original reporting on STEM and guest hosts the show.
Rebecca Ramirez
Rebecca Ramirez (she/her) is the founding producer of NPR's daily science podcast, Short Wave. It's a meditation on how to be a Swiss Army knife, in that it involves a little of everything — background research, finding and booking sources, interviewing guests, writing, cutting the tape, editing, scoring ... you get the idea.