Date: 2025-02-18

Asked ChatGPT anything lately? Talked with a customer service chatbot? Read the results of Google's "AI Overviews" summary feature?

If you've used the Internet lately, chances are, you've been consuming content created by a large language model.

Large language models, like DeepSeek-R1 or OpenAI's ChatGPT, are kind of like the predictive text feature in your phone on steroids. To "learn" how to write, these models are trained on millions of examples of human-written text.

In the past, this training usually involved having the models read the whole Internet. But nowadays — thanks in part to these large language models themselves — a lot of content on the Internet is written by generative AI.

That means that AI models trained now may consume their own synthetic content — and suffer the consequences.

