Vietnam.vn - Platform for promoting Vietnam

Something that is about to become Internet nostalgia

The explosion of AI content has created a trust problem, as pure human data is increasingly scarce.

ZNews, 09/06/2025

Purely human content is in short supply in the age of AI. Photo: Advertising Week.

The emergence of ChatGPT in 2022 led to an explosion of artificial intelligence content across the Internet. According to a Gartner prediction, by 2026, 90% of the content on the Internet will be generated by AI, including text, images and videos.

AI is trained to understand human thought. But without purely human-generated data, the technology will feed on its own earlier output, like a photocopier making copies of copies.

Many researchers compare original human-generated content to the modern-day equivalent of “clean” steel, a rare and hard-to-find material. They fear that if no one saves copies of data from before 2022, the internet will lose its purity altogether.

Historical disaster reenacted

In the post-nuclear era, scientists have discovered that all steel produced after 1945 is contaminated. Atomic bombs contaminated the atmosphere with radiation, which spread to metals produced in that environment.

This leaves much of the steel unusable for high-precision measuring equipment such as Geiger counters and other sensitive sensors. The solution is to salvage old steel from sunken warships deep at the bottom of the ocean, where it is not affected by radioactive fallout.

Most AI models are trained on vast troves of human data collected from the internet. But if today’s software learns from text that AI generated in the past, the models risk collapsing, losing their originality and depth.


The sunken World War I battlecruiser Hindenburg was salvaged. Photo: Reuters Connect.

That makes human-generated content, especially before 2022, more valuable, says Will Allen, vice president of Cloudflare, which operates one of the world’s largest internet networks. He says it helps anchor AI models, and society as a whole, to a common reality. Without that foundation, things get complicated.

That grounding is especially important in highly technical fields like medicine, law, or tax. For example, a doctor should rely on content written by human experts and on real research, not on sources generated by AI.

This danger is also becoming more real. A year after ChatGPT launched, venture capitalist Paul Graham said he had to restrict a simple search to older content to avoid “AI-generated SEO bait.” Malte Ubl, CTO of AI startup Vercel, responded that Graham was essentially filtering the internet for content “before it was contaminated by AI.”

Matt Rickard, a former Google engineer, agrees. He wrote in a 2023 blog post that AI feeds off the internet, but more and more of the content on it is AI-generated. “The output of chatbots is hard to detect. Finding training data that hasn’t been tampered with by AI will become increasingly difficult,” Rickard explained.

The "seabed steel search"

The answer to that problem lies in preserving human-generated data from before the AI boom. One of the pioneers is John Graham-Cumming, a board member and CTO of Cloudflare.

His project, the website LowBackgroundSteel.ai, catalogs datasets, links, and media that existed before 2022. One example given is GitHub's Arctic Code Vault, an open-source software repository buried in an abandoned coal mine in Norway, kept since February 2020.


Graham-Cumming's Human Data Preservation Project. Photo: Lowbackgroundsteel.ai.

Another data source he lists is “wordfreq,” a project that tracks how often words are used online. Linguist Robyn Speer maintained it until 2021.

“Generative AI has polluted the data,” Speer said. She cited ChatGPT’s obsession with the word “delve,” which has been popping up more and more recently, as an example. This skews the data on the internet, making it less reliable as a reflection of how humans write and think.

AI models trained in part on synthetic content can speed up workflows and eliminate boredom in creative work. But beyond performance, users will likely have to rely on human-generated content to make accurate judgments, much like using “low-background steel” for accurate measurements.

Scientists have developed different methods for making steel using pure oxygen, a reminder that preserving the past may be the only way to build a reliable future, according to Business Insider.

Source: https://znews.vn/thu-sap-thanh-hoai-niem-tren-internet-post1559151.html

