AI Is Running Out of Public Data: IC3 Survey Shows Crypto’s Role in Private-Web AI


Crypto tools may address AI’s shortage of high-quality public training data through secure access.

Researchers from over a dozen institutions collaborated on the IC3 survey to explore crypto-AI integration.

Combining crypto and AI can create secure and autonomous systems with far-reaching financial consequences.

A new IC3 survey says crypto’s strongest role in artificial intelligence may not be AI agents, decentralized GPUs, or tokenized bots, but secure access to private web data. That access is a growing priority as researchers warn that large AI models could exhaust high-quality public text data within this decade.

The survey, “Crypto x AI, AI x Crypto: A Survey”, published on June 8 by the Initiative for CryptoCurrencies and Contracts, argues that the crypto-AI sector is still in the “very early stages” of meaningful integration. But one of its most important findings is that crypto tools could help solve a growing AI problem: the shortage of high-quality public training data.

https://t.co/JZ8k7zkU1U
— IC3 (@initc3org) June 8, 2026

The report, edited by Giulia Fanti and Ari Juels, brings together more than two dozen researchers from IC3, Cornell Tech, Carnegie Mellon University, Princeton University, Yale University, ETH Zurich, Technion, Flashbots, Ritual Labs, Ava Labs, Offchain Labs, and other institutions.

Its central message is not that crypto and AI are a perfect match. In fact, the survey is unusually cautious about the hype surrounding decentralized AI infrastructure, AI agents, and crypto-based payments. Ari Juels, co-editor of the survey, described the naive combination of the two technologies as “like soldering Jell-O” in IC3’s official announcement.

The survey’s authors argue that, when combined systematically, crypto tools can channel AI’s fluid power into secure, reliable, and highly autonomous systems. At the same time, they say, this combination could have far-reaching consequences for users and the financial system.

The report says crypto can play a serious role where AI systems need stronger guarantees: authenticated data, privacy-preserving computation, verifiable execution, and secure payments.

Why AI’s Public Data Problem Matters Now

Large AI models have been built on huge amounts of public web data: websites, books, code repositories, forums, academic papers, news, and social platforms. But researchers have warned that this supply is not unlimited.

A 2024 paper published in the Proceedings of Machine Learning Research, “Will we run out of data? Limits of LLM scaling based on human-generated data,” found that if current trends continue, language models could be trained on datasets roughly equal in size to the available stock of public human text data between 2026 and 2032.

The IC3 survey points to a similar pressure. It says machine learning practitioners are approaching the limits of publicly accessible World Wide Web text data, with projections estimating that such data could be exhausted between 2025 and 2030.

This matters because AI companies are facing two pressures at once. They need more high-quality data to keep improving models, but publishers, platforms, and users are increasingly restricting how that data can be collected and used.

A 2024 research paper, “Consent in Crisis: The Rapid Decline of the AI Data Commons,” found that many web sources are adding AI-specific restrictions and that 45% of C4 is restricted under Terms of Service crawling limits.
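One common form of AI-specific restriction is a robots.txt rule aimed at a named AI crawler. The sketch below is only a rough illustration of that mechanism, not the paper’s method, and it covers robots.txt rather than Terms of Service limits. The robots.txt content, the example.com URL, and the ExampleSearchBot name are hypothetical; GPTBot is used as an example of an AI crawler user agent. It relies on Python’s standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one AI crawler, allows every other agent.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical page on a placeholder domain.
PAGE = "https://example.com/article"
ai_allowed = can_crawl(ROBOTS_TXT, "GPTBot", PAGE)
other_allowed = can_crawl(ROBOTS_TXT, "ExampleSearchBot", PAGE)
print("AI crawler allowed:", ai_allowed)
print("Other crawler allowed:", other_allowed)
```

A rule like this is only a request to well-behaved crawlers. It does not technically prevent collection, which is part of why the consent question remains contested.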

A separate 2026 study, “The Impact of AI-Generated Text on the Internet,” found that by mid-2025, roughly 35% of newly published websites in its sample were classified as AI-generated or AI-assisted.

Together, these trends create a problem for AI developers. The public web is becoming harder to access, more legally contested, and increasingly filled with AI-generated content. The next valuable source of training data may not be the open internet. It may be the private web.

What Is the Private Web?

The private web refers to digital information that is not openly available for scraping. It includes emails, health records, bank statements, tax files, enterprise documents, customer databases, and other information locked behind logins or access permissions.

The IC3 survey says the private web may be two orders of magnitude larger than the surface public web. In simple terms, there may be far more useful data sitting inside private accounts and enterprise systems than on open websites.

But this data cannot simply be scraped and fed into AI models. It creates two major problems.

The first is authenticity. If a user uploads a financial document, medical record, or business file into an AI system, how can the system know the data is real and has not been modified?

The second is privacy. If private data is shared with an AI system, how can the user know it will not be leaked, stored indefinitely, misused, or exposed to the model provider?

This is the point where the IC3 report sees a real role for crypto infrastructure.

How Crypto Tools Could Help AI Access Private Data

The report highlights three major technologies: oracles, trusted execution environments, and zero-knowledge proofs.

Oracles can help verify that data came from an authentic source, such as a hospital portal, bank account, or enterprise database. In an AI setting, this means a model could receive data with stronger proof of origin.

A secure inference pipeline achieving integrity and privacy | Source: aic3

Trusted execution environments, often called TEEs, are secure computing environments designed to process data while limiting outside exposure. In theory, a user could allow data to be used for a specific AI task without handing the raw information directly to the model provider.

Zero-knowledge proofs, or ZK proofs, allow one party to prove that something is true without revealing the underlying data. In data markets, this could help sellers prove the value or quality of a dataset without giving away the dataset before payment.
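To make the idea of proving something without revealing it more concrete, here is a minimal sketch of a classic zero-knowledge building block, a non-interactive Schnorr proof of knowledge. It is not taken from the IC3 survey. The prover shows it knows a secret number behind a public value without disclosing the number. The group parameters are deliberately tiny and insecure, and all function names are illustrative assumptions. Proving properties of a whole dataset would need far more expressive proof systems than this.

```python
import hashlib
import secrets

# Toy parameters, assumed for illustration only (far too small for real use).
P = 2039  # prime modulus, P = 2 * Q + 1
Q = 1019  # prime order of the subgroup generated by G
G = 4     # generator of the order-Q subgroup


def challenge(*values: int) -> int:
    """Derive the challenge by hashing the transcript (Fiat-Shamir heuristic)."""
    data = "|".join(str(v) for v in values).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(secret: int) -> tuple[int, int, int]:
    """Prove knowledge of `secret` behind public = G^secret mod P."""
    public = pow(G, secret, P)
    nonce = secrets.randbelow(Q - 1) + 1       # fresh random value per proof
    commitment = pow(G, nonce, P)
    c = challenge(G, public, commitment)
    response = (nonce + c * secret) % Q        # the nonce masks the secret
    return public, commitment, response


def verify(public: int, commitment: int, response: int) -> bool:
    """Check G^response == commitment * public^c mod P, without the secret."""
    c = challenge(G, public, commitment)
    return pow(G, response, P) == (commitment * pow(public, c, P)) % P


public, commitment, response = prove(secret=123)
print("proof accepted:", verify(public, commitment, response))
```

The verifier only ever sees the public value, the commitment, and the response. The check passes because the response combines the nonce and the secret in exactly the way the equation expects, yet the random nonce keeps the secret itself hidden.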

That distinction matters because AI companies need more private, high-quality data, but users and enterprises need stronger guarantees before they allow sensitive information into AI systems. The question is no longer only “Can AI get more data?” It is also “Can AI get better data without breaking privacy, consent, or trust?”

The IC3 survey describes one possible architecture called “Protected Pipelines,” or Props. In this model, oracles acquire authenticated private-web data, trusted computing environments process it, and attestations prove that the pipeline followed specific rules.
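The three steps in that description (authenticated acquisition, protected processing, attestation) can be sketched in a few lines. This is a simplified illustration under loud assumptions, not the Props design itself. Shared-key HMAC tags stand in for both the oracle’s proof of origin and the hardware attestation a real trusted execution environment would produce, and the keys, the policy name, the bank.example source, and every function name are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical keys; real systems would use asymmetric signatures and
# hardware-backed attestation rather than shared secrets.
ORACLE_KEY = b"hypothetical-oracle-key"
ENCLAVE_KEY = b"hypothetical-enclave-key"
POLICY_ID = "income-check-v1"  # assumed name for the rule the pipeline enforces


def mac(key: bytes, payload: dict) -> str:
    """Tag a payload so that later tampering can be detected."""
    body = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def oracle_fetch(source: str, record: dict) -> dict:
    """Step 1: the oracle vouches for where the private record came from."""
    payload = {"source": source, "record": record}
    return {"payload": payload, "tag": mac(ORACLE_KEY, payload)}


def enclave_run(signed: dict, threshold: int) -> dict:
    """Step 2: the protected environment checks origin, then computes."""
    expected = mac(ORACLE_KEY, signed["payload"])
    if not hmac.compare_digest(signed["tag"], expected):
        raise ValueError("data did not come from the oracle")
    income = signed["payload"]["record"]["monthly_income"]
    # Only the yes/no answer leaves the enclave, never the raw record.
    claim = {
        "policy": POLICY_ID,
        "threshold": threshold,
        "meets_threshold": income >= threshold,
    }
    return {"claim": claim, "attestation": mac(ENCLAVE_KEY, claim)}


def verify_attestation(output: dict) -> bool:
    """Step 3: a relying party checks the claim was produced under the policy."""
    expected = mac(ENCLAVE_KEY, output["claim"])
    return hmac.compare_digest(output["attestation"], expected)


signed = oracle_fetch("bank.example", {"monthly_income": 5200})
output = enclave_run(signed, threshold=4000)
print(output["claim"], "attested:", verify_attestation(output))
```

In this toy version a lender would learn only that the borrower clears the threshold. A notable weakness of the stand-in is that anyone holding the shared enclave key could forge an attestation, which is one reason real designs rely on hardware-rooted keys and public-key signatures.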

Protected Pipelines (Props) Schematic | Source: aic3

For example, a medical AI system could train on patient-authorized health records while proving the records came from real providers and that the data remained private. A lending model could use a borrower’s bank statements or tax documents while giving the lender confidence that the documents were authentic, without exposing more information than necessary.

The report does not claim that every part of this process must run on a blockchain. Instead, it frames crypto as a trust layer for AI systems that need authentication, privacy, auditability, and payments.

Why This Is Bigger Than AI Agents

Much of the current crypto-AI conversation focuses on AI agents: bots that hold wallets, trade assets, pay for services, or interact with DeFi protocols.

The IC3 report does not dismiss agentic payments, but it says the crypto industry needs stronger evidence that crypto rails are better than centralized alternatives. It also warns that combining AI agents with decentralized systems could create new risks, including rogue smart contracts and systems that are difficult to shut down.

That makes the private-web data angle more important. It addresses a problem already facing AI companies: the need for reliable, consented, high-quality data beyond the public web.

In other words, the stronger crypto-AI story may not be “AI agents will run DeFi.” It may be “crypto can help AI prove where its data came from and how it was used.”

Decentralized Data Markets Could Avoid Another Big Tech Monopoly

The IC3 survey also raises the possibility of decentralized data markets, where users or organizations could provide data for AI training or inference under clear rules.

In a centralized data marketplace, one company may control access, pricing, verification, and enforcement. A decentralized model could, in theory, allow users, enterprises, and developers to participate in data markets with more transparency and stronger technical guarantees.

Crypto could support this through micropayments, programmable access rules, privacy-preserving computation, and proof systems.

But this remains an open question. The report is careful not to present decentralized data markets as a guaranteed solution. It says many design problems remain unresolved, including scalability, usability, data quality, legal compliance, and whether users will actually participate.

That caution is important. The opportunity is large, but the infrastructure is not yet proven at AI scale.

The Hype Check: What Crypto-AI Still Has to Prove

The IC3 report is also a reality check for the crypto industry. It says decentralized AI projects need more rigorous and direct cost comparisons against centralized AI platforms. The industry has shown that some decentralized AI systems are technically possible, but it has not yet shown that they are cheaper, more secure, more private, or more useful in enough real-world cases.

That difference matters for investors and builders. A decentralized GPU network may be possible. An AI agent with a wallet may be possible. A tokenized data marketplace may be possible. But the report pushes the industry to answer harder questions.

Does it reduce costs? Does it improve privacy? Does it protect users better? Does it produce better AI outcomes? Does it create new risks?

Until those questions are answered with benchmarks and real deployments, crypto-AI will remain more promise than proof.

The Real Takeaway

The most important message from the IC3 survey is not that crypto will save AI or that AI will transform every blockchain. It is that both technologies have weaknesses the other may help address.

AI is powerful but opaque, probabilistic, and hungry for data. Crypto is rigid, verifiable, and built around security guarantees. The two technologies do not automatically fit together, but when used carefully, crypto could help AI systems prove where data came from, protect private information, verify parts of training and inference, and pay data providers.

For now, the most credible crypto-AI opportunity may not be the loudest one. It may not be agents trading tokens or blockchains replacing cloud providers. It may be the infrastructure that lets AI safely use data it cannot scrape.

As AI companies search for the next wave of training data, the private web may become one of the most valuable frontiers in technology. The IC3 survey suggests crypto may have a role in opening that frontier, but only if the industry can move beyond slogans and prove that its tools work better than the systems they aim to replace.


