Elon Musk agrees that we’ve exhausted AI training data – TechCrunch

Elon Musk concurs with other AI experts that there’s little real-world data left to train AI models on.
“We’ve now exhausted basically the cumulative sum of human knowledge … in AI training,” Musk said during a conversation with Stagwell chairman Mark Penn that was live-streamed on X late Wednesday. “That happened basically last year.”
Musk, who owns AI company xAI, echoed themes that former OpenAI chief scientist Ilya Sutskever touched on during an address in December at NeurIPS, the machine learning conference. Sutskever, who said the AI industry had reached what he called “peak data,” predicted that a lack of training data will force a shift away from the way models are developed today.
Indeed, Musk suggested that synthetic data — data generated by AI models themselves — is the path forward. “The only way to supplement [real-world data] is with synthetic data, where the AI creates [training data],” he said. “With synthetic data … [AI] will sort of grade itself and go through this process of self-learning.”
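The loop Musk describes — a model generating data, grading its own outputs, and learning from the best of them — can be sketched in a few lines. Everything below (the toy generator, the grading function, the round and keep counts) is a hypothetical stand-in for illustration, not xAI's or anyone's actual pipeline:

```python
import random

def generate_candidates(n, rng):
    # Stand-in for a model sampling n candidate outputs; here a
    # "candidate" is just a number drawn from a normal distribution.
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def grade(candidate):
    # Stand-in self-grading step: higher score means closer to the
    # (hypothetical) target value of 0.
    return -abs(candidate)

def build_synthetic_dataset(rounds=3, n=100, keep=10, seed=0):
    """Each round: generate candidates, grade them, and keep only the
    top-scored ones as new synthetic training data."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(rounds):
        candidates = generate_candidates(n, rng)
        candidates.sort(key=grade, reverse=True)
        dataset.extend(candidates[:keep])
    return dataset

data = build_synthetic_dataset()
print(len(data))  # 3 rounds x 10 kept = 30 synthetic examples
```

Real pipelines replace the toy generator with a language model and the grading function with a learned reward model or verifier, but the generate-grade-filter shape is the same.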
Other companies, including tech giants like Microsoft, Meta, OpenAI, and Anthropic, are already using synthetic data to train flagship AI models. Gartner estimates that 60% of the data used for AI and analytics projects in 2024 was synthetically generated.
Microsoft’s Phi-4, which was open-sourced early Wednesday, was trained on synthetic data alongside real-world data. So were Google’s Gemma models. Anthropic used some synthetic data to develop one of its most performant systems, Claude 3.5 Sonnet. And Meta fine-tuned its most recent Llama series of models using AI-generated data.
Training on synthetic data has other advantages, like cost savings. AI startup Writer claims its Palmyra X 004 model, which was developed using almost entirely synthetic sources, cost just $700,000 to develop — compared with estimates of $4.6 million for a comparably sized OpenAI model.
But there are disadvantages as well. Some research suggests that synthetic data can lead to model collapse, where a model becomes less “creative” — and more biased — in its outputs, eventually seriously compromising its functionality. And because synthetic data is created by models, any biases and limitations in those models’ training data will carry over into the data they generate.
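The collapse dynamic is easy to illustrate with a toy simulation. In this hypothetical sketch (standard library only, not drawn from the research cited above), each “generation” of the model is a Gaussian refit on only the most typical outputs of the previous generation — the middle half of its samples — and the distribution’s spread collapses toward zero within a few iterations:

```python
import random
import statistics

def collapse_demo(generations=10, n=200, seed=1):
    """Toy model-collapse simulation: each generation is refit on the
    'most typical' (middle-half) samples drawn from the previous
    generation, mimicking a model trained on its own most likely
    synthetic outputs. Returns the spread (stdev) per generation."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the "real" data distribution
    sigmas = [sigma]
    for _ in range(generations):
        samples = sorted(rng.gauss(mu, sigma) for _ in range(n))
        kept = samples[n // 4 : 3 * n // 4]  # discard the tails
        mu = statistics.fmean(kept)          # refit on synthetic data only
        sigma = statistics.stdev(kept)
        sigmas.append(sigma)
    return sigmas

sigmas = collapse_demo()
print(sigmas[0], sigmas[-1] < 0.01)  # spread collapses toward zero
```

The shrinking spread is the toy analogue of a model becoming less “creative”: each generation reproduces only the most probable outputs of the last, and the diversity of the original data is lost.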

