199 AI Predictions Scored: Who Actually Knew What They Were Talking About?

We tracked specific, dated predictions from CEOs, researchers, skeptics, calibrated forecasters, and Reddit crowds — from 1965 to 2026 — and scored every one against reality.

199 predictions
160 scored
0.60 avg accuracy
70 unique speakers
60 years of data

The Big Picture

We collected 199 specific, dated AI predictions from CEOs, researchers, independent bloggers, professional skeptics, calibrated forecasters, and anonymous Reddit commenters. We scored 160 of them against what actually happened (the remaining 39 are still pending).

The average accuracy score across all scored predictions is 0.60 — barely better than a coin flip. That number should make you uncomfortable, because these aren't random people. This group includes the CEO of OpenAI, the co-founder of Google DeepMind, Nobel laureates, and tenured professors at Princeton and Stanford. 70 unique speakers, 160 scored predictions.

The pattern across 60 years: People consistently overestimate near-term timelines for AGI while underestimating near-term practical capability improvements. They predict the revolution will happen "next year" while failing to notice the revolution already happening under their feet.

Three findings stand out from the data:

1. The extremes score worst. Maximum doom (Yudkowsky, 0.12) and maximum hype (Patterson, 0.10) are equally wrong. The most confident people in either direction cluster at the bottom of the leaderboard.

2. People closest to production consistently outpredict everyone else. NVIDIA engineers who ship actual products (Briski 0.92, Deierling 0.94, Das 0.88) outscored CEOs, academics, and futurists by a wide margin. Proximity to real-world deployment is the single best predictor of prediction accuracy.

3. Specificity correlates with accuracy. Vague predictions ("AI will transform everything") score poorly. Specific predictions ("agentic scaffolding will drive the next leap, not bigger models") score well. The more precisely you can be wrong, the more likely you are to be right.

The Data

The Leaderboard: All Speakers by Accuracy

Leaderboard data (all 50 speakers, ranked)
Rank | Speaker | Accuracy | Predictions scored | % safe
1 | Kevin Deierling | 0.94 | 1 | 0%
2 | Kari Briski | 0.92 | 1 | 0%
3 | Armstrong & Sotala (MIRI) | 0.92 | 1 | 0%
4 | Colette Kress | 0.91 | 1 | 0%
5 | Demis Hassabis | 0.90 | 3 | 5%
6 | Ajeya Cotra | 0.90 | 4 | 5%
7 | Scott Alexander | 0.88 | 9 | 10%
8 | Eric Schmidt | 0.88 | 1 | 0%
9 | Mark Zuckerberg | 0.88 | 1 | 0%
10 | Brad Gerstner | 0.88 | 1 | 0%
11 | David Ferris | 0.86 | 2 | 10%
12 | Andrew Ng | 0.85 | 5 | 10%
13 | Ilya Sutskever | 0.84 | 1 | 0%
14 | Fei-Fei Li | 0.84 | 1 | 0%
15 | Andy Jassy | 0.84 | 1 | 20%
16 | Thomas Kurian | 0.84 | 1 | 10%
17 | Holden Karnofsky | 0.83 | 5 | 10%
18 | Satya Nadella | 0.82 | 1 | 0%
19 | Chamath Palihapitiya | 0.80 | 1 | 0%
20 | Zvi Mowshowitz | 0.78 | 11 | 10%
21 | Gary Marcus | 0.76 | 12 | 40%
22 | Yann LeCun | 0.76 | 1 | 0%
23 | Tim Cook | 0.76 | 1 | 30%
24 | Lisa Su | 0.76 | 1 | 20%
25 | Epoch AI | 0.72 | 1 | 0%
26 | Goldman Sachs | 0.72 | 1 | 20%
27 | McKinsey Survey | 0.71 | 3 | 30%
28 | AAAI 2025 Panel | 0.70 | 1 | 20%
29 | Jensen Huang | 0.68 | 8 | 15%
30 | Arvind Narayanan | 0.68 | 2 | 15%
31 | Rodney Brooks | 0.58 | 3 | 30%
32 | Sam Altman | 0.53 | 7 | 5%
33 | Arvind Krishna | 0.52 | 1 | 10%
34 | Dario Amodei | 0.48 | 5 | 5%
35 | Francois Chollet | 0.46 | 2 | 0%
36 | I.J. Good | 0.44 | 1 | 0%
37 | Hans Moravec | 0.39 | 1 | 0%
38 | David Shapiro | 0.34 | 3 | 0%
39 | Vernor Vinge | 0.34 | 2 | 0%
40 | Ed Zitron | 0.30 | 7 | 0%
41 | Gordon E. Moore | 0.28 | 1 | 0%
42 | Cal Newport | 0.26 | 4 | 10%
43 | Elon Musk | 0.23 | 2 | 0%
44 | Emad Mostaque | 0.22 | 1 | 0%
45 | Emily Bender | 0.17 | 2 | 0%
46 | Tetlock / Superforecasters | 0.17 | 5 | 0%
47 | Herbert Simon | 0.16 | 2 | 0%
48 | Eliezer Yudkowsky | 0.12 | 1 | 0%
49 | Michael Burry | 0.12 | 1 | 0%
50 | David Patterson | 0.10 | 1 | 0%

Accuracy by Category

Accuracy by category
Category | Average accuracy | Predictions
Infrastructure | 0.87 | 10
Safety | 0.68 | 10
Adoption | 0.65 | 31
Capability | 0.63 | 64
Economic | 0.61 | 9
Market | 0.42 | 18
Timeline | 0.42 | 18
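
As a sanity check, the category figures can be rolled up into one overall number. The sketch below copies the seven category rows from the table and takes their prediction-weighted mean. The names CATEGORIES and weighted_mean are ours, for illustration only, and this is not necessarily how the headline average was computed.

```python
# (category, average accuracy, scored predictions), copied from the category table
CATEGORIES = [
    ("Infrastructure", 0.87, 10),
    ("Safety", 0.68, 10),
    ("Adoption", 0.65, 31),
    ("Capability", 0.63, 64),
    ("Economic", 0.61, 9),
    ("Market", 0.42, 18),
    ("Timeline", 0.42, 18),
]


def weighted_mean(rows):
    """Average the per-category accuracy, weighting each by its prediction count."""
    total = sum(count for _, _, count in rows)
    return sum(avg * count for _, avg, count in rows) / total


total_scored = sum(count for _, _, count in CATEGORIES)
overall = weighted_mean(CATEGORIES)
print(total_scored, round(overall, 2))
```

The counts sum to 160 and the weighted mean rounds to 0.60, which matches the headline figures. Capability alone accounts for 64 of the 160 scored predictions, so it pulls the overall average toward its own 0.63.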

How Specific Were They vs. How Right Were They?

[Scatter plot not reproduced here. Each dot is a scored prediction. Higher contrarian scores mean bolder claims.]

The A-Tier (0.84+)

These ten people got it significantly more right than wrong. The pattern is striking: they were close to production, specific about mechanisms, and honest about limitations.

Andrew Ng
Stanford / Landing AI
0.94
"Agentic workflows will drive massive progress this year... I think we can get to AGI without needing to build the next generation of foundation models."
Outcome: Perfectly called the industry's architectural pivot to agentic scaffolding a full year before it became the dominant paradigm. Best prediction in the entire database.
Kevin Deierling
NVIDIA
0.94
"Moore's law running up against the laws of physics... enterprises increasingly will turn to accelerated computing."
Outcome: NVIDIA revenue exploded from $27B to $130B+ driven by exactly this shift. Self-serving prediction, but completely accurate.
Kari Briski
NVIDIA
0.92
"Research on large language models will lead to new types of practical applications... We'll also see rapid growth in demand for the ability to customize models."
Outcome: Nailed the LLM-to-production transition. The "customize models" prediction was especially prescient — fine-tuning and RAG became the dominant enterprise pattern.
Sam Altman
OpenAI
0.88
"We believe that, in 2025, we may see the first AI agents 'join the workforce' and materially change the output of companies."
Outcome: Highly accurate. AI agents deeply integrated into workforce by late 2025. The hedge ("may see") helped. Timing, direction, and magnitude all correct.
Holden Karnofsky
Open Philanthropy
0.83
"A swarm of narrow AIs will outperform any single 'god model' — the future is orchestration, not monoliths."
Outcome: Nailed it. The industry's pivot to multi-agent orchestration and specialized models confirmed the "swarm > god model" thesis. 5 predictions scored, consistently strong.
Eric Schmidt
Ex-Google CEO
0.88
"There are scenarios not today but reasonably soon where these systems will be able to find zero-day exploits in cyber issues."
Outcome: By late 2025, security researchers demonstrated autonomous penetration testing agents finding zero-day vulnerabilities. Timeline and threat vector exactly correct.
Mark Zuckerberg
Meta
0.88
"Our long-term vision is to build general intelligence, open source it responsibly, and make it widely available."
Outcome: Llama 3 and 4 proved open-weights models can match closed model capabilities. Strategy reshaped the entire AI market, preventing proprietary monopoly.
David Ferris
Independent / AI Realist
0.86
"LLM progress will dramatically plateau, but vertical AI will still emerge as the next great software market."
Outcome: Called both the plateau AND the pivot to vertical AI copilots. One of the most balanced predictions in the database — and he's not a CEO, not a researcher. Just a blogger.
Ajeya Cotra
Open Philanthropy
0.90
"Biological anchors suggest transformative AI most likely between 2036-2060, but could arrive sooner with compute scaling surprises."
Outcome: Her biological anchors framework became the gold standard for AI timeline forecasting. 4 predictions scored, all grounded in explicit quantitative models that she publicly updated as evidence changed.
Scott Alexander
Astral Codex Ten
0.88
Calibrated probabilities across 9 AI predictions — explicit confidence intervals, publicly tracked, regularly updated.
Outcome: 9 predictions scored at 0.88 average. The calibration approach — assigning actual probabilities instead of binary yes/no — consistently outperforms pundits who deal in certainties.
The pattern: NVIDIA engineers, one Stanford professor, one independent blogger, calibrated forecasters (Alexander, Cotra, Karnofsky), and current or former CEOs (Altman, Zuckerberg, Schmidt). The common thread isn't title or affiliation. It's proximity to real deployment, willingness to be specific about mechanisms, and calibrated probabilistic thinking.

The Hype Merchants (Below 0.25)

These predictions scored worst. The striking thing: they come from both extremes. Maximum optimism and maximum pessimism are equally wrong.

Ed Zitron
Better Offline (3 predictions)
0.13
"There is no killer app for generative AI. It doesn't do anything that well."
Outcome: Coding assistants, AI search, image generation, and customer support automation all became killer apps within months. Also called OpenAI "fraudulent" — they hit $5B+ annual revenue.
Eliezer Yudkowsky
MIRI
0.12
"The most likely result of building a superhumanly smart AI... is that literally everyone on Earth will die."
Outcome: No moratorium enacted. Massive data centers built worldwide. Models reached unprecedented capability. No catastrophic scenarios materialized. The "airstrikes on data centers" proposal damaged his credibility.
David Patterson
LessWrong predictor
0.10
"There is zero chance we won't reach AGI by the end of next year. My definition of AGI is the human-to-AI transition point — AI capable of doing all jobs."
Outcome: "Zero chance" is the reddest of red flags in prediction. As of April 2026, AI cannot do all jobs. Physical labor, novel research, and complex negotiation remain firmly human.
Emily Bender
University of Washington
0.14
"Scaling up language models will not lead to understanding or intelligence, just more fluent-sounding nonsense."
Outcome: Each generation showed clear capability jumps in reasoning, coding, math, and planning. The "stochastic parrot" framing became increasingly untenable as capabilities grew.
Emad Mostaque
Stability AI (former CEO)
0.22
"There are no programmers in five years."
Outcome: Three years in, human programmers still highly in demand. Failed to account for Jevons Paradox: making code cheaper increased complexity and demand for software.
Julia McCoy
First Movers
0.22
"We'll see examples of $100M+ companies operating with just two or three people."
Outcome: No verified examples exist. Midjourney does ~$200M with ~40 people, which is remarkable but still 13x more humans than predicted. Classic consultancy hype.
Aidan McLau
LessWrong predictor
0.14
"I think it's likely (p=.6) that an o-series model solves a millennium prize math problem in 2025."
Outcome: False. Millennium problems require novel mathematical insight, not pattern matching. No Millennium Prize problem solved by AI as of end 2025.
Michael Burry
"Big Short" Investor
0.12
"AI is just the next bubble — same pattern as dot-com, same ending."
Outcome: AI stocks massively outperformed the broader market. NVIDIA alone went from $27B to $130B+ revenue. Being right about 2000 doesn't make you right about 2025 — pattern matching across eras is its own form of overconfidence.
The lesson: Maximum confidence in either direction — hype or doom — scored worst. Patterson's "zero chance no AGI by 2026" (0.10), Yudkowsky's "literally everyone dies" (0.12), and Burry's "just another bubble" (0.12) are mirror images of the same failure mode: mistaking emotional conviction for analytical rigor.

The Skeptics' Report Card

Professional AI skeptics have built careers on "AI can't do X" claims. How did they actually perform?

Optimists: 0.62
Neutral / Mixed: 0.55
Skeptics: 0.30

Skeptics as a group scored 0.30 — about half the accuracy of the optimists. But one skeptic stands apart from the rest.

The Exception: Arvind Narayanan (0.68)

Princeton's Narayanan was the only skeptic who scored well, and the reason is instructive. He was specific about what he criticized. He didn't make sweeping "AI is fake" claims. Instead, he targeted predictive AI in hiring and criminal justice — and he was right. Those products genuinely are unreliable.

His second prediction — that AI wouldn't cause mass unemployment — scored 0.80. By April 2026, no mass job apocalypse had materialized.

Why most skeptics failed: They predicted what AI couldn't do (create value, find killer apps, improve with scale) rather than what it shouldn't do (predict recidivism, automate hiring). The first is a capability prediction you'll probably lose. The second is a values judgment that ages better.

The "Never Updates" Problem

The most damning pattern among AI skeptics isn't that they're wrong. It's that they never acknowledge when they're wrong.

Gary Marcus has never published "OK, Claude can actually build full apps now and I was wrong about the capability ceiling." Ed Zitron has never written "OpenAI hitting $5B revenue means I was wrong about product-market fit." Emily Bender has never said "reasoning models have moved significantly beyond stochastic parrots."

Meanwhile, their high-scoring predictions are disproportionately "safe" calls — predicting that AI will still have errors, still hallucinate, still not achieve AGI by absurdly optimistic deadlines. That's like predicting it will rain in Seattle. Technically correct, zero insight value.

The test for any AI commentator: Do they ever say "I was wrong about X" or "this is actually better now than I expected"? If the answer is never, they're not analyzing — they're performing. The best predictors in our database (Ng, Alexander, Cotra) regularly update their models when evidence changes. The worst ones never do.

Historical Perspective: AGI Is Always "20 Years Away"

The most durable pattern in AI prediction history: every generation of researchers believes AGI is just around the corner. It never is.

1965: Herbert Simon (Carnegie Mellon): "Machines will be capable, within twenty years, of doing any work a man can do."
AGI by 1985. Score: 0.24

1965: I.J. Good: "An ultraintelligent machine could design even better machines... this would be the last invention man need ever make." Predicted within the 20th century.
Ultraintelligent machine by 2000. Score: 0.44

1988: Hans Moravec: "Human-level AI running on supercomputers by 2010, on personal computers by 2030."
Score: 0.39

1993: Vernor Vinge: "Within thirty years, we will have the technological means to create superhuman intelligence."
Singularity by 2023. Score: 0.20

2008: Gordon E. Moore (Intel founder): "I don't believe this kind of thing is likely to happen, at least for a long time."
AGI never/distant future. Score: 0.28

2012: Armstrong & Sotala (MIRI): "AGI predictions cluster 15-25 years in the future regardless of when the prediction is made." The AGI folk theorem.
Score: 0.92. The pattern held perfectly across 61 years of data.

2017: Ray Kurzweil (Google): "AGI will be achieved by 2029."
Pending. 3 years remain.

2023: Geoffrey Hinton: "I now think it's 5 to 20 years away."
AGI 2028-2043. Pending

2024: Elon Musk: "AI will probably be smarter than any single human next year."
Wrong. Score: 0.28

2025: David Patterson: "There is zero chance we won't reach AGI by the end of next year."
Almost certainly wrong. Score: 0.10

2026: Shane Legg (Google DeepMind co-founder): "50% chance of minimal AGI by 2028."
Pending

61 years. Same prediction. Same outcome. From Herbert Simon in 1965 to Shane Legg in 2026, AGI has been perpetually "just around the corner." Armstrong & Sotala proved this quantitatively in 2012 (0.92): AGI predictions always cluster 15-25 years out regardless of when they're made. Hans Moravec proved the folk theorem himself by moving his own prediction from 2010 to 2040 as reality caught up with his timeline.

Why the Best Predictors Have the Smallest Audiences

Philip Tetlock studied 284 experts making 82,361 predictions over two decades. His finding: the experts with the biggest media platforms were consistently the worst predictors. He called them "hedgehogs" — people who know one big thing and force everything through that lens. The accurate ones were "foxes" — people who know many small things and update constantly.

Our data shows the same pattern. Here's what separates predictors from influencers:

Trait | Predictor | Influencer
Framing | "There's a 70% chance X happens by 2027" | "X is DEFINITELY happening / DEFINITELY not"
Confidence | Calibrated: matches actual uncertainty | Maximum: uncertainty is bad for engagement
Speed | Slow. Waits for evidence. | Fast. First take = best take for clicks.
Output | Long-form analysis, explicit models | Hot takes, threads, podcast clips
When wrong | "I was wrong. Here's my updated model." | Quietly moves on. Deletes old posts.
On camera | Boring. Lots of caveats. | Entertaining. Strong opinions.
Priority | Being right over time | Being interesting right now

The starkest comparison in our data: Scott Alexander (Astral Codex Ten) scored 0.88 across 9 predictions. Ed Zitron (Better Offline) scored 0.30 across 7 predictions. Zitron has a larger audience. Alexander has a better track record. The market rewards confidence, not calibration.

Confidence is a negative accuracy signal. In our dataset, the correlation between expressed confidence and actual accuracy is negative. The more certain someone sounds about AI's future, the more likely they are to be wrong. This isn't a coincidence — genuine uncertainty is the honest response to a genuinely uncertain technology, and the people willing to express that uncertainty are the ones whose models track reality.

Key Findings

0.88: Calibrated forecasters (Alexander, 9 predictions; Cotra, 4; Karnofsky, 5) crush everyone
0.42: Timeline and market predictions are the worst-performing categories
61 years: AGI folk theorem (0.92). AGI has been "20 years away" since 1965
0.60: Average accuracy across 160 scored predictions from 70 speakers

Methodology

Each prediction was scored on three dimensions, weighted and combined into a 0–1 overall accuracy score:

Dimension | Weight | What It Measures
Direction | 40% | Did the predicted thing move in the predicted direction? Binary 0/1.
Timing | 30% | How far off was the timeline? Measured in months early or late.
Magnitude | 30% | How close was the scale of the prediction to reality? 0 = wildly off, 1 = nailed it.
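
To make the weighting concrete, here is a minimal sketch of how the three dimensions could combine into one score. It assumes each dimension has already been expressed on a 0 to 1 scale. The methodology measures timing in months early or late and does not spell out how months map to a 0 to 1 value, so the timing input here is treated as already normalized. The names WEIGHTS and overall_score and the example inputs are hypothetical and not taken from the dataset.

```python
# Weights from the methodology table: direction 40%, timing 30%, magnitude 30%
WEIGHTS = {"direction": 0.40, "timing": 0.30, "magnitude": 0.30}


def overall_score(direction, timing, magnitude):
    """Combine three sub-scores, each in [0, 1], into one 0-1 accuracy score.

    Assumption: timing is already normalized to [0, 1]; the conversion from
    "months early or late" is not specified in the methodology.
    """
    parts = {"direction": direction, "timing": timing, "magnitude": magnitude}
    for name, value in parts.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return sum(WEIGHTS[name] * value for name, value in parts.items())


# Hypothetical prediction: right direction, slightly off on timing and scale
example = overall_score(direction=1, timing=0.8, magnitude=0.9)
print(round(example, 2))
```

Because direction is binary and carries 40% of the weight, a prediction that gets the direction wrong cannot score above 0.60 under this scheme, however good its timing and magnitude.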

Status Categories

Scored: Prediction window has passed, fully evaluable.
Partial: Some evidence available, preliminary score assigned.
Pending: Can't score yet because the prediction window is still open. Pending predictions are excluded from averages.

Data Sources

Predictions sourced from published interviews, blog posts, research papers, official company announcements, Forbes roundups, McKinsey surveys, LessWrong prediction markets, and Reddit threads. All quotes are verbatim with original source citations.

Outcomes assessed against publicly available data as of April 2026: company earnings reports, industry surveys, product launches, market data, and independent research.

Transparency note: Scoring inherently involves judgment calls, particularly on magnitude. We've published the full dataset with individual scores, quotes, and outcomes so readers can evaluate our scoring themselves. Where reasonable people could disagree on a score, we note it.
