199 AI Predictions Scored: Who Actually Knew What?

The Big Picture

We collected 199 specific, dated AI predictions from CEOs, researchers, independent bloggers, professional skeptics, calibrated forecasters, and anonymous Reddit commenters. We scored 160 of them against what actually happened (the remaining 39 are still pending).

The average accuracy score across all scored predictions is 0.60, barely better than a coin flip. That number should make you uncomfortable, because these aren't random people. This group includes the CEO of OpenAI, a co-founder of Google DeepMind, Nobel laureates, and tenured professors at Princeton and Stanford. In total: 70 unique speakers and 160 scored predictions.

The pattern across 61 years: people consistently overestimate near-term timelines for AGI while underestimating near-term practical capability improvements. They predict the revolution will happen "next year" while failing to notice the revolution already happening under their feet.

Three findings stand out from the data:

1. The extremes score worst. Maximum doom (Yudkowsky, 0.12) and maximum hype (Patterson, 0.10) are equally wrong. The most confident people in either direction cluster at the bottom of the leaderboard.

2. People closest to production consistently outpredict everyone else. NVIDIA engineers who ship actual products (Briski 0.92, Deierling 0.94, Das 0.88) outscored CEOs, academics, and futurists by a wide margin. Proximity to real-world deployment is the single best predictor of prediction accuracy.

3. Specificity correlates with accuracy. Vague predictions ("AI will transform everything") score poorly. Specific predictions ("agentic scaffolding will drive the next leap, not bigger models") score well. The more precisely you can be wrong, the more likely you are to be right.

The Data

The Leaderboard: All Speakers by Accuracy

Leaderboard data (all 50 speakers, ranked) — full table

Rank	Speaker	Accuracy	Predictions scored	% safe
1	Kevin Deierling	0.94	1	0%
2	Kari Briski	0.92	1	0%
3	Armstrong & Sotala (MIRI)	0.92	1	0%
4	Colette Kress	0.91	1	0%
5	Demis Hassabis	0.90	3	5%
6	Ajeya Cotra	0.90	4	5%
7	Scott Alexander	0.88	9	10%
8	Eric Schmidt	0.88	1	0%
9	Mark Zuckerberg	0.88	1	0%
10	Brad Gerstner	0.88	1	0%
11	David Ferris	0.86	2	10%
12	Andrew Ng	0.85	5	10%
13	Ilya Sutskever	0.84	1	0%
14	Fei-Fei Li	0.84	1	0%
15	Andy Jassy	0.84	1	20%
16	Thomas Kurian	0.84	1	10%
17	Holden Karnofsky	0.83	5	10%
18	Satya Nadella	0.82	1	0%
19	Chamath Palihapitiya	0.80	1	0%
20	Zvi Mowshowitz	0.78	11	10%
21	Gary Marcus	0.76	12	40%
22	Yann LeCun	0.76	1	0%
23	Tim Cook	0.76	1	30%
24	Lisa Su	0.76	1	20%
25	Epoch AI	0.72	1	0%
26	Goldman Sachs	0.72	1	20%
27	McKinsey Survey	0.71	3	30%
28	AAAI 2025 Panel	0.70	1	20%
29	Jensen Huang	0.68	8	15%
30	Arvind Narayanan	0.68	2	15%
31	Rodney Brooks	0.58	3	30%
32	Sam Altman	0.53	7	5%
33	Arvind Krishna	0.52	1	10%
34	Dario Amodei	0.48	5	5%
35	Francois Chollet	0.46	2	0%
36	I.J. Good	0.44	1	0%
37	Hans Moravec	0.39	1	0%
38	David Shapiro	0.34	3	0%
39	Vernor Vinge	0.34	2	0%
40	Ed Zitron	0.30	7	0%
41	Gordon E. Moore	0.28	1	0%
42	Cal Newport	0.26	4	10%
43	Elon Musk	0.23	2	0%
44	Emad Mostaque	0.22	1	0%
45	Emily Bender	0.17	2	0%
46	Tetlock / Superforecasters	0.17	5	0%
47	Herbert Simon	0.16	2	0%
48	Eliezer Yudkowsky	0.12	1	0%
49	Michael Burry	0.12	1	0%
50	David Patterson	0.10	1	0%

Accuracy by Category

Accuracy by category — full table

Category	Average accuracy	Predictions
Infrastructure	0.87	10
Safety	0.68	10
Adoption	0.65	31
Capability	0.63	64
Economic	0.61	9
Market	0.42	18
Timeline	0.42	18
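
As a quick consistency check, the category rows above can be combined into a prediction-weighted mean. This is a minimal sketch that assumes the headline average is weighted by prediction count, which the article does not state explicitly. Under that assumption, the table accounts for all 160 scored predictions and reproduces the 0.60 figure.

```python
# Category averages and prediction counts, copied from the table above.
categories = {
    "Infrastructure": (0.87, 10),
    "Safety": (0.68, 10),
    "Adoption": (0.65, 31),
    "Capability": (0.63, 64),
    "Economic": (0.61, 9),
    "Market": (0.42, 18),
    "Timeline": (0.42, 18),
}

# Total number of scored predictions across all categories.
total_predictions = sum(n for _, n in categories.values())

# ASSUMPTION: the overall average weights each category by its prediction count.
weighted_mean = sum(avg * n for avg, n in categories.values()) / total_predictions

print(total_predictions)        # 160
print(round(weighted_mean, 2))  # 0.6
```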

How Specific Were They vs. How Right Were They?

Figure (scatter plot, not reproduced here): each dot is a scored prediction. Higher contrarian scores mean bolder claims.

The A-Tier (0.84+)

These eight people got it significantly more right than wrong. The pattern is striking: they were close to production, specific about mechanisms, and honest about limitations.

The pattern: NVIDIA engineers, one Stanford professor, one independent blogger, calibrated forecasters (Alexander, Cotra, Karnofsky), one VC, one CEO. The common thread isn't title or affiliation — it's proximity to real deployment, willingness to be specific about mechanisms, and calibrated probabilistic thinking.

The Hype Merchants (Below 0.25)

These predictions scored worst. The striking thing: they come from both extremes. Maximum optimism and maximum pessimism are equally wrong.

The lesson: Maximum confidence in either direction — hype or doom — scored worst. Patterson's "zero chance no AGI by 2026" (0.10), Yudkowsky's "literally everyone dies" (0.12), and Burry's "just another bubble" (0.12) are mirror images of the same failure mode: mistaking emotional conviction for analytical rigor.

The Skeptics' Report Card

Professional AI skeptics have built careers on "AI can't do X" claims. How did they actually perform?

Group	Average accuracy
Optimists	0.62
Neutral / Mixed	0.55
Skeptics	0.30

Skeptics as a group scored 0.30 — about half the accuracy of the optimists. But one skeptic stands apart from the rest.

The Exception: Arvind Narayanan (0.68)

Princeton's Narayanan was the only skeptic who scored well, and the reason is instructive. He was specific about what he criticized. He didn't make sweeping "AI is fake" claims. Instead, he targeted predictive AI in hiring and criminal justice — and he was right. Those products genuinely are unreliable.

His second prediction — that AI wouldn't cause mass unemployment — scored 0.80. By April 2026, no mass job apocalypse had materialized.

Why most skeptics failed: They predicted what AI couldn't do (create value, find killer apps, improve with scale) rather than what it shouldn't do (predict recidivism, automate hiring). The first is a capability prediction you'll probably lose. The second is a values judgment that ages better.

The "Never Updates" Problem

The most damning pattern among AI skeptics isn't that they're wrong. It's that they never acknowledge when they're wrong.

Gary Marcus has never published "OK, Claude can actually build full apps now and I was wrong about the capability ceiling." Ed Zitron has never written "OpenAI hitting $5B revenue means I was wrong about product-market fit." Emily Bender has never said "reasoning models have moved significantly beyond stochastic parrots."

Meanwhile, their high-scoring predictions are disproportionately "safe" calls — predicting that AI will still have errors, still hallucinate, still not achieve AGI by absurdly optimistic deadlines. That's like predicting it will rain in Seattle. Technically correct, zero insight value.

The test for any AI commentator: Do they ever say "I was wrong about X" or "this is actually better now than I expected"? If the answer is never, they're not analyzing — they're performing. The best predictors in our database (Ng, Alexander, Cotra) regularly update their models when evidence changes. The worst ones never do.

Historical Perspective: AGI Is Always "20 Years Away"

The most durable pattern in AI prediction history: every generation of researchers believes AGI is just around the corner. It never is.

1965 — Herbert Simon (Carnegie Mellon): "Machines will be capable, within twenty years, of doing any work a man can do."

AGI by 1985. Score: 0.24

1965 — I.J. Good: "An ultraintelligent machine could design even better machines... this would be the last invention man need ever make." Predicted within the 20th century.

Ultraintelligent machine by 2000. Score: 0.44

1988 — Hans Moravec: "Human-level AI running on supercomputers by 2010, on personal computers by 2030."

Score: 0.39

1993 — Vernor Vinge: "Within thirty years, we will have the technological means to create superhuman intelligence."

Singularity by 2023. Score: 0.20

2008 — Gordon E. Moore (Intel co-founder): "I don't believe this kind of thing is likely to happen, at least for a long time."

AGI never/distant future. Score: 0.28

2012 — Armstrong & Sotala (MIRI): "AGI predictions cluster 15-25 years in the future regardless of when the prediction is made." The AGI folk theorem.

Score: 0.92 — the pattern held perfectly across 61 years of data

2017 — Ray Kurzweil (Google): "AGI will be achieved by 2029."

Pending — 3 years remain

2023 — Geoffrey Hinton: "I now think it's 5 to 20 years away."

AGI 2028-2043. Pending

2024 — Elon Musk: "AI will probably be smarter than any single human next year."

Wrong. Score: 0.28

2025 — David Patterson: "There is zero chance we won't reach AGI by the end of next year."

Almost certainly wrong. Score: 0.10

2026 — Shane Legg (Google DeepMind co-founder): "50% chance of minimal AGI by 2028."

Pending

61 years. Same prediction. Same outcome. From Herbert Simon in 1965 to Shane Legg in 2026, AGI has been perpetually "just around the corner." Armstrong & Sotala proved this quantitatively in 2012 (0.92): AGI predictions always cluster 15-25 years out regardless of when they're made. Hans Moravec proved the folk theorem himself by moving his own prediction from 2010 to 2040 as reality caught up with his timeline.

Why the Best Predictors Have the Smallest Audiences

Philip Tetlock studied 284 experts making 82,361 predictions over two decades. His finding: the experts with the biggest media platforms were consistently the worst predictors. He called them "hedgehogs" — people who know one big thing and force everything through that lens. The accurate ones were "foxes" — people who know many small things and update constantly.

Our data confirms this perfectly. Here's what separates predictors from influencers:

Trait	Predictor	Influencer
Framing	"There's a 70% chance X happens by 2027"	"X is DEFINITELY happening / DEFINITELY not"
Confidence	Calibrated — matches actual uncertainty	Maximum — uncertainty is bad for engagement
Speed	Slow. Waits for evidence.	Fast. First take = best take for clicks.
Output	Long-form analysis, explicit models	Hot takes, threads, podcast clips
When wrong	"I was wrong. Here's my updated model."	Quietly moves on. Deletes old posts.
On camera	Boring. Lots of caveats.	Entertaining. Strong opinions.
Priority	Being right over time	Being interesting right now

The starkest comparison in our data: Scott Alexander (Astral Codex Ten) scored 0.88 across 9 predictions. Ed Zitron (Better Offline) scored 0.30 across 7 predictions. Zitron has a larger audience. Alexander has a better track record. The market rewards confidence, not calibration.

Confidence is a negative accuracy signal. In our dataset, the correlation between expressed confidence and actual accuracy is negative. The more certain someone sounds about AI's future, the more likely they are to be wrong. This isn't a coincidence — genuine uncertainty is the honest response to a genuinely uncertain technology, and the people willing to express that uncertainty are the ones whose models track reality.

Key Findings

Figure	Finding
0.88	Calibrated forecasters (Alexander 9p, Cotra 4p, Karnofsky 5p) crush everyone
0.42	Timeline and market predictions are the worst-performing categories
61 years	AGI folk theorem (0.92): AGI has been "20 years away" since 1965
0.60	Average accuracy across 160 scored predictions from 70 speakers

Methodology

Each prediction was scored on three dimensions, weighted and combined into a 0–1 overall accuracy score:

Dimension	Weight	What It Measures
Direction	40%	Did the predicted thing move in the predicted direction? Binary 0/1.
Timing	30%	How far off was the timeline? Measured in months early or late.
Magnitude	30%	How close was the scale of the prediction to reality? 0 = wildly off, 1 = nailed it.
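
The weighting above can be sketched in a few lines. The 40/30/30 weights and the binary direction come from the table. Everything else is an assumption for illustration: the article says timing is measured in months early or late but does not say how months become a 0 to 1 value, so the hypothetical `timing_score` helper below uses a linear decay with an arbitrary 24-month tolerance.

```python
# Weights taken from the methodology table.
WEIGHTS = {"direction": 0.40, "timing": 0.30, "magnitude": 0.30}


def timing_score(months_off: float, tolerance_months: float = 24.0) -> float:
    """Map a timing miss (months early or late) to a value in [0, 1].

    ASSUMPTION: linear decay reaching 0 at `tolerance_months`. The article
    does not publish its normalization, so this helper is hypothetical.
    """
    return max(0.0, 1.0 - abs(months_off) / tolerance_months)


def overall_score(direction: int, timing: float, magnitude: float) -> float:
    """Combine the three dimensions into one 0-1 score."""
    if direction not in (0, 1):
        raise ValueError("direction is binary: 0 or 1")
    if not (0.0 <= timing <= 1.0 and 0.0 <= magnitude <= 1.0):
        raise ValueError("timing and magnitude must be in [0, 1]")
    return (
        WEIGHTS["direction"] * direction
        + WEIGHTS["timing"] * timing
        + WEIGHTS["magnitude"] * magnitude
    )


# Hypothetical prediction: right direction, 6 months late, magnitude 0.8.
example = overall_score(1, timing_score(6), 0.8)
print(round(example, 3))  # 0.865
```

One consequence of a binary direction term with a 40% weight, if timing and magnitude are both bounded by 0 and 1: a prediction that gets the direction wrong cannot score above 0.60, and one that gets it right cannot score below 0.40.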

Status Categories

Scored: Prediction window has passed, fully evaluable.

Partial: Some evidence available, preliminary score assigned.

Pending: Can't score yet; prediction window still open. Pending predictions are excluded from averages.
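
The exclusion rule can be sketched as a simple filter. The `Prediction` record and `average_accuracy` function are hypothetical names, and the sample rows are placeholders rather than entries from the dataset. The sketch also assumes partial predictions count toward averages, since the article only says that pending ones are excluded.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Prediction:
    speaker: str
    status: str             # "scored", "partial", or "pending"
    score: Optional[float]  # None while the prediction is pending


def average_accuracy(predictions: list[Prediction]) -> float:
    """Mean score over every prediction that is not pending.

    ASSUMPTION: partial predictions are included in the average.
    """
    usable = [
        p.score
        for p in predictions
        if p.status != "pending" and p.score is not None
    ]
    if not usable:
        raise ValueError("no scored or partial predictions to average")
    return mean(usable)


# Placeholder records for illustration only; not rows from the dataset.
sample = [
    Prediction("speaker_a", "scored", 0.75),
    Prediction("speaker_b", "partial", 0.25),
    Prediction("speaker_c", "pending", None),
]
print(average_accuracy(sample))  # 0.5
```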

Data Sources

Predictions sourced from published interviews, blog posts, research papers, official company announcements, Forbes roundups, McKinsey surveys, LessWrong prediction markets, and Reddit threads. All quotes are verbatim with original source citations.

Outcomes assessed against publicly available data as of April 2026: company earnings reports, industry surveys, product launches, market data, and independent research.

Transparency note: Scoring inherently involves judgment calls, particularly on magnitude. We've published the full dataset with individual scores, quotes, and outcomes so readers can evaluate our scoring themselves. Where reasonable people could disagree on a score, we note it.
