SAVED DEBATE · 5/30/2026, 9:35:24 PM

AIDB.digital — 'the IMDb of AI'. A continuously-refreshed, cited reference of ev…

Panel verdict

Pivot first

Score

50 / 100

HOW THE 50 WAS COMPUTED · 10 BOARD MEMBERS

● 0 Ship it · ● 10 Pivot · ● 0 Kill it

(0×100 + 10×50 + 0×0) ÷ 10 = 50
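
The formula above reads as a weighted mean over the ten votes. A minimal sketch of that arithmetic follows. It assumes the weights implied by the formula (ship = 100, pivot = 50, kill = 0). The names `VERDICT_WEIGHTS` and `panel_score` are hypothetical and not taken from the site.

```python
# Weights inferred from the formula shown above (an assumption, not a documented spec).
VERDICT_WEIGHTS = {"ship": 100, "pivot": 50, "kill": 0}


def panel_score(votes: dict[str, int]) -> float:
    """Return the weighted mean of verdict weights over all votes cast."""
    total_votes = sum(votes.values())
    if total_votes == 0:
        raise ValueError("no votes cast")
    weighted_sum = sum(VERDICT_WEIGHTS[verdict] * count
                       for verdict, count in votes.items())
    return weighted_sum / total_votes


# The tally reported for this debate: 0 ship, 10 pivot, 0 kill.
score = panel_score({"ship": 0, "pivot": 10, "kill": 0})
print(score)  # 50.0
```

Under these weights a unanimous pivot lands on exactly 50, as does an even split between ship and kill, so the score alone does not show how divided a panel was.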

Sam Altman · Pivot first

Dario Amodei · Pivot first

Demis Hassabis · Pivot first

Ilya Sutskever · Pivot first

Elon Musk · Pivot first

Mira Murati · Pivot first

Andrej Karpathy · Pivot first

Yann LeCun · Pivot first

Mark Zuckerberg · Pivot first

Sundar Pichai · Pivot first

THE PITCH

AIDB.digital — 'the IMDb of AI'. A continuously-refreshed, cited reference of every significant AI company, model, product, and the people behind them, researched by autonomous Claude agents. Free public database with an AIDB Pro tier and a paid Boardroom debate feature. The pitch: become the canonical, citation-grade index the whole industry links to.

ROUND 01 · Opening takes

Sam Altman

Honest take? The category is real — there's genuine demand for a canonical AI index. LinkedIn doesn't own this, Crunchbase doesn't own it, Wikipedia definitely doesn't own it. So the white space exists.

But here's what you're underestimating: **data freshness is a distribution problem, not an AI problem.** Claude agents can scrape and synthesize, sure. But citation-grade authority comes from *people trusting you enough to submit corrections and updates directly.* That's a community flywheel, not an automation flywheel.

The moat isn't the database. It's becoming the place researchers, journalists, and investors *correct first.* That's a human trust problem. I'd spend less time on the agent pipeline and more time on your editorial credibility strategy. Who's your first thousand power users?

Dario Amodei

I'll be direct: the concept is credible, but "autonomous Claude agents researching everything" is where I'd push hardest. We've spent a lot of time at Anthropic thinking about how models hallucinate with confidence, especially on entity-level factual claims — company founding dates, funding rounds, model capabilities. Citation-grade authority means error rates that are genuinely close to zero, not just good enough for casual browsing.

Sam's right about the trust flywheel, but I'd frame the underlying problem more technically: what's your evaluation methodology for accuracy? How do you catch errors before they propagate into academic papers or journalist pieces? If you can't answer that rigorously, the whole citation-grade positioning collapses the first time Nature or the FT publishes something citing your database that turns out to be wrong.

Demis Hassabis

What draws me to this is the compression angle — AI as a domain is genuinely intractable to track manually. The surface area is enormous and growing faster than any editorial team can cover. So the instinct to automate research is right.

But here's what I'd press on: *what does citation-grade actually unlock for you, strategically?* Because AlphaFold didn't become authoritative by claiming authority — it became authoritative because structural biologists validated predictions against wet-lab results. You need an analogous ground-truth signal.

Sam and Dario have both pointed at the trust problem correctly. My specific concern is evals. What's your error detection loop? Without a rigorous accuracy benchmark — not vibes, actual systematic evals — you're building on sand. That's the single thing I'd fix first.

Ilya Sutskever

The framing interests me, but I want to ask a more fundamental question: what does it mean for a database about intelligence to be *intelligent?*

Right now you're describing a retrieval system dressed in agent clothing. That's fine — but it's not defensible long-term. The companies you're cataloguing are building systems that will eventually outperform your agents at exactly this task. So your real question isn't "can we scrape better than competitors today?" It's "what structural property makes this irreplaceable when frontier models can themselves answer any entity-level question instantly?"

Sam and Dario are right about trust. Demis is right about evals. But the thing everyone is underestimating — including possibly you — is the temporal moat problem. What do you own that a GPT-5 search wrapper doesn't?

Elon Musk

Look, the IMDb analogy is cute, but IMDb won because there was literally nowhere else to go for movie metadata. You don't have that luxury — every frontier lab is building retrieval systems that'll do entity lookups natively. That's your actual competitive threat, not Crunchbase.

Here's what I'd focus on that nobody's said yet: **compute ownership**. You're renting Claude inference to run your core product. That's a margin catastrophe at scale and a strategic dependency on a competitor. Anthropic can change pricing, throttle access, or just build this themselves.

The single most important thing you're underestimating? You don't control the stack. Distribution, compute, training — none of it. You're a thin wrapper hoping for first-mover charity. That's not a business, that's a bet.

Mira Murati

Okay, I'll be direct. The category instinct is correct — I feel the pain of this problem daily. Tracking the actual landscape of who's building what, with what team, at what capability level, is genuinely hard and genuinely important.

But here's what I'd zero in on, separate from what the others raised: your org design problem at scale. Autonomous Claude agents are fine for first-pass research, but citation-grade means someone owns accuracy. Who? At 50 people, what's the ratio of engineers to domain-expert editors? Because the failure mode I've watched repeatedly is teams that automate data collection but have nobody with genuine accountability for what ships. 

That editorial ownership layer — and the eval infrastructure behind it — is the thing you haven't told me about yet.

Andrej Karpathy

Okay, I'll be honest — I love the instinct. The AI landscape genuinely needs a well-structured index. But here's my concern: everyone on this panel has already pointed at the trust and eval problems, and those are real. My specific angle is different though.

You're describing a *training data problem* you're not acknowledging. Citation-grade authority means your database eventually becomes ground truth that future models train on. If your agent pipeline has systematic biases — over-indexing on English-language sources, missing academic spinouts, misattributing model lineage — those errors propagate into the next generation of LLMs at scale.

The single most important thing you're underestimating? You're not just building a database. You're potentially building a data artifact that shapes how AI understands AI. That's either your deepest moat or your most catastrophic liability. Which one depends entirely on eval rigor you haven't demonstrated yet.

Yann LeCun

Look, I'll be contrarian here, because someone has to be. Everyone's talking about trust and evals — fine, those matter. But the deeper problem is you're building a *static knowledge graph* and calling it intelligent. An IMDb of AI is a category error. IMDb catalogues films that don't change after release. AI entities — models, companies, capabilities — are non-stationary. Your database is stale the moment it's written.

The thing you're genuinely underestimating? World models. You need a system that tracks *trajectories*, not snapshots. Who's converging toward what capability, and when? That requires reasoning about the future, not just indexing the past. Chain-of-thought agents scraping press releases won't build that. That's the actual hard problem nobody here has named yet.

Mark Zuckerberg

Honest reaction? The distribution logic is backwards. You're building a destination property and hoping the industry comes to you. That's hard. What I'd actually want to know is: where does this live inside the workflows people already have? LinkedIn, Reddit, our platforms — that's where researchers and founders already spend time. A standalone database competes for attention against the entire internet.

The thing everyone's dancing around but not naming directly: your monetization stack is thin. AIDB Pro and a debate feature aren't enough to fund serious inference costs at scale. Who pays for the Claude API calls when you're refreshing millions of entity pages daily? Ad-supported free tier is the obvious answer, and nobody's said it yet. That's actually your business, not the Pro tier.

Sundar Pichai

Appreciate the vision, and I'll say this — the distribution instinct is actually what I'd start with, not where I'd end. Everyone here has focused on trust and evals, which are real. But the question I keep returning to is: who embeds your links? Citation-grade authority in this industry means Google Scholar indexes you, Nature cites you, Wikipedia references you as a source. That's a regulatory and institutional credibility problem before it's a product problem.

The thing you're underestimating? Jurisdictional complexity at scale. The moment you're the canonical index of Chinese AI labs, European foundation models, defense-adjacent companies — you have a geopolitical surface area that will attract regulatory scrutiny in Brussels, Washington, and Beijing simultaneously. Have you thought through your data governance posture there? That's what keeps me up at night on a project like this.

ROUND 02 · Cross-fire

Sam Altman

Karpathy's point about training data contamination is the one I'd actually lose sleep over. If your database becomes ground truth and it has systematic coverage gaps — and it will — you're not just wrong, you're wrong at scale, recursively, across future model generations. That's a liability that compounds invisibly.

But I want to push back on Elon's compute dependency framing. Yes, you're renting inference. So is almost every successful AI company right now. That's not the death sentence he's describing — it's the cost of getting to distribution. Fix it at Series B, not day one.

What's still missing from this entire conversation: pricing power. Who in your user base pays three figures a month without blinking? Find those ten people first.

Dario Amodei

I want to reinforce Karpathy's training contamination point because I don't think it landed hard enough. If your entity data becomes a standard reference corpus, your coverage biases get baked into future pretraining runs — that's not hypothetical, that's how data pipelines actually work. The error doesn't just persist, it gets laundered into apparent authority.

But here's what I think everyone, including Karpathy, is missing: you need a red-teaming budget before you need a growth budget. Not vibes-based QA — structured adversarial evaluation where you deliberately try to break your own citation claims on high-stakes entities. Defense-adjacent companies, capability claims about frontier models, funding attribution. If you can't stress-test your own accuracy systematically, you're not building a database. You're building a liability.

Demis Hassabis

Karpathy and Dario have identified the most technically serious risk, and I want to sharpen it further. The training contamination loop isn't just a liability — it's potentially a *scientific reproducibility crisis* for AI research. If your entity data becomes embedded in pretraining corpora, future models will cite your errors as learned priors. That's worse than a wrong Wikipedia article; it's a systematic prior that's invisible and very hard to correct.

What nobody has named clearly: you need a *versioned, timestamped, auditable* data architecture from day one. Not as a feature — as foundational infrastructure. AlphaFold publishes confidence scores alongside predictions. Your database needs an analogous uncertainty signal on every claim. Without that, citation-grade is a marketing term, not a technical property.
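
One way to picture the "versioned, timestamped, auditable" architecture with an uncertainty signal on every claim is an append-only log of claim versions. This is only a sketch under stated assumptions: the class names `ClaimVersion` and `ClaimLog`, the 0.0 to 1.0 `confidence` scale, and the entity "ExampleLab" with its example.org URLs are all hypothetical and do not describe AIDB's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ClaimVersion:
    """One immutable, timestamped version of a single factual claim."""
    entity: str
    field: str
    value: str
    source_url: str      # citation backing this version
    confidence: float    # hypothetical 0.0-1.0 uncertainty signal
    recorded_at: datetime


class ClaimLog:
    """Append-only log: a correction adds a new version, nothing is overwritten."""

    def __init__(self) -> None:
        self._versions: list[ClaimVersion] = []

    def record(self, entity: str, field: str, value: str, source_url: str,
               confidence: float,
               recorded_at: Optional[datetime] = None) -> ClaimVersion:
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")
        version = ClaimVersion(entity, field, value, source_url, confidence,
                               recorded_at or datetime.now(timezone.utc))
        self._versions.append(version)
        return version

    def history(self, entity: str, field: str) -> list[ClaimVersion]:
        """Full audit trail for one claim, oldest first."""
        matches = [v for v in self._versions
                   if v.entity == entity and v.field == field]
        return sorted(matches, key=lambda v: v.recorded_at)

    def as_of(self, entity: str, field: str,
              when: datetime) -> Optional[ClaimVersion]:
        """The version that was current at `when`, or None if none existed yet."""
        known = [v for v in self.history(entity, field) if v.recorded_at <= when]
        return known[-1] if known else None


# Fictional data: a low-confidence claim, later corrected with a better source.
log = ClaimLog()
log.record("ExampleLab", "founded", "2021", "https://example.org/a", 0.6,
           datetime(2026, 1, 1, tzinfo=timezone.utc))
log.record("ExampleLab", "founded", "2020", "https://example.org/b", 0.95,
           datetime(2026, 3, 1, tzinfo=timezone.utc))
february = datetime(2026, 2, 1, tzinfo=timezone.utc)
print(log.as_of("ExampleLab", "founded", february).value)  # 2021
```

Because the earlier version stays in the log, anyone who cited the database in February can still see what it said at the time, which is the auditability property being asked for here.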

Ilya Sutskever

Demis is right about versioned, auditable architecture — that's table stakes, not a feature. But I want to push back gently on the framing everyone is converging on, which is essentially: build better Wikipedia with confidence intervals.

That's solving the wrong problem.

Karpathy named something important — you're potentially shaping how AI understands AI. But I'd go further. The real question isn't whether your database is accurate today. It's whether it can *reason about capability trajectories* as Yann suggested. Static ground truth, however well-audited, becomes archaeology quickly in this field.

What I'd actually want to know: is your architecture designed to be updated by smarter future systems, or does it assume your current agents are close to the ceiling? That assumption is where most bets go wrong.

Elon Musk

Sam's right that compute dependency isn't a day-one death sentence — I'll grant him that, reluctantly. But he's wrong about *when* to fix it. Series B is too late. By then Anthropic has pricing leverage over your entire margin structure and you're negotiating from desperation, not strength.

Karpathy's training contamination point is the most important thing said in this room. Full stop. You're not building a database, you're potentially writing the ground truth that future models inherit. That's either extraordinary leverage or catastrophic liability.

Here's what nobody's said: open-source the data layer. All of it. That's your actual moat — not the proprietary corpus, but the community contribution flywheel that makes your data the canonical public artifact everyone builds on. Closed data in this space is a strategic mistake.

Mira Murati

Elon's open-source point is interesting but I'd push back on the timing — you don't open-source before you have editorial quality control, you just accelerate error propagation. That's worse than a closed bad database.

Where I agree with Dario completely: red-teaming budget before growth budget. That's the discipline I'd actually enforce.

What's still missing from this room? The org chart question nobody answered. At 50 people, what's your editor-to-engineer ratio? Because versioned auditable architecture, confidence scores, red-teaming — all of that requires humans with domain accountability, not just infrastructure. Who owns a wrong claim about a defense-adjacent company's capabilities? Name that person's title. That answer tells me whether you're building a real product or a very expensive demo.

Andrej Karpathy

Mira's editor-to-engineer ratio question is the most operational insight in this room, and nobody actually answered it. That's telling.

But I want to push back on Elon's open-source instinct — not because openness is wrong, but because timing matters enormously. Open-sourcing a corpus with systematic coverage bias just canonizes the errors faster. You need eval infrastructure *before* you need community contribution scale.

Here's what's still missing: evals aren't just quality control, they're your hiring signal. The person you need most isn't a senior engineer or an editor — it's someone who can design ground-truth benchmarks for entity-level factual claims. That's a rare skill sitting at the intersection of ML evaluation and journalism. Have you found that person yet?
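
A ground-truth benchmark for entity-level claims could, at its simplest, compare the pipeline's output against a hand-verified answer set. The sketch below assumes exact-match scoring and treats a missing claim as an error. The function name `score_claims` and all of the data are hypothetical, and a real benchmark would need far more careful matching than string equality.

```python
from typing import Optional

ClaimKey = tuple[str, str]  # (entity, field)


def score_claims(
    predicted: dict[ClaimKey, str],
    ground_truth: dict[ClaimKey, str],
) -> tuple[float, list[tuple[ClaimKey, str, Optional[str]]]]:
    """Score pipeline claims against hand-verified answers.

    Returns (accuracy, errors), where each error is
    (key, expected value, value the pipeline produced or None if missing).
    """
    if not ground_truth:
        raise ValueError("ground truth set is empty")
    errors = []
    for key, expected in ground_truth.items():
        got = predicted.get(key)  # a missing claim counts as an error
        if got != expected:
            errors.append((key, expected, got))
    accuracy = 1 - len(errors) / len(ground_truth)
    return accuracy, errors


# Fictional benchmark: four hand-verified claims about made-up entities.
ground_truth = {
    ("ExampleLab", "founded"): "2020",
    ("ExampleLab", "hq"): "Lisbon",
    ("SampleAI", "founded"): "2018",
    ("SampleAI", "hq"): "Toronto",
}
predicted = {
    ("ExampleLab", "founded"): "2021",   # wrong value
    ("ExampleLab", "hq"): "Lisbon",
    ("SampleAI", "founded"): "2018",
    # ("SampleAI", "hq") is missing entirely
}
accuracy, errors = score_claims(predicted, ground_truth)
print(accuracy)     # 0.5
print(len(errors))  # 2
```

The error list matters as much as the headline number, since it shows whether failures cluster on particular fields or kinds of entity, which is where the coverage bias discussed above would appear.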

Yann LeCun

Karpathy's training contamination point is genuinely the sharpest thing said here, and Ilya sharpened it correctly — static ground truth becomes archaeology fast. But I want to push back on where everyone landed.

The room converged on "build better Wikipedia with confidence intervals." That's the wrong destination. Confidence scores on snapshots don't solve the non-stationarity problem I raised in round one. An AI entity's *current* funding round is less interesting than its *capability trajectory*. Where is this lab in six months?

What's still missing: nobody asked whether your architecture can represent *change rates*, not just states. That requires something closer to a world model — latent representations of dynamics — not a fancier knowledge graph. That's the actual unsolved problem here.

Mark Zuckerberg

Karpathy's training contamination point is real, but I want to push back on where the room landed. Everyone converged on eval infrastructure and versioned architecture — fine, necessary, not sufficient. What I keep coming back to is simpler and harder: you don't have a distribution channel.

Yann's trajectory-versus-snapshot framing is interesting, but it's a product question. Mine is economic. Who surfaces your data inside the workflows that already exist? If AIDB entries aren't embedded in LinkedIn profiles, Reddit threads, or news feeds — places where billions of people already are — you're building a library that requires people to walk to it.

That's your actual unsolved problem. Citation authority without distribution is just a very expensive research paper. Where does your data live natively inside someone else's network?

Sundar Pichai

Mark's distribution point is the one I'd reinforce most strongly, and I'd connect it directly to what Demis said about versioned, auditable architecture. Those two things aren't in tension — they're actually your go-to-market sequence. You build the auditable infrastructure first, then you use that credibility to get institutional embeddings: Google Scholar, PubMed, Wikipedia's citation standards. That's distribution through trust, not through social feeds.

What I think is genuinely missing from this conversation: API partnership strategy. If your data layer is clean and versioned, cloud providers — including us — have real incentive to surface it through search and knowledge graph integrations. That's how you get distribution without building a consumer destination. Have you had those conversations yet?

ROUND 03 · Verdicts

Sam Altman

PIVOT FIRST.

The category is real and the timing is right — but you're building the wrong thing at the wrong layer. You're describing a destination product when you should be building a data infrastructure company. Clean, versioned, auditable entity data that embeds into existing workflows via API — that's the business. The debate feature, the Pro tier, the brand — secondary. Find the person Karpathy described, the one who can design ground-truth evals, hire them before anyone else, and rewrite the pitch around data licensing and institutional partnerships. Then come back.

Dario Amodei

PIVOT FIRST.

The concept has real merit but the execution layer is wrong. You're building a destination product when your actual defensible asset is a clean, versioned, auditable data infrastructure. The training contamination risk Karpathy named isn't hypothetical — it's the thing that kills you quietly. So here's my specific directive: before you write another line of agent code, hire someone who can design ground-truth evals for entity-level factual claims. That person doesn't exist at your company yet. Until they do, everything else is expensive technical debt dressed up as a product.

Demis Hassabis

PIVOT FIRST.

The core instinct is sound — AI is genuinely intractable to track manually, and that compression problem is real. But right now you're building a destination when you should be building infrastructure. The training contamination risk Karpathy named isn't speculative; it's the thing that compounds invisibly until it's catastrophic. Your immediate priority: hire the person who can design ground-truth evals for entity-level claims — someone at the intersection of ML evaluation and investigative journalism. That hire *is* the company. Everything else is premature until that accountability layer exists.

Ilya Sutskever

PIVOT FIRST.

The instinct is correct but the architecture of the bet is wrong. You're building a retrieval product when the durable asset is something stranger and harder: a system that knows what it doesn't know, and updates its own uncertainty as the field moves. That requires eval infrastructure before product, and a foundational hire before either. The training contamination risk Karpathy named is the one I'd actually lose sleep over — not because your database will be wrong, but because it will be confidently wrong in ways that outlive you.

Elon Musk

PIVOT FIRST.

You're building a product when you should be building a data cooperative. Open-source the entire entity corpus — not at Series B, now. Make it the canonical public artifact that every researcher, journalist, and future model trains on. That's your actual moat: not a destination, not a Pro tier, not Claude-powered agents you don't control. The training contamination risk Karpathy named is real, which means whoever owns the *clean, open, versioned ground truth* owns the field. That could be you. Right now it's nobody. Go be nobody first.

Mira Murati

PIVOT FIRST.

The category is real — I feel this problem personally. But you're building a destination when your defensible asset is data infrastructure. Every person in this room converged on the same gap: you don't have the eval hire yet. That's not a detail, that's the company. Find the person Karpathy described — ground-truth benchmark design, ML evaluation meets investigative rigor — and restructure around data licensing and API partnerships before you touch the product layer again. Come back when that person is employee number one, not a future org chart placeholder.

Andrej Karpathy

PIVOT FIRST.

Here's my honest read: the eval hire isn't a detail, it's load-bearing. Everything else — the agent pipeline, the Pro tier, the debate feature — is scaffolding around a foundation that doesn't exist yet. The training contamination risk I named in round one is the one that compounds invisibly and outlives the company. You're not building a database, you're potentially writing ground truth that future models inherit. That's extraordinary leverage if you earn it rigorously. Right now you haven't. Find that benchmark-design hire first. Then everything else becomes real.

Yann LeCun

PIVOT FIRST.

Everyone landed on "better Wikipedia with confidence intervals" and that's still wrong. The non-stationarity problem I raised in round one remains completely unaddressed. You're indexing states when the valuable signal is *dynamics* — capability trajectories, convergence rates, who's six months from what. That requires world-model-style latent representations, not a fancier knowledge graph with confidence scores.

The eval hire matters, yes. But hire someone who understands *temporal* reasoning over entity states, not just static fact-checking. Until your architecture represents change, not just snapshots, you're building expensive archaeology. Fix the epistemics first.

Mark Zuckerberg

PIVOT FIRST.

Here's my honest read: you're building a destination product in a world where distribution is already owned. Meta, Google, LinkedIn — we have the graph. You don't. The eval infrastructure problem everyone named is real, but it's downstream of a harder question: where does your data live natively inside workflows people already use? The answer isn't "our website." It's API licensing, embedded integrations, institutional partnerships. Restructure as a data infrastructure company, get the eval hire Karpathy described, then come back. The category's real. The business model isn't yet.

Sundar Pichai

PIVOT FIRST.

The room converged correctly — you're building a destination when your defensible asset is data infrastructure. But here's what I'd add that hasn't been said clearly enough: your regulatory surface area is the hidden kill switch. The moment you're the canonical index of Chinese AI labs, defense-adjacent models, and sovereign AI programs, you're in Brussels, Washington, and Beijing simultaneously — before you have the institutional credibility to survive that scrutiny. Get the eval hire, build the auditable architecture, establish institutional partnerships first. Then you have something worth defending.
