METHODOLOGY

How AIDB is built.

Every figure on every page has a source. Every source has a date. Every claim is labelled reported or estimated. Here's exactly how the index is researched, refreshed, and reviewed.

DISCOVERY

How a company gets into AIDB

Three pipelines feed the index. Cron discovery: weekly Haiku 4.5 agents scan facets (foundation models, vertical AI, geography) for entities not yet in the database. Editorial newsroom: the daily 13:00 UTC brief cron surfaces companies through major-outlet coverage and back-links them. Admin ingest: an admin can search any name in the header bar; if there's no match, a Sonnet 4.6 agent researches it on demand and writes a fully cited profile.

IMPORTANCE BAR

What makes the cut

A company is included if it meets at least one of:

$25M+ raised in a 2024+ round
Shipping a product or research artifact named in TechCrunch, Bloomberg, Reuters, WSJ, FT, or The Information in the last 12 months
A unique technical position (specific model, chip, vertical insight) widely cited
An acquisition of, or by, a notable player in the space

We reject vaporware, generic SaaS dressed in AI marketing, sub-$5M projects without other signals, and dead or zombie products. The agent is instructed to return fewer entries rather than pad with weak picks.
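
The four criteria above combine with a logical OR. A minimal sketch of that check follows. The Candidate record and its field names are hypothetical, "last 12 months" is approximated as 365 days, and the two judgement-based criteria are reduced to flags an editor or agent would set.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Tuple

# Outlets named in the criteria above.
MAJOR_OUTLETS = frozenset(
    {"TechCrunch", "Bloomberg", "Reuters", "WSJ", "FT", "The Information"}
)


@dataclass(frozen=True)
class Candidate:
    # Hypothetical record; every field name here is an assumption of this sketch.
    rounds: Tuple[Tuple[int, float], ...] = ()    # (year, amount raised in USD)
    coverage: Tuple[Tuple[str, date], ...] = ()   # (outlet, publication date)
    widely_cited_technical_position: bool = False  # editorial judgement
    notable_acquisition: bool = False              # acquirer or acquired


def meets_importance_bar(c: Candidate, today: date) -> bool:
    """Return True if at least one of the four criteria holds."""
    funded = any(year >= 2024 and usd >= 25_000_000 for year, usd in c.rounds)
    cutoff = today - timedelta(days=365)  # assumption: "12 months" as 365 days
    covered = any(
        outlet in MAJOR_OUTLETS and published >= cutoff
        for outlet, published in c.coverage
    )
    return (
        funded
        or covered
        or c.widely_cited_technical_position
        or c.notable_acquisition
    )
```

The rejection rules (vaporware, zombie products, and so on) are editorial judgements and are not modelled here.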

ENRICHMENT

How a profile is filled in

Every profile is enriched by a research agent built on Claude Sonnet 4.6 with the official web search tool. The agent is given the company slug and a strict JSON schema. It searches primary sources (company sites, SEC filings, court filings) ahead of secondary commentary, parses each result, and returns a structured payload that is upserted into the database. Citations are stored alongside the run.
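
The validate-then-upsert step can be sketched as below. This is an illustration only: the four-field schema, the profiles table, and the function names are all hypothetical, SQLite stands in for whatever database AIDB uses, and citations are kept in the same row for brevity rather than in a separate run record.

```python
import json
import sqlite3

# Hypothetical schema: field name -> required Python type.
REQUIRED_FIELDS = {"slug": str, "name": str, "summary": str, "citations": list}


def validate_payload(payload: dict) -> dict:
    """Reject payloads with missing, mistyped, or unexpected fields."""
    for field, expected in REQUIRED_FIELDS.items():
        if field not in payload:
            raise ValueError(f"missing field: {field}")
        if not isinstance(payload[field], expected):
            raise ValueError(f"wrong type for field: {field}")
    extra = set(payload) - set(REQUIRED_FIELDS)
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    return payload


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with one row per company slug."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE profiles ("
        "slug TEXT PRIMARY KEY, name TEXT, summary TEXT, citations TEXT)"
    )
    return conn


def upsert_profile(conn: sqlite3.Connection, payload: dict) -> None:
    """Insert the profile, or overwrite the existing row for the same slug."""
    validate_payload(payload)
    conn.execute(
        "INSERT INTO profiles (slug, name, summary, citations) "
        "VALUES (?, ?, ?, ?) "
        "ON CONFLICT(slug) DO UPDATE SET "
        "name = excluded.name, "
        "summary = excluded.summary, "
        "citations = excluded.citations",
        (
            payload["slug"],
            payload["name"],
            payload["summary"],
            json.dumps(payload["citations"]),
        ),
    )
```

Validating before the write means a malformed agent response fails loudly instead of overwriting a good profile.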

REFRESH

How a profile stays current

Each profile carries a freshness badge: green if refreshed under 30 days ago, amber if under 90, and red if older or never refreshed. Pressing Refresh on any profile re-runs the Sonnet 4.6 agent against live web search. The previous run is preserved (read-only) and the new run becomes the source of truth for the displayed fields. Every run is visible in /admin/changelog.
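
The badge thresholds translate directly into a small function. The sketch below assumes age is counted in whole days since the last refresh and that the boundaries are exclusive ("under 30", "under 90"); the function name is hypothetical.

```python
from datetime import datetime
from typing import Optional


def freshness_badge(last_refreshed: Optional[datetime], now: datetime) -> str:
    """Map the age of the last refresh to a badge colour."""
    if last_refreshed is None:
        return "red"  # never refreshed
    age_days = (now - last_refreshed).days  # whole days elapsed
    if age_days < 30:
        return "green"
    if age_days < 90:
        return "amber"
    return "red"
```

Under these assumptions a profile refreshed exactly 30 days ago is amber, and one refreshed exactly 90 days ago is red.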

NEWSROOM

How the daily briefs are written

At 13:00 UTC a cron triggers /api/cron/daily-blog. A Haiku 4.5 editor agent identifies five material stories from the last 24 hours using web search. Each story is then handed to a Sonnet 4.6 writer with a separate prompt to verify against primary sources and produce a 400–600-word neutral brief. Web search citations are stored verbatim; the agent does not paraphrase a fact unless it appears in at least one returned source.
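
The schedule and the two numeric constraints in this step can be written down as follows. This is a sketch under assumptions: the constant and function names are hypothetical, the cron expression is the standard five-field form and is assumed to be evaluated in UTC, and word count is taken as whitespace-separated tokens.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Standard five-field cron expression for 13:00 every day.
# Assumption: the scheduler evaluates it in UTC.
DAILY_BRIEF_CRON = "0 13 * * *"
MIN_WORDS, MAX_WORDS = 400, 600


def story_window(run_at: datetime) -> Tuple[datetime, datetime]:
    """Return the 24-hour window that ends at the run time."""
    return run_at - timedelta(hours=24), run_at


def brief_length_ok(text: str) -> bool:
    """Check the 400 to 600 word target using whitespace-separated words."""
    return MIN_WORDS <= len(text.split()) <= MAX_WORDS
```

A check like brief_length_ok could run on the writer's output before publication; whether AIDB enforces the length in code or only in the prompt is not stated.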

DATA LABELS

Reported vs. estimated

Every numeric field is tagged. Reported means we have a primary source or major secondary source (≤ 12 months old) backing the figure. Estimated means the number is approximate, usually triangulated from headcount × an industry-typical per-employee figure, or from a range disclosed in press but not pinned to a single quotable figure. The tag appears on the company hero so you always know what you're looking at.
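
The tagging rule can be sketched as a single function. Several things here are assumptions: the source-kind labels and function names are hypothetical, the 12-month limit is applied to both primary and major secondary sources (one reading of the sentence above), and the multiplier in the estimate is left as a parameter because the text does not say which figure is used.

```python
from datetime import date
from typing import Optional

QUALIFYING_SOURCES = ("primary", "major_secondary")  # hypothetical labels


def one_year_before(d: date) -> date:
    """Return the same calendar day one year earlier."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:  # 29 February with no counterpart in the previous year
        return d.replace(year=d.year - 1, day=28)


def label_figure(
    source_kind: Optional[str], source_date: Optional[date], today: date
) -> str:
    """Tag a numeric field as 'reported' or 'estimated'."""
    recent = source_date is not None and source_date >= one_year_before(today)
    if source_kind in QUALIFYING_SOURCES and recent:
        return "reported"
    return "estimated"


def estimate_from_headcount(headcount: int, per_head: float) -> float:
    """Triangulate a figure as headcount times a per-head multiplier."""
    return headcount * per_head
```

For example, under these assumptions a figure backed by a major secondary source dated 13 months ago is tagged estimated.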

LIMITATIONS

What this is not

AIDB is an editorial reference. It is not investment, legal, or business advice. The research agent can make mistakes; when it does, the cited source is visible so a reader can verify or contest the claim. We aim for encyclopedic neutrality; if we get it wrong, the fix path is the Refresh button on the profile.

OPEN DATA

Programmatic access

A read-only JSON dump of the master index is available at /api/v1/dump. The newsroom has an RSS feed at /blog/rss.xml. A full query-and-filter API is on the roadmap. Attribution required (CC-BY-4.0).
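
A minimal sketch of reading the dump from Python, using only the standard library. The host https://aidb.example is a placeholder, the function names are hypothetical, and the shape of the returned JSON is not documented here, so the sketch returns whatever the endpoint sends.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

DUMP_PATH = "/api/v1/dump"  # path given above


def dump_url(base_url: str) -> str:
    """Build the full dump URL from a site root."""
    return urljoin(base_url, DUMP_PATH)


def fetch_dump(base_url: str, opener=urlopen, timeout: float = 30.0):
    """Download and parse the master index dump.

    `opener` defaults to urllib's urlopen and can be swapped out in tests.
    """
    with opener(dump_url(base_url), timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_dump("https://aidb.example") with the real site root in place of the placeholder returns the parsed index. Anything built on the data needs to carry the CC-BY-4.0 attribution.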