Miami-based AI lab Subquadratic has emerged from stealth with a $29 million seed round and SubQ, a large language model (LLM) the company says is built on a fully subquadratic sparse-attention architecture rather than a standard transformer. <cite index="2-2,2-3">The company launched publicly on May 5, 2026, claiming to have built the first frontier LLM that does not rely on quadratic attention, with a 12-million-token context window and costs roughly a fifth of those for comparable workloads on Claude Opus or GPT-5.5.</cite>
Funding and backers
<cite index="2-10">The $29 million seed round included Justin Mateen (Tinder co-founder), Javier Villamizar (formerly of SoftBank Vision Fund), and early investors in Anthropic, OpenAI, Stripe, and Brex.</cite> <cite index="6-9,6-10">Secondary reporting placed the round at a roughly $500 million valuation, with participation from the JAM Fund alongside angel investors.</cite> <cite index="5-18,5-19">The 35-person company reports 11 PhDs from Meta, Google, Oxford, Cambridge, and BYU, and is led by CEO Justin Dangel, described as a five-time founder, and CTO Alex Whedon, previously head of generative AI at Meta.</cite>
Architecture
Standard transformer attention scales quadratically, O(n²), with sequence length n, which makes very long contexts prohibitively expensive. <cite index="9-16,9-17">Subquadratic Sparse Attention (SSA) is a content-dependent sparse attention mechanism in which the model learns to dynamically select only the relevant positions for each query token and performs exact attention over that sparse subset</cite>, rather than approximating attention or relying on fixed windows. <cite index="2-11,2-12">The research model supports 12 million tokens of context, while the production API exposes a 1-million-token window; the company says SSA scales linearly with context length, cutting attention compute by roughly 1,000× at 12M tokens.</cite>
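The select-then-attend idea can be illustrated with a minimal top-k sketch in plain Python. This is not Subquadratic's implementation, which has not been published. The function name `topk_sparse_attention`, the parameter `k`, and the dot-product top-k selector are all assumptions made for illustration. In particular, the placeholder selector scores every key for every query, so this sketch is still quadratic overall; a genuinely subquadratic mechanism would need a cheaper learned selector. Causal masking is omitted for brevity.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def topk_sparse_attention(queries, keys, values, k):
    """For each query, run exact softmax attention over only its k
    highest-scoring key positions (hypothetical stand-in for a learned selector)."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        # ASSUMPTION: placeholder selector. Scoring all keys costs O(n) per
        # query, so selection here is O(n^2) in total.
        scores = [dot(q, key) * scale for key in keys]
        chosen = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
        # Exact attention, restricted to the selected subset.
        weights = softmax([scores[i] for i in chosen])
        out = [sum(w * values[i][d] for w, i in zip(weights, chosen))
               for d in range(dim)]
        outputs.append(out)
    return outputs
```

Ignoring the cost of selection, the attention step in this sketch touches n·k query-key pairs instead of n², so the reduction factor is n/k. Under that simplified model, a roughly 1,000× reduction at 12M tokens would correspond to about 12,000 attended positions per query. That figure is an inference from the cost model, not a number the company has reported.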
<cite index="9-21,9-22">Vendor-reported prefill speedups over FlashAttention on Nvidia B200 GPUs are 7.2× at 128K tokens, 13.2× at 256K, 23× at 512K, and 52.2× at 1M tokens.</cite>
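If the baseline cost grows quadratically with context length and SSA grows linearly, the speedup itself should grow roughly in proportion to length, that is, with a scaling exponent near 1. The sketch below checks the vendor-reported figures against that expectation. The helper name `scaling_exponent` is hypothetical, and the code assumes that "1M" means 1,024K so that each step is an exact doubling.

```python
import math

# Vendor-reported prefill speedups over FlashAttention, keyed by context
# length in thousands of tokens (ASSUMPTION: "1M" is treated as 1,024K).
SPEEDUPS = {128: 7.2, 256: 13.2, 512: 23.0, 1024: 52.2}

def scaling_exponent(n_small, n_large, table=SPEEDUPS):
    """Exponent p such that speedup grows like n**p between two lengths."""
    speedup_ratio = table[n_large] / table[n_small]
    length_ratio = n_large / n_small
    return math.log(speedup_ratio) / math.log(length_ratio)

lengths = sorted(SPEEDUPS)
for a, b in zip(lengths, lengths[1:]):
    print(f"{a}K -> {b}K: exponent {scaling_exponent(a, b):.2f}")
print(f"overall: {scaling_exponent(lengths[0], lengths[-1]):.2f}")
```

Across the full range from 128K to 1M the exponent comes out at about 0.95, which is broadly consistent with the linear-scaling claim. The per-doubling values vary more widely, and prefill time includes non-attention work, so four vendor-supplied data points cannot confirm the claim on their own.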
Products and benchmarks
SubQ is available in private beta as three offerings. <cite index="7-11,7-12,7-13,7-14">An API exposes the full context window to developers and enterprise teams; SubQ Code is a command-line coding agent that loads entire codebases into a single context window for planning, execution, and review in one pass; and SubQ Search is a long-context research tool designed to return results at chatbot-like speed.</cite>
On the three benchmarks the company has released, <cite index="2-13">SubQ scored 95.0% on RULER 128K, 65.9% on MRCR v2 at 1M tokens, and 81.8% on SWE-Bench Verified.</cite> <cite index="4-35,4-36">On RULER 128K, SubQ's 95% is nominally tied with Claude Opus 4.7's 94.8%, but the company says running the evaluation on SubQ cost around $8 versus roughly $2,600 on Opus at the same context length.</cite>
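The size of the claimed cost gap on RULER 128K is easier to judge as a ratio. The short calculation below uses only the two company-reported dollar figures, both of which are quoted as approximate; the constant and function names are illustrative.

```python
# Company-reported cost of running RULER 128K, in US dollars (approximate).
SUBQ_COST_USD = 8.0
OPUS_COST_USD = 2600.0

def cost_ratio(baseline, candidate):
    """How many times more expensive the baseline run is than the candidate."""
    return baseline / candidate

ratio = cost_ratio(OPUS_COST_USD, SUBQ_COST_USD)
print(f"Opus / SubQ cost ratio on RULER 128K: {ratio:.0f}x")  # prints 325x
```

The resulting figure of about 325× applies to this one benchmark and is far larger than the headline claim of roughly a fifth of the cost for comparable workloads. The available reporting does not explain how the two figures relate.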
Caveats
The model has not been independently verified. <cite index="6-14,6-15">There are no open weights and no full technical report or peer-reviewed paper at launch, and every performance number associated with SubQ is vendor-reported, run under conditions Subquadratic controlled, and not independently reproduced.</cite> <cite index="2-16">Prior subquadratic architectures including Mamba, RWKV, and DeepSeek Sparse Attention have repeatedly underperformed transformers at frontier scale</cite>, leaving open whether SSA generalizes beyond long-context retrieval and coding tasks. <cite index="4-33,4-34">Broader evaluations across general reasoning, math, multilingual performance, and safety have not been published, and the full model card is listed as forthcoming.</cite> <cite index="4-18,4-19">The company has indicated a 50-million-token context window as a Q4 2026 target.</cite>