Subquadratic Raises $29M to Launch SubQ — The World's First 12-Million-Token AI Context Window That Beats GPT-5.5
Published: May 13, 2026

Breaking the Quadratic Barrier
The AI industry has long been shackled by a fundamental architectural constraint: the larger the context window, the disproportionately more expensive it becomes to run. Doubling the input doesn't just double the compute; it quadruples it. That's the infamous quadratic scaling problem baked into every transformer model since 2017. Now, a Miami-based startup called Subquadratic is claiming it has cracked the code, and it has $29 million in fresh seed funding to prove it.
The company officially launched in May 2026 with its flagship model, SubQ, built on a proprietary architecture called Subquadratic Selective Attention (SSA) — a fully sparse-attention system that scales linearly in both compute and memory as context length grows.
What Makes SubQ Different?
Traditional transformer models use dense attention — every single token in a prompt is compared against every other token. With 1,000 tokens, that's 1,000² = 1,000,000 comparisons. Scale that to millions of tokens and the cost becomes astronomical.
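To make the quadratic blow-up concrete, here is a minimal NumPy sketch of dense attention scores. It is illustrative only, not code from any of the models mentioned here; the point is that the n × n score matrix is exactly the thing that grows quadratically with sequence length.

```python
import numpy as np

def dense_attention_scores(Q, K):
    """Naive dense attention: every query token is scored against every key token.

    Q, K: (n, d) arrays of query/key vectors for a sequence of n tokens.
    Returns an (n, n) matrix -- the compute and memory that grow quadratically.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # n * n dot products
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return weights / weights.sum(axis=-1, keepdims=True)

n, d = 1_000, 64
Q = np.random.randn(n, d)
K = np.random.randn(n, d)
attn = dense_attention_scores(Q, K)
print(attn.shape)   # (1000, 1000) -> 1,000,000 pairwise comparisons
```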
SubQ's SSA architecture takes a fundamentally different approach: instead of comparing every token to every other token, it uses content-dependent selection to identify which token relationships actually matter — and only processes those. The selection mechanism itself doesn't go quadratic, which is the key differentiator from prior attempts like DeepSeek's Native Sparse Attention, where the indexer that picks which tokens to attend to was still quadratic under the hood.
As CTO Alexander Whedon explained:
"For prompt A, words one and six are going to be important to each other. For prompt B, maybe it's words two and three. It's different for every prompt."
This means the model routes exact attention only to meaningful token pairs, dynamically and efficiently — at any context length.
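Subquadratic has not published how SSA's selector works, but the general idea of content-dependent selection can be sketched with a generic block-selection scheme: a cheap scorer picks which regions of the sequence matter for each query, and exact attention runs only over those tokens. The sketch below is a common approximation of that idea, not SSA itself; its selector is merely cheaper than dense attention rather than truly linear, which is the part SSA claims to have solved.

```python
import numpy as np

def selective_attention(Q, K, V, block=64, top_blocks=4):
    """Illustrative content-dependent sparse attention (NOT Subquadratic's SSA).

    Keys are pooled into coarse blocks; each query scores only the block
    summaries, keeps the top-scoring blocks, and runs exact attention over the
    tokens inside them. The selector cost shrinks by the block factor; a truly
    linear selector, as SSA claims, would need more machinery than shown here.
    """
    n, d = Q.shape
    n_blocks = n // block
    K_blocks = K[: n_blocks * block].reshape(n_blocks, block, d).mean(axis=1)

    block_scores = Q @ K_blocks.T / np.sqrt(d)                    # cheap selector: (n, n_blocks)
    chosen = np.argsort(block_scores, axis=-1)[:, -top_blocks:]   # top blocks per query

    out = np.zeros_like(Q)
    for i in range(n):
        idx = np.concatenate([np.arange(b * block, (b + 1) * block) for b in chosen[i]])
        s = Q[i] @ K[idx].T / np.sqrt(d)                          # exact attention, selected tokens only
        w = np.exp(s - s.max())
        out[i] = (w / w.sum()) @ V[idx]
    return out

n, d = 2_048, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))
y = selective_attention(Q, K, V)   # each query attends to 4 * 64 = 256 tokens, not all 2,048
```

The design point the quote is making is that the selected pairs differ per prompt: the `chosen` blocks depend on the content of Q and K, not on a fixed sparsity pattern.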
The Numbers Are Staggering
The performance claims Subquadratic is putting forward are hard to ignore:
- 12 million token context window — roughly 9 million words or ~120 books in a single prompt
- 52× faster than dense attention (FlashAttention) at 1 million tokens
- 92.1% accuracy on needle-in-a-haystack retrieval at 12M tokens — a length no other frontier model even approaches
- 83% on MRCR v2, beating OpenAI's GPT-5.5 (74%) by 9 full points
- 82.4% on SWE-bench, surpassing Anthropic's Claude Opus 4.6 (81.42%) and Google's Gemini 3.1 Pro (80.6%)
- On RULER 128K, SubQ scored 95% accuracy at just $8, compared to Claude Opus's $2,600 for the same benchmark — a ~300× cost reduction
- At full 12M-token context, compute requirements drop by nearly 1,000× versus other frontier models
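A back-of-the-envelope calculation shows how a fixed per-query attention budget turns quadratic cost into linear cost at these lengths. The 12,000-token budget below is an illustrative assumption, not a figure Subquadratic has published; it simply shows the order of magnitude at which a ~1,000× compute gap at 12M tokens becomes plausible.

```python
# Pairwise-comparison counts for dense vs. selective attention.
# The per-query budget k_selected is an assumed value, not a published SubQ number.
def comparisons(n_tokens, k_selected=None):
    """Dense attention compares n*n token pairs; a selective scheme with a
    fixed per-query budget compares roughly n*k."""
    return n_tokens * (k_selected if k_selected else n_tokens)

for n in (1_000_000, 12_000_000):
    dense = comparisons(n)
    sparse = comparisons(n, k_selected=12_000)
    print(f"{n:>12,} tokens: dense/selective ratio ~ {dense / sparse:,.0f}x")
# 1M tokens  -> ~83x fewer comparisons
# 12M tokens -> ~1,000x fewer, the same order of magnitude as the claimed compute drop
```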
Who's Backing Subquadratic?
The $29M seed round attracted a notable roster of investors, including:
- Javier Villamizar, former partner at SoftBank Vision Fund
- Justin Mateen, co-founder of Tinder and founder of JAM Fund
- Early backers of Anthropic, OpenAI, Stripe, and Brex
The company was co-founded by Justin Dangel (CEO) and Alexander Whedon (CTO), and has assembled a team of 11 Ph.D. researchers to drive its architectural research.
What Subquadratic Is Building
The funding will be deployed across three major product tracks:
- SubQ API — A developer API offering full access to the 12-million-token context window for enterprise and developer use cases
- SubQ Code — A CLI-based coding agent designed to load entire codebases into a single context window, eliminating the need for multi-agent coordination
- SubQ Search — A deep research tool, initially free, targeting long-context research and enterprise workloads
The model will not be open-source in the near term but will support customer-specific fine-tuning.
The Bigger Picture
The implications go far beyond raw benchmark scores. Today's developers are forced to build elaborate workarounds — RAG pipelines, agentic retrieval systems, chunking strategies — just to manage the limitations of context windows. These systems add latency, cost, and fragility.
If SubQ's architecture holds up at scale, it could make those workarounds obsolete. Entire codebases, legal document archives, research corpora, and enterprise knowledge bases could be processed in a single pass — without the brittle data curation that currently bottlenecks AI product quality.
Subquadratic has also teased a 50-million-token context window model on the roadmap — a figure that would make today's frontier models look like pocket calculators.
As CEO Justin Dangel put it:
"The fundamental scaling laws imposed by the transformer architecture and dense attention have been broken through."