Google launches Gemini 3.1 Ultra with 2M token context window and native multimodal reasoning

Google's new flagship large language model expands the input context window to 2 million tokens and processes text, images, audio, and video jointly, without intermediate transcription.

Google has released Gemini 3.1 Ultra, the latest flagship in its Gemini family of large language models (LLMs), pairing an expanded context window with native multimodal reasoning across text, images, audio, and video. Secondary reporting describes <cite index="2-1">a 2M token context window, sandboxed code execution, and multimodal reasoning</cite> as the defining changes in the release, though Google's own developer documentation for the Gemini 3 series currently lists a smaller 1M-token input window for API users.

Context window and architecture

The headline change is scale. According to a developer-focused analysis published shortly after launch, <cite index="2-13,2-14">the context window expanded to 2M tokens, up from 1M in Gemini 3.0 Ultra, with Google claiming better coherence maintenance across the full window rather than the degradation seen in competing models in the final third of long contexts</cite>. For comparison, Google's published guidance notes that <cite index="8-19">with a 1M token context window, Gemini can understand up to 1,500 pages of text or 30,000 lines of code</cite>, implying the new Ultra tier roughly doubles that capacity.
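The doubling can be made concrete by scaling Google's published 1M-token figures linearly. The sketch below is a back-of-the-envelope estimate only: the function name `estimate_capacity` is hypothetical, and linear scaling is an assumption, since real token counts vary with language, formatting, and code style.

```python
# Published guidance: 1M tokens ~ 1,500 pages of text or 30,000 lines of code.
BASELINE_TOKENS = 1_000_000
BASELINE_PAGES = 1_500
BASELINE_CODE_LINES = 30_000


def estimate_capacity(context_tokens: int) -> dict:
    """Scale the published 1M-token figures linearly (an assumption)."""
    scale = context_tokens / BASELINE_TOKENS
    return {
        "pages": round(BASELINE_PAGES * scale),
        "code_lines": round(BASELINE_CODE_LINES * scale),
    }


print(estimate_capacity(2_000_000))  # {'pages': 3000, 'code_lines': 60000}
```

Under that assumption, a 2M-token window corresponds to roughly 3,000 pages of text or 60,000 lines of code.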

Google's official Gemini API reference still states that <cite index="9-1">Gemini 3 models support a 1 million token input context window and up to 64k tokens of output</cite>. The discrepancy may mean that the 2M ceiling is initially exposed only through specific Ultra endpoints rather than across the broader Gemini 3 family, although Google's documentation does not say so. The company has also rolled out related preview models, including <cite index="7-7">Gemini 3.1 Flash TTS Preview, a cost-efficient, expressive, and steerable text to speech model</cite>, alongside updated Deep Research and robotics variants.
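Because the documented and reported ceilings differ, a client may want to check a request against both before sending it. The sketch below is illustrative only and calls no Google API: the names `Limits` and `fits` are hypothetical, "64k" is read as 64,000 tokens, and the output ceiling for the reported 2M-token Ultra window is assumed to match the Gemini 3 figure, which the sources do not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    max_input_tokens: int
    max_output_tokens: int


# Documented for Gemini 3 models; "64k" is read here as 64,000 (assumption).
GEMINI_3_DOCUMENTED = Limits(max_input_tokens=1_000_000, max_output_tokens=64_000)
# Reported by secondary coverage; the output ceiling is an assumption.
ULTRA_REPORTED = Limits(max_input_tokens=2_000_000, max_output_tokens=64_000)


def fits(prompt_tokens: int, requested_output_tokens: int, limits: Limits) -> bool:
    """Return True when both the prompt and the requested output are within limits."""
    return (
        0 <= prompt_tokens <= limits.max_input_tokens
        and 0 <= requested_output_tokens <= limits.max_output_tokens
    )


# A 1.4M-token prompt exceeds the documented window but fits the reported one.
print(fits(1_400_000, 8_000, GEMINI_3_DOCUMENTED))  # False
print(fits(1_400_000, 8_000, ULTRA_REPORTED))       # True
```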

Multimodal reasoning

Gemini 3.1 Ultra is positioned as a single model that ingests multiple modalities without converting them to text first. Earlier Gemini generations already accepted mixed inputs, but coverage of the new release notes that <cite index="2-18">Gemini 3.0 could process images, video, and audio, but evaluation of the multimodal capabilities was inconsistent</cite>, framing 3.1 Ultra as a consolidation step. Google's consumer documentation describes the Pro tier as having <cite index="8-5">a deeper understanding across text, files, images and videos, plus major upgrades for coding and studying</cite>, with Ultra subscribers gaining access to a <cite index="8-12">Deep Think mode that provides maximum parallel reasoning</cite> for complex tasks.

Tooling and code execution

A second notable change is execution. Independent analysis reports that <cite index="2-15,2-16,2-17">Gemini 3.1 Ultra can now run Python code in a sandboxed environment natively, without a third-party Code Interpreter plugin; it writes code, executes it, observes the output, and revises, closing the gap with ChatGPT's Code Interpreter that has been a competitive advantage for OpenAI on data analysis tasks</cite>. Google's documentation confirms that <cite index="9-35">Gemini 3 supports Google Search, Grounding with Google Maps, File Search, Code Execution, and URL Context</cite> as built-in tools.
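The write, execute, observe, and revise cycle described in that analysis can be sketched as a simple loop. This is a toy illustration and not Google's implementation: `run_snippet`, `write_run_revise`, and `toy_propose` are hypothetical names, the model is replaced by a stub, and Python's `exec` is used only to keep the example self-contained. It provides no sandboxing and should not be used on untrusted code.

```python
import contextlib
import io
import traceback
from typing import Callable, Optional, Tuple


def run_snippet(source: str) -> Tuple[bool, str]:
    """Run source in a fresh namespace; return (ok, stdout or traceback).

    NOTE: exec() is NOT a sandbox. It stands in for one in this sketch.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(source, {"__name__": "__snippet__"})
    except Exception:
        return False, traceback.format_exc()
    return True, buffer.getvalue()


def write_run_revise(
    propose: Callable[[Optional[str]], str], max_rounds: int = 3
) -> Tuple[bool, str]:
    """Ask for code, run it, and feed any error back until it succeeds."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        source = propose(feedback)         # write
        ok, output = run_snippet(source)   # execute and observe
        if ok:
            return True, output
        feedback = output                  # revise using the traceback
    return False, feedback or ""


def toy_propose(feedback: Optional[str]) -> str:
    """Stand-in for the model: the first draft has a bug, the revision fixes it."""
    if feedback is None:
        return "print(total / count)"
    return "total, count = 10, 4\nprint(total / count)"


print(write_run_revise(toy_propose))  # (True, '2.5\n')
```

The first draft fails with a `NameError`, the traceback is passed back to the stub, and the second draft runs cleanly. A production system would replace `exec` with an isolated runtime and the stub with a model call.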

Availability and pricing

Distribution mirrors prior Gemini releases. The model is available through Google AI Studio, the Gemini API, and Vertex AI for enterprise customers, with consumer access via the Gemini app under Google AI subscription tiers. Pricing for the Ultra tier had not been finalized at launch; the cited developer analysis notes that <cite index="2-19">Google has not published final pricing</cite> and that processing a full 2M-token request will carry materially higher per-call costs than prior generations, limiting routine use to high-value workflows such as whole-codebase audits or large document review.

The release sharpens Google's competitive position against rival frontier models from OpenAI and Anthropic, particularly on tasks bounded by context length and on workflows that combine multiple input modalities in a single prompt.
