5/24/2026, 1:02:03 PM · foundation-models

Google releases Gemini 3.5 Flash at I/O 2026 with 4x faster output and agentic focus

Google DeepMind's new Flash-tier model launched generally available on May 19, 2026, priced at $1.50/$9.00 per million tokens and positioned for cost-conscious agentic workloads.

Overview

Google introduced <cite index="1-2,1-4">Gemini 3.5, its latest family of models, kicking off the series with the release of 3.5 Flash</cite> at Google I/O 2026 on May 19, 2026. <cite index="9-9">The company released Gemini 3.5 Flash into general availability at its I/O developer conference, making an unusual claim for a model in its lower-cost tier: that it beats the company's own flagship, Gemini 3.1 Pro, on most coding and agentic benchmarks while running several times faster.</cite>

The release covers the gemini consumer app, developer surfaces, and enterprise platforms simultaneously. <cite index="9-11">The release lands the same day across the Gemini API, Google AI Studio, Antigravity, Vertex AI/Gemini Enterprise, the Gemini app and AI Mode in Search, and is now the default model in the Gemini app and in AI Mode globally.</cite>

Performance and benchmarks

Google's own blog states that <cite index="3-6">3.5 Flash is its strongest agentic and coding model yet, outperforming Gemini 3.1 Pro on key benchmarks (Terminal-Bench 2.1: 76.2%, GDPval-AA: 1656 Elo, MCP Atlas: 83.6%) and leading in multimodal understanding (CharXiv: 84.2%)</cite>. On throughput, <cite index="5-13">Google says it is four times faster than other frontier models when measured in output tokens per second</cite>. <cite index="9-3,9-4">Independent evaluator Artificial Analysis, given pre-release access, measured output of about 284 tokens per second; CEO Sundar Pichai cited 289 tokens per second on stage.</cite>

The model does not lead on every metric. <cite index="9-5">The model trails 3.1 Pro on long-context retrieval and pure-knowledge tests such as Humanity's Last Exam — an expected trade, since Google tuned the model for agentic execution rather than raw recall.</cite>

Pricing and the token-budget pitch

API pricing has drawn the most enterprise scrutiny. <cite index="9-2">Gemini 3.5 Flash is priced at $1.50 per million input tokens and $9.00 per million output tokens, with cached input at $0.15.</cite> <cite index="10-44">This is 3x the price of Gemini 3 Flash Preview ($0.50/$3.00) and 6x the price of Gemini 3.1 Flash-Lite ($0.25/$1.50), but roughly 40% cheaper than Gemini 3.1 Pro ($2.00/$12.00).</cite>

Google CEO Sundar Pichai framed the launch as an answer to runaway enterprise spending on inference. On stage, <cite index="11-6,11-7">Pichai said, "You've heard the anecdotes from other CIOs that companies are already blowing their annual token budgets and it's only May," adding that "if companies used a mix of Flash and other models they could save a lot of money."</cite> He further argued that <cite index="11-4,11-5">"Gemini 3.5 Flash is very capable model at the frontier but remarkably fast" and "delivers frontier level capabilities at less than half the price and in some cases a third of the price."</cite>

Agentic deployment

The launch is tied to Google's broader agent push. <cite index="2-27,2-28,2-29,2-30,2-31,2-32,2-33,2-34,2-35,2-36">According to Google, several enterprise partners are already running 3.5 Flash. Shopify runs subagents in parallel for data analysis, powering merchant growth forecasts. Macquarie Bank is piloting it for customer onboarding, with the model reasoning over complex 100+ page documents. Salesforce is integrating 3.5 Flash into Agentforce to automate enterprise tasks using multiple subagents that retain context across multi-turn tool calling. Ramp uses it for OCR on invoices.</cite>

Developers gain a new control surface as well. <cite index="10-1,10-2,10-3,10-4,10-5">The thinking_level parameter replaces the old integer thinking_budget, with values of minimal, low, medium (default), and high. The default dropped from 'high' in Gemini 3 Flash Preview to 'medium' in 3.5 Flash, meaning code ported without configuration changes will reason less unless thinking_level is explicitly set to 'high'.</cite>

What comes next

<cite index="14-26,14-27,14-28">Gemini 3.5 Pro is expected in June 2026. Google did not ship it at I/O and gave no exact date, only "next month." Google says the Pro model is already being used internally and shares the same focus on coding and agents.</cite>

Cross-references

Sources

  1. [1]
    Gemini 3.5: frontier intelligence with action
  2. [2]
    Google Introduces Gemini 3.5 Flash at I/O 2026: A Faster and Cheaper Model for AI Agents and Coding - MarkTechPost
  3. [3]
    Innovations from Google I/O 26 on Google Cloud | Google Cloud Blog
  4. [4]
    Google I/O 2026 - Gemini 3.5 Flash - Gemini Omni - Gemini Spark - Duke Digital Media Community
  5. [5]
    Gemini 3.5 Flash is here: Google's smartest speed model promises better coding and agents
  6. [6]
    Gemini 3.5 Flash is Google's new default AI model, and it's built to act, not just answer - Digital Trends
  7. [7]
    Google claims new Gemini 3.5 Flash runs 4x faster than rival frontier models
  8. [8]
    Gemini 3.5 Flash is Google’s new default AI model, and it’s built to act, not just answer
  9. [9]
    Google Ships Gemini 3.5 Flash, a Cheap-to-Run Agent Model That Costs 3x More Per Token
  10. [10]
    Gemini 3.5 Flash Review: Benchmarks, Price & API (2026)
  11. [11]
    Google launches Gemini 3.5 Flash, eyes token cost, performance balance | Constellation Research
  12. [12]
    Gemini 3.5 Flash - API Pricing & Benchmarks | OpenRouter
  13. [13]
    Gemini 3.5 Flash: The Flash That Beat Last Year's Pro (Complete 2026 Guide) | NxCode
  14. [14]
    Gemini 3.5 Review: What Google Launched at I/O 2026
  15. [15]
    Gemini API Pricing Calculator & Cost Guide (May 2026)
  16. [16]
    Google I/O 2026: Gemini 3.5 Flash, Spark & Agentic AI
  17. [17]
    Gemini 3.5 Flash API Pricing 2026 - Costs, Performance & Providers
  18. [18]
    Gemini 3.5 Flash Pricing: Token Costs & Budget Examples (2026)