Recent AI model releases and developments, April to May 2026

Four frontier drops in roughly six weeks. Here is the dated record, and the development under it that the roundups skip: when the best model changes this fast, the model becomes the cheap part to swap, and the harness around it is the thing that compounds.

M
Matthew Diakonov
9 min read
Direct answer (verified 2026-05-25)

The headline April to May 2026 releases: Claude Opus 4.7 went GA on April 16, GPT-5.5 shipped April 23 (six weeks after GPT-5.4), and Google's Gemini 3.1 line filled in around them (Ultra with a 2M-token context, Flash Lite around May 8). By May 2026, LLM Stats was tracking 500+ models. The pattern that matters: the frontier moved about every six weeks, which makes the model the part you should be able to swap cheaply.

WhenWhoWhatNote
Apr 16, 2026AnthropicClaude Opus 4.7 general availabilityCoding-focused gains over Opus 4.6 on the hardest software tasks; powers Claude Code.
Apr 23, 2026OpenAIGPT-5.5Six weeks after GPT-5.4; reported ~60% drop in hallucinations vs 5.4; free in ChatGPT, API at $5/$30 per 1M tokens.
Apr to May 2026GoogleGemini 3.1 Ultra2M-token context, natively multimodal across text, image, audio, video.
~May 8, 2026GoogleGemini 3.1 Flash LiteLightweight efficiency variant below 3.1 Flash, optimized for speed and cost per call.

Sources: anthropic.com/news/claude-opus-4-7, llm-stats.com/llm-updates and llm-stats.com/ai-news (cross-vendor tracker), accessed 2026-05-25.

The development the roundups under-cover

Every other guide on this will give you the same four releases, usually with a benchmark table comparing Opus 4.7, GPT-5.5, and Gemini 3.1. That comparison is real but it is also the least durable part of the story, because the numbers are stale within weeks. GPT-5.5 shipped six weeks after GPT-5.4. Opus 4.7 followed 4.6 on a similar clock. If you organize your work around a single model's exact strengths, you are re-organizing every six weeks.

The development worth recording from this window is structural, not model-specific: the gap between the top few frontier models on everyday coding work is now smaller than the gap between a workflow that keeps its state and one that throws it away on every restart, compaction, or tool switch. The model is the cheap part. The expensive part, the part that compounds, is the harness that holds your sessions, your full context, your tools, and your reach across the rest of your machine.

~6 wk

The frontier shipped on roughly a six-week clock in this window. Anything anchored to one model's exact strengths decays on the same clock. The harness does not.

April to May 2026 release cadence

Anchor the workflow, swap the model

This is the concrete version of the argument, worked through Fazm because it is the tool I build and use. Fazm wraps Claude Code (via the claude-agent-acp adapter) and Codex (codex-acp) through the Agent Client Protocol in a native macOS app. As of v2.9.35 and v2.9.36, tagged May 22, 2026, it also exposes Google Gemini Flash and Gemini Pro as selectable ACP backends, including as a free option when built-in credits run out. The backend is chosen per chat:

Fazm: per-chat model backend

The anchor fact is the last line of that picker. When you switch a window from Opus 4.7 to Gemini Pro, the session does not reset. Chats are persistent and survive a Mac restart with every window auto-restored, there is no auto-compacting so the full history stays live in context for the window's lifetime, and one-click forking opens a new window with the full prior context while leaving the original untouched. So trying GPT-5.5 against Opus 4.7 on the exact same task is a fork plus a dropdown change, not a copy-paste migration. The April to May 2026 model churn turns into a series of cheap experiments instead of a series of migrations.

What each model release costs you, by where your workflow lives

The same release event lands very differently depending on whether your work is anchored to a harness or to a model. The releases are identical; the cost is not.

FeatureWorkflow anchored to a modelWorkflow anchored to a harness (Fazm)
A new frontier model ships (every ~6 weeks in this window)Re-learn a new CLI/app, re-import settings, re-promptChange one dropdown in the AI picker; keep the session
Long task spanning a Mac restartSession lost; start the task overWindow auto-restores with full conversation intact
Context fills up mid-taskSilent compaction drops earlier decisionsNo auto-compacting; full history stays live for the window
Want to try the same prompt on two modelsCopy-paste context by hand into a second toolOne-click fork to a new window, swap backend, original untouched
Reach beyond the terminal (browser, native apps, Workspace)Tied to whatever surface the model vendor shipsSame agent loop via accessibility APIs, model-agnostic

The honest counterargument

The case against this view is that model quality still dominates outcomes, so you should always be on the best model and the switching cost is worth paying. That is partly right. On genuinely hard, frontier-pushing tasks the difference between Opus 4.7 and a model two releases back is real, and you should reach for the best one. But notice that this argument actually strengthens the harness point: if you want to always be on the best model, you want the swap to be free. A harness that lets you change backends per chat without losing state is exactly what makes always-on-the-best-model affordable. The two positions are not in tension. The harness is what lets you act on the model-quality argument without paying a migration tax every six weeks.

The second objection is that a wrapper adds a layer that could lag the underlying CLI. Fazm's answer is that it runs the real agent loop through ACP rather than reimplementing it, so the Claude Code path is the actual Claude Code, and new model backends are added as ACP backends (as Gemini was in v2.9.35). The layer it adds is UI and state management, which is precisely the layer the bare CLI leaves to you.

A reasonable way to track the next six weeks

  • Skim the lab blogs weekly so you know what exists. Frontier drops are loud; you will not miss Opus 4.8 or GPT-5.6 by checking once a week.
  • Keep a cross-vendor tracker bookmarked for dates and rough benchmark deltas rather than trusting recall. The numbers move; the tracker does not.
  • Make trying a new model a one-dropdown experiment, not a migration. If swapping backends costs you a session, your tool is the bottleneck, not the model.
  • Judge models on your own recurring tasks, forked side by side, not on someone else's benchmark suite. The fork-plus-swap pattern makes this a five-minute test.

Want to see the per-chat model swap on your own tasks?

If you are evaluating an agent setup against the April to May 2026 model churn, book 15 minutes. We will open Fazm, fork a real task, and run it on two backends side by side so you can see what swapping costs (nothing) versus what migrating a workflow costs.

Frequently asked questions

What were the major AI model releases in April and May 2026?

Four are well documented. Anthropic made Claude Opus 4.7 generally available on April 16, 2026, with its gains concentrated on the hardest software-engineering tasks. OpenAI shipped GPT-5.5 on April 23, 2026, only about six weeks after GPT-5.4, with a reported roughly 60% reduction in hallucinations versus 5.4 and pricing of $5 input / $30 output per 1M tokens via API. Google's Gemini 3.1 line was the other story: Gemini 3.1 Ultra carries a 2-million-token context window and is natively multimodal across text, image, audio, and video, and Gemini 3.1 Flash Lite reached gateways around May 8 as the lightweight efficiency variant of the line. LLM Stats was tracking more than 500 models across commercial APIs and open-source releases by May 2026.

What is the single most important pattern across the April to May 2026 releases?

Cadence. The frontier moved roughly every six weeks (Opus 4.6 to 4.7, GPT-5.4 to 5.5 in six weeks, the Gemini 3.1 family filling in around it). When the best available model changes that fast, anything you build tightly around one specific model decays in weeks. The durable layer is the harness that holds your sessions, your context, your tools, and your reach across apps. The model becomes the cheap, swappable part. That is the development that the release roundups under-cover because they are organized by model, not by what survives a model swap.

How does Fazm let me swap between these models without losing work?

Fazm wraps Claude Code (via the claude-agent-acp adapter) and Codex (codex-acp) through the Agent Client Protocol, and as of v2.9.35 and v2.9.36 (tagged May 22, 2026) it also exposes Google Gemini Flash and Gemini Pro as selectable ACP backends, including as a free option when built-in credits run out. The backend is chosen per chat in the AI picker, so a single window can run on Opus 4.7 today and Gemini Pro tomorrow. Because sessions are persistent and there is no auto-compacting, swapping the backend does not cost you the conversation history. You change a dropdown, not your workflow.

Why does session persistence matter more when models ship every six weeks?

Because the cost of churn is paid in your state, not the model's weights. Every new model is a reason to re-run a task, compare outputs, or migrate a workflow. If your tool loses sessions on restart, compacts context silently, or makes forking a multi-step session-id dance, each model swap is expensive. Fazm removes those three costs: chats survive a Mac restart and every window auto-restores with full history, the full chat history stays live in context for the window's lifetime with no auto-compacting, and every chat has a one-click fork that opens a new window with the full prior context while leaving the original untouched. The faster the model cadence, the more those three properties pay off.

Do I need separate API keys for each model in Fazm?

Not necessarily. Fazm runs on your existing Claude Pro or Max account for the Claude Code path, so usage hits the plan you already pay for rather than a separate metered API bill. Gemini is available as a backend including a free option when built-in credits run out. There is also custom API endpoint support, so you can route through a corporate proxy, GitHub Copilot, or any Anthropic-compatible gateway. The point is that the account and routing are configuration, not a rewrite of how you work.

Where do I check primary sources for the April to May 2026 releases?

Anthropic's Claude Opus 4.7 announcement is at anthropic.com/news/claude-opus-4-7. A continuously updated cross-vendor tracker with dates and benchmark numbers is at llm-stats.com/llm-updates and llm-stats.com/ai-news. For the Fazm-side facts, the tagged releases with ISO dates live in CHANGELOG.json at the root of github.com/mediar-ai/fazm, and the source for the ACP backend wiring is in the same repository. Treat the vendor pages as authoritative for dates and the tracker as the convenient cross-vendor index.

Is chasing every new model release actually worth it?

For most working developers, no, not in the sense of migrating tools each time. The honest reading of April to May 2026 is that the gap between the top few frontier models on everyday coding work is smaller than the gap between a workflow that loses state and one that does not. A reasonable cadence is: skim the lab blogs weekly so you know what exists, and let your harness make trying a new model a one-dropdown experiment rather than a migration. That is the opposite of the treadmill: you get to sample the frontier without rebuilding your setup around each release.

Does Fazm work beyond coding, or only as a Claude Code GUI?

It reaches beyond the terminal. The same agent loop that runs Claude Code or Codex also drives your actual browser via the extension, native Mac apps via macOS accessibility APIs (not screenshots), and Google Workspace apps like Docs, Sheets, and Calendar. It supports MCP servers so your existing tools come along without a rewrite, and it is voice-first: hold a hotkey and talk to the same agent. It is fully open source and runs locally on macOS 14 or newer.

How did this page land for you?

React to reveal totals

Comments ()

Leave a comment to see what others are saying.

Public and anonymous. No signup.