JUNE 11-12 2026 / OPEN WEIGHTS / RUN IT, DO NOT JUST READ IT

One open model dropped on June 12. The question is how you run it.

Every roundup of these two days will hand you a model card and a benchmark table. None of them tell you the next step: how to put a day-old open model behind the persistent agent loop you already use on your Mac. That step is one setting, and it is the whole reason this page exists.

Matthew Diakonov, Written with AI

Published June 16, 20269 min read

Direct answer, verified 2026-06-16

The standout open release across June 11 and 12, 2026 was Moonshot AI's Kimi K2.7-Code, published June 12 with open weights on Hugging Face at moonshotai/Kimi-K2.7-Code. Its model card describes a 1-trillion-parameter Mixture-of-Experts design with roughly 32B active parameters and a 256K context window. Beyond that single dated release, no platform publishes a list keyed to a calendar day. The durable feeds for any 48-hour window are:

huggingface.co/models sorted by trending, for weights and quantized variants.
huggingface.co/papers for research with a linked implementation.
github.com/trending for agents, MCP servers, and inference engines.

The benchmark numbers Moonshot publishes are self-reported until an independent harness re-runs them. So the useful move on release day is not to read the table; it is to wire the model into your own agent loop and try it. The rest of this page is exactly how.

What Kimi K2.7-Code is, in the numbers that matter

Moonshot AI shipped K2.7-Code on June 12, 2026 as a coding-first model built on the K2.6 architecture, with open weights and a permissive license on its model card. The figures below come from that card and Moonshot's release notes. Read the last one as a vendor claim, not a settled fact: a +21.8% jump on a benchmark the vendor designed and runs itself is a reason to test, not a reason to believe.

total parameters, Mixture-of-Experts (about 32B active per token)

context window, per the published model card

+0%

Moonshot-reported gain on Kimi Code Bench v2 over K2.6

Open weights at that scale are not something most people will run locally; the full model is hundreds of gigabytes. In practice you will reach it through a hosted provider or a gateway. Which is exactly where the next part comes in: the gateway is also how it reaches your agent loop.

The gap every roundup leaves open

Search this topic and you get model cards, spec tables, and pricing comparisons. All of it ends at the same place: the model exists, here are its numbers. The thing nobody answers is the practical one. You use an AI coding agent every day, with sessions you do not want to lose, a forking habit, and a long context you do not want silently truncated. A new open model drops. How do you actually try it inside that setup, today, without throwing away your session machinery and learning a new app?

That gap is the spine of this page. The answer on macOS is a single override, and it is small enough to be unglamorous, which is probably why no roundup bothers with it.

The one setting: Fazm overrides ANTHROPIC_BASE_URL, nothing else

Fazm is a native macOS app that wraps Claude Code, Codex, and Gemini as agent-loop backends through the Agent Client Protocol. Its Custom API Endpoint setting does exactly one thing, and the source is blunt about it: it overrides ANTHROPIC_BASE_URL. When the endpoint is on, Claude-compatible requests route to your gateway instead of Fazm's built-in credits. Put a server that speaks the Anthropic Messages API in front of an open model, and the same persistent Mac UI now drives that model.

What the endpoint does, and does not, touch

The sharp edge is in the third and fourth lines above. Because the override only affects the Claude path, the model you select in the picker has to be a Claude id for traffic to reach your endpoint. Pick Codex or Gemini while the endpoint is set and your requests quietly bypass it. Fazm guards against that silent failure with a warning that lives in Desktop/Sources/MainWindow/Pages/SettingsPage.swift: "The custom endpoint only applies to Claude models. Switch to a Claude model for your requests to reach it." That guard was added in a June 15, 2026 commit titled "Add warning for non-Claude models in custom endpoint settings."

Wiring a day-old open model into the loop, in four steps

This is the whole procedure. The first step is the only one that takes real work, and you do it once; after that, every future release is the same four steps with a different model loaded behind the gateway.

From release to running

1
Serve the open model behind a Claude-compatible gateway
Stand up LiteLLM, a corporate proxy, or a local server (LM Studio, Ollama) that exposes an Anthropic Messages API in front of the new weights. This is the only hard requirement: the gateway must speak Claude's wire format.
2
Set the Custom API Endpoint in Fazm
Settings, Advanced, AI Chat, Custom API Endpoint. Fazm overrides ANTHROPIC_BASE_URL to your gateway URL. Claude-compatible requests now route there without using Fazm's built-in credits.
3
Keep a Claude model id selected
The endpoint only applies to the Claude path. If the picker shows a Claude model, requests reach your gateway. Pick Codex or Gemini instead and Fazm warns you that the endpoint is bypassed.
4
Use the same persistent session you already had
Sessions survive a restart, forking is one click, and nothing auto-compacts. Only the model behind the gateway changed. You are testing a day-old release inside the loop you trust.

What the endpoint is allowed to proxy to

The in-source comments name the supported shapes directly. One comment in ACPBridge.swift describes the endpoint as allowing proxying through Copilot and corporate gateways. The error handling a few lines down names LM Studio and Ollama as local servers, including a message that tells you to load a model in your local server when the endpoint reports nothing loaded. The settings search entry for the feature carries the keywords endpoint, proxy, base url, anthropic, copilot, gateway, and corporate.

So three concrete routes exist for a new open model: a hosted Claude-compatible gateway (a service that re-exposes the model behind the Anthropic wire format), a corporate proxy your team already runs, or a local server on your own machine. In every case Fazm does not load the weights; it sends the agent loop to wherever your endpoint points.

Why this is worth doing instead of opening another app

The reason to route a new model through the endpoint rather than downloading a separate chat client is that the model is the cheap part to swap. What is expensive to give up is the session layer sitting above it. In Fazm, that layer is the same regardless of which model answers: chats survive a Mac restart and every window auto-restores, forking a conversation is one click into a new window that carries the full prior context while leaving the original untouched, and the history stays live with no auto-compacting for the lifetime of the window.

Try a day-old release the usual way and you start a brand-new conversation in a brand-new app, with none of that. Try it through the endpoint and you are evaluating the model under the exact working conditions you care about, with a real project loaded and a real session you can fork to compare against your current model. That is a fairer test, and it is faster.

How to verify every claim on this page

For the model: open moonshotai/Kimi-K2.7-Code and read the card for the release date, the parameter and context figures, and the license. The commit history on the repo carries the June 12, 2026 timestamps.
For the endpoint behavior: open github.com/mediar-ai/fazm, read Desktop/Sources/MainWindow/Pages/SettingsPage.swift for the ANTHROPIC_BASE_URL note and the non-Claude warning string, and Desktop/Sources/Chat/ACPBridge.swift for the Copilot, corporate, LM Studio, and Ollama references.
For the broader picture: open huggingface.co/papers and github.com/trending. The trending order is volatile, but the submission dates are stable.

Want to see a new open model wired into a persistent session live?

Fifteen minutes on a call. I will set a custom endpoint, point it at a Claude-compatible gateway, and fork the same chat to compare a fresh model against the one I run every day.

Frequently asked questions

What new AI actually released on June 11-12, 2026?

The standout open release was Moonshot AI's Kimi K2.7-Code, published June 12, 2026 with open weights on Hugging Face at moonshotai/Kimi-K2.7-Code. Its model card describes a 1-trillion-parameter Mixture-of-Experts design with roughly 32 billion active parameters per token and a 256K context window, and Moonshot reports a +21.8% gain on its own Kimi Code Bench v2 over K2.6. Beyond that single dated release, neither Hugging Face nor GitHub publishes a list keyed to a calendar day. Both order discovery by a rolling trending score that has no notion of dates, so the durable feeds for any 48-hour window are huggingface.co/models sorted by trending, huggingface.co/papers, and github.com/trending.

Why does searching a specific date rarely return a clean list?

A trending score is built to surface popular things, not new ones. The same model can hold the top slot for a week, and a paper that arrived three days ago can outrank a release that landed this morning. So a literal calendar-date question is a poor fit for a homepage feed. The instruments that actually carry trustworthy timestamps are model card commit histories, arXiv submission dates, and project changelogs. A vendor's own benchmark numbers (like the +21.8% Kimi reports) are self-reported until an independent harness re-runs them, so treat a fresh release as a thing to test, not a leaderboard fact.

Can Fazm run a brand-new open model like Kimi K2.7-Code?

Indirectly, and only through a Claude-compatible gateway. Fazm wraps Claude Code, Codex, and Gemini as agent-loop backends through the Agent Client Protocol. Its Custom API Endpoint setting overrides exactly one thing: ANTHROPIC_BASE_URL. If you run a gateway that speaks the Anthropic Messages API in front of an open model (LiteLLM, a corporate proxy, GitHub Copilot's endpoint, LM Studio, or Ollama with an Anthropic-compatible shim), you point Fazm's endpoint at it, keep a Claude model id selected, and the same persistent Mac UI drives whatever model sits behind the gateway. Fazm does not load GGUF weights itself; it routes the agent loop to your server.

Why does Fazm warn me when I pick a non-Claude model with a custom endpoint?

Because the custom endpoint only overrides ANTHROPIC_BASE_URL, and that variable is only read on the Claude path. If you select Codex or Gemini in the picker while a custom endpoint is set, your requests bypass the endpoint entirely and hit those providers directly. To stop people silently leaking traffic past their own proxy, the settings screen shows a guard when the selected model is not a Claude id: "Your current model does not use this endpoint. The custom endpoint only applies to Claude models. Switch to a Claude model for your requests to reach it." That string lives in Desktop/Sources/MainWindow/Pages/SettingsPage.swift and was added in a June 15, 2026 commit.

What does the custom endpoint actually let me proxy to?

The in-source comments are explicit: the endpoint exists to allow proxying through GitHub Copilot, corporate gateways, and local servers. ACPBridge.swift error handling names LM Studio and Ollama directly, including a message that tells you to load a model in your local server (for example LM Studio, Developer, Load Model) when the endpoint reports no model loaded. The settings search entry for the feature carries the keywords endpoint, proxy, base url, anthropic, copilot, gateway, and corporate. So the supported shapes are: a hosted Claude-compatible gateway, a corporate proxy, or a local server on your own machine.

If I route to my own model, does Fazm still keep my context persistent?

Yes. The backend behind the endpoint changes which model answers; it does not change the UI layer. Persistent sessions that survive a Mac restart, one-click chat forking into a new window with the full prior context, and the absence of auto-compacting all sit above the agent loop. That is the whole point of wiring a new model in through the endpoint instead of opening a separate app: you inherit the session machinery you already use, and only the weights underneath change.

Was there a frontier closed-model release on these exact two days?

Not a verifiable closed-weight foundation launch keyed specifically to June 11 or June 12, 2026. The frontier Claude model in this period is Claude Opus 4.8, but pinning a precise calendar date to a closed release from a search feed is exactly the trap this page warns about. The one release with a hard, checkable June 12 date and public weights is Kimi K2.7-Code. If you want the broader landscape, the live feeds are the honest source, and they change hour to hour.

How do I verify the Fazm behavior described here?

Open github.com/mediar-ai/fazm and read two files. Desktop/Sources/MainWindow/Pages/SettingsPage.swift contains the ANTHROPIC_BASE_URL override note, the isClaudeModelId guard, and the non-Claude warning string. Desktop/Sources/Chat/ACPBridge.swift contains the comment that the custom endpoint allows proxying through Copilot and corporate gateways, plus the LM Studio and Ollama error messages. The git log shows the June 15, 2026 commit titled "Add warning for non-Claude models in custom endpoint settings." Everything is public.

Where do I track model and paper releases day to day?

Three feeds plus one habit. huggingface.co/models sorted by trending for weights and quantized variants. huggingface.co/papers for research where an implementation is linked. github.com/trending for the application layer of agents, MCP servers, and inference engines. The habit is to keep one open model wired into your everyday agent loop through a gateway, so the day a release like Kimi K2.7-Code lands you can point your endpoint at it and try it inside the same persistent session, instead of reading a benchmark table and moving on.

Adjacent reading