Claude usage, explained: where your Pro and Max tokens actually go

Most guides on this topic stop at the limit numbers. The number that actually decides how fast you run out is the one nobody controls directly: how much context your tool keeps live on every turn. This is how the meter works, how to read it, and why the way Claude Code manages context matters more than the plan you are on.

Matthew Diakonov, Written with AI

Published June 16, 2026 · 8 min read

Direct answer · verified 2026-06-16

Claude usage is metered in tokens, not in messages. Every plan gets a rolling 5-hour session window that opens on your first prompt, plus weekly caps, and that quota is shared across claude.ai, Claude Desktop, and Claude Code. Pro is the 1x tier; Max 5x and Max 20x get 5x and 20x the session allowance. Check live consumption with the /usage command in Claude Code, or under Settings > Usage on claude.ai. The single biggest multiplier is context size, which is why how your tool handles context decides how fast you burn the meter.

Source: Anthropic, “How do usage and length limits work?”

5-hr: Rolling session window, opens on your first prompt

5x: Max 5x session allowance vs Pro

20x: Max 20x session allowance vs Pro

3: Claude surfaces drawing from one shared quota

The meter is tokens, and the window is rolling

The mental model that trips people up is thinking in messages. Claude does not count messages. It counts tokens, and it weights them by conversation length, the model you picked, the features in use, and the tool calls the agent makes. A two-line prompt at the end of a long session can cost far more than the same prompt at the start, because the whole live context rides along on every turn.
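
To make the "context rides along" point concrete, here is a minimal sketch of the arithmetic. It assumes a simplified model in which every turn re-sends the entire live context, and it ignores any caching or per-model weighting. The function name and all token sizes are hypothetical illustration values, not Anthropic's actual accounting.

```python
def cumulative_input_tokens(turn_sizes, base_context=0):
    """Total tokens sent when every turn re-sends the live context.

    turn_sizes: tokens each turn adds to the context (hypothetical sizes).
    base_context: tokens already live before the first turn.
    """
    total = 0
    context = base_context
    for added in turn_sizes:
        context += added   # the new turn joins the live context
        total += context   # the whole context is sent on this turn
    return total


# Ten identical 1,000-token turns, starting from an empty context.
short_session = cumulative_input_tokens([1_000] * 10)

# The same ten turns at the end of a session already holding 50,000 tokens.
long_session = cumulative_input_tokens([1_000] * 10, base_context=50_000)

print(short_session)  # 55000
print(long_session)   # 555000
```

Under these assumptions the same ten prompts cost roughly ten times more at the end of a long session than at the start of a fresh one.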

The session window is rolling, not a fixed clock. It opens with your first prompt and covers the next five hours, then resets with a fresh allowance. Fire your first prompt at 10 AM and the window resets at 3 PM regardless of how many prompts you sent in between. Sitting on top of that are weekly caps that accrue across sessions; Max plans carry two of them, one across all models and one for Sonnet usage specifically. Anthropic publishes relative multipliers (Pro at 1x, Max at 5x and 20x) rather than fixed token counts, and its own best-practices doc confirms the shared-pool behavior across claude.ai, Desktop, and Claude Code.
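
The reset rule above can be sketched in a few lines. This is only an illustration of the 10 AM to 3 PM example; the function names and the date are hypothetical, and the five-hour length is the figure stated in this article.

```python
from datetime import datetime, timedelta

SESSION_WINDOW = timedelta(hours=5)  # window length as described above


def window_reset_time(first_prompt_at):
    """Return when the window opened by this first prompt resets."""
    return first_prompt_at + SESSION_WINDOW


def in_current_window(first_prompt_at, now):
    """True while `now` still falls inside the window opened at first_prompt_at."""
    return first_prompt_at <= now < window_reset_time(first_prompt_at)


first_prompt = datetime(2026, 6, 16, 10, 0)  # 10:00 AM on a hypothetical day
print(window_reset_time(first_prompt).time())                          # 15:00:00
print(in_current_window(first_prompt, datetime(2026, 6, 16, 14, 59)))  # True
print(in_current_window(first_prompt, datetime(2026, 6, 16, 15, 0)))   # False
```

Note that the number of prompts sent never appears in the calculation: only the timestamp of the first prompt matters.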

Where the tokens really go in an agent session

Visible prompts and responses are a small slice of the bill. In an agentic session the larger contributors are the parts you do not type: the CLAUDE.md files loaded at every launch, auto-memory, MCP tool definitions, and codebase search, which is usually the single largest contributor. Every retry re-sends context too. None of this shows up as a message in your transcript, but all of it counts against the 5-hour window.

This is the part the limit numbers cannot tell you. Two people on the same Max 20x plan can exhaust their allowance at wildly different points in the window, because one of them keeps short, focused windows and the other runs a single sprawling session all day. The plan sets the ceiling. The tool sets how fast you walk into it.

Why a no-compact wrapper changes your usage math

Fazm is a native macOS app that runs the real Claude Code agent loop and signs in with your existing Claude Pro or Max account. There is no separate metering: usage hits your existing plan and the same shared 5-hour and weekly pool as everything else. That is the boring, important part. The interesting part is what Fazm does differently with context, because that is what moves the meter.

Raw Claude Code auto-compacts a long session: when the conversation approaches the context limit it silently summarizes earlier turns and drops detail to keep going. That saves tokens, but the summary is lossy and you cannot see what was cut, so a decision you made an hour ago can quietly vanish. Fazm does not do that. The full chat history stays live in context for the lifetime of the window, which means nothing gets silently dropped and also means a long window genuinely costs more per turn. The lever that controls that cost is the one-click fork: when the accumulated context gets heavy, you fork to a fresh window from a clean checkpoint and leave the original untouched. You decide when to reset the token weight, instead of a compaction heuristic deciding for you.

Verify it yourself: the “usage hits your existing plan” behavior and the no-auto-compact design are stated on the Claude Code wrapper page and live in the open-source app at github.com/m13v/fazm. You bring the account; Fazm changes only how context is kept and reset, never who gets billed.

Compact away your history, or keep it and fork

The two strategies cost differently and fail differently. Compaction is cheaper per turn but loses fidelity. Keeping the full history is heavier per turn but lossless, and the fork gives you back the cost control without the data loss.
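
A rough way to see the cost side of that tradeoff is to simulate it. The sketch below assumes the same simplified model as before (the full live context is sent on every turn) and treats both a compaction and a fork as "shrink the context to a smaller carry-over". The function name, thresholds, and sizes are all hypothetical, and the model says nothing about fidelity, which is where the two strategies actually differ.

```python
def session_cost(turns, tokens_per_turn, reset_above=None, carry_over=0):
    """Sum the context tokens sent over a session.

    reset_above: if set, the context is reset whenever it would exceed
                 this many tokens (a compaction or a fork, in this model).
    carry_over:  tokens that survive a reset (a summary or a checkpoint).
    All values are hypothetical illustration numbers.
    """
    context = 0
    total = 0
    for _ in range(turns):
        context += tokens_per_turn
        if reset_above is not None and context > reset_above:
            # Drop the accumulated weight, keep only the carry-over.
            context = carry_over + tokens_per_turn
        total += context
    return total


# 100 turns of 2,000 tokens each.
never_reset = session_cost(100, 2_000)
with_resets = session_cost(100, 2_000, reset_above=100_000, carry_over=10_000)

print(never_reset)  # 10100000
print(with_resets)
print(with_resets < never_reset)  # True
```

In this model the arithmetic of a reset is the same whether a heuristic or the user triggers it. What changes is who picks the moment and what survives in the carry-over.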

Two ways to keep a long session under the meter

When the session grows, Claude Code summarizes earlier turns and drops detail to stay under the context limit. Cheaper per turn, but lossy.

A lossy summary replaces real history
Decisions can be silently dropped
You cannot inspect what got cut

Usage-control levers: raw Claude Code vs Fazm

Both draw from the same Claude Pro or Max pool. The difference is the controls you get over how fast you draw it down.

Feature	Raw Claude Code	Fazm
Whose account the usage hits	Your Claude Pro or Max plan	Your Claude Pro or Max plan (same shared pool)
Context when a session runs long	Auto-compacts: summarizes and drops detail	Stays full; you choose when to reset it
Resetting accumulated token weight	Run /compact or /clear and lose state	Fork to a fresh window, keep a clean checkpoint
Seeing what left the context	Hard to inspect the compaction summary	Nothing is cut, so nothing to inspect
Routing usage through a proxy or gateway	Manual environment-variable setup	Custom API endpoint field in settings

If you never run long sessions, auto-compaction rarely fires and the two behave the same. The levers matter most for all-day, multi-hour work.
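
For the raw Claude Code column, "manual environment-variable setup" might look something like the sketch below. It assumes the override variable is ANTHROPIC_BASE_URL, which is not named in this article and should be checked against the Claude Code documentation for your version. The helper name and the gateway address are hypothetical.

```python
import os


def gateway_env(base_url, base_env=None):
    """Return a copy of the environment with the endpoint override set."""
    if not base_url.startswith(("http://", "https://")):
        raise ValueError("base_url must be an http(s) URL")
    env = dict(os.environ if base_env is None else base_env)
    # ASSUMPTION: ANTHROPIC_BASE_URL is the override variable; verify it
    # against the Claude Code documentation before relying on it.
    env["ANTHROPIC_BASE_URL"] = base_url
    return env


# Hypothetical gateway address, for illustration only.
env = gateway_env("https://gateway.example.internal", base_env={"PATH": "/usr/bin"})
print(env["ANTHROPIC_BASE_URL"])
```

The resulting dictionary could then be passed as the env argument of subprocess.run when launching the CLI. Building a copy rather than mutating os.environ keeps the override scoped to that one process.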

Practical ways to stretch a Claude session

None of these are Fazm-specific tricks; they follow directly from the meter being tokens. Keep CLAUDE.md lean, since it loads on every launch. Avoid re-running the same broad codebase search when you already had the agent read the files. Pick the smaller model for routine turns and save the largest one for the hard problem. And when a window has accumulated a lot of context you no longer need, start fresh from a checkpoint rather than dragging the whole history into every remaining turn. Check the /usage readout before a big run so you know how much of the window you have left.
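
To judge whether a CLAUDE.md is "lean", a rough size check can help. The sketch below uses the common rule of thumb of about four characters per token for English text. That ratio is an approximation, not a real tokenizer, and the function names are hypothetical.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough heuristic for English text, NOT a real tokenizer


def estimate_tokens(text):
    """Approximate a token count from character length."""
    return len(text) // CHARS_PER_TOKEN


def estimate_file_tokens(path):
    """Approximate tokens for a file such as CLAUDE.md; 0 if it is missing."""
    p = Path(path)
    if not p.is_file():
        return 0
    return estimate_tokens(p.read_text(encoding="utf-8", errors="replace"))


# Estimate the weight of a CLAUDE.md in the current directory, if present.
print(estimate_file_tokens("CLAUDE.md"))
```

Because the file is loaded at every launch, whatever this prints is a recurring cost, so trimming it pays off in every session rather than once.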

For more on when compaction actually helps versus hurts, see controlling Claude Code context compaction.

Run Claude Code on your own plan, without losing context

Talk through how Fazm keeps full history live, forks in one click, and bills against your existing Claude Pro or Max account.

Frequently asked questions

How do I check my Claude usage right now?

Two places. In Claude Code, run the /usage command in the CLI: it shows your current session percentage, weekly percentage, and any extra-usage balance in one view, plus the reset time. On the web, open claude.ai and go to Settings > Usage, where progress bars show how much of the rolling 5-hour session and the weekly limit you have consumed. Both read from the same account, so the numbers agree.

Does using Claude Code count against my Claude Pro or Max limit?

Yes. Anthropic states that usage of all Claude product surfaces (claude.ai, Claude Desktop, and Claude Code) counts toward the same usage limit. There is one shared pool. If you spend an hour in Claude Code in the morning and then open claude.ai in the afternoon, the afternoon session draws from whatever is left of the same 5-hour window and the same weekly cap.

What actually burns my Claude usage the fastest?

Tokens, not messages. Usage is weighted by conversation length, model choice, the features in use, and tool calls. Context size is the single biggest multiplier: every turn re-sends the live context, so a long conversation costs more per turn than a short one even if your prompts are the same length. Codebase search, large CLAUDE.md files loaded at every launch, MCP tool definitions, and retries all add to it.

How does Fazm bill Claude usage, its own API or my plan?

Your own plan. Fazm runs the real Claude Code agent loop and signs in with your existing Claude Pro or Max account, so usage hits your existing plan and the same shared pool described above. There is no separate Fazm metering and no markup on tokens. If you would rather route through an API key, a corporate proxy, or any Anthropic-compatible gateway, Fazm has a custom API endpoint field for that instead.

Does Fazm's no-auto-compacting design make me hit limits faster?

Honest answer: a window you keep running for a long time costs more per turn, because the full history stays live and gets re-sent each turn instead of being summarized away. That is the deliberate tradeoff: you never silently lose decisions to a compaction summary. The lever that controls the cost is the one-click fork. When the accumulated context gets heavy, you fork to a fresh window from a clean checkpoint, which drops the token weight without losing the thread you care about.

What changed with Claude usage limits in 2026?

On May 6, 2026, Anthropic doubled the 5-hour rate limits for Pro, Max, Team, and seat-based Enterprise plans, and removed the earlier peak-hours reduction so Pro and Max now get the same limit regardless of time of day. The structure (a rolling 5-hour session window plus weekly caps, shared across surfaces) did not change; the per-window allowance went up.

Can I route Claude usage through a proxy or gateway instead of my plan?

Yes. Fazm exposes a custom API endpoint setting so you can point the same agent loop at a corporate proxy, a GitHub Copilot endpoint, or any Anthropic-compatible gateway. In that mode the usage is billed wherever that endpoint bills, not against your Claude Pro or Max session window.

Related reading

Auto-compacting silently drops your decisions

The real project cost of session loss and compacting

Controlling Claude Code context compaction
