
Why Claude Keeps Going Down, and What It Actually Means

Anthropic went from $1B to roughly $30B in annual run-rate revenue in 15 months, and Claude now breaks most mornings between 7 and 11 AM Pacific. The outages aren't random. They're concentrated exactly where Claude won: long-running coding agents, on the messiest multi-cloud stack in the industry, metered by a rate limiter that once drained a five-hour quota in 19 minutes.

Tech Talk News Editorial · 17 min read

On April 15, 2026, at 14:53 UTC, Claude.ai, the Anthropic API, and Claude Code started returning login errors and 500s in roughly the same minute. The outage lasted about two hours and 50 minutes and hit all three surfaces at once, per the Anthropic status history feed. Translated into working hours, that was 7:53 AM Pacific to 10:42 AM Pacific. Peak deploy window, peak developer window, peak coffee.

Two weeks earlier, between March 31 and April 1, Opus 4.6 and Sonnet 4.6 ran in elevated-timeout mode for roughly 12 hours. A few days before that, a chunk of Claude Code Max subscribers watched a five-hour quota drain in 19 minutes because the meter itself was broken, per The Register. In the first two weeks of April alone, the status page logged more than 25 separate incidents. March was worse.

So Claude is down a lot right now. That part is not a perception problem. What nobody is saying clearly is why it's down, when it's down, and whether the capacity deals Anthropic has been stacking (an $8 billion AWS Trainium pact, a reported 1 million Google TPUs ramping to 3.5 gigawatts through 2027, another $30 billion committed to Azure) actually fix the problem they look like they fix.

The way I think about this is through two ideas I'll return to throughout the piece. The first is the Agent Tax, the structural reason long-running coding agents are the hardest workload to serve and the one Anthropic is most exposed to. The second is the Fail-Whale Era, the 12-to-36-month reliability rough patch every successful consumer platform has paid on its way up, from Twitter in 2008 to ChatGPT in 2023. Claude is roughly six months into its own. Both frames are useful. Neither is comfortable.

When Claude Goes Down, It's 10 AM Pacific on a Tuesday

Look at the status feed for long enough and the pattern is hard to miss. Claude's worst incidents cluster during North American working hours, usually starting between 14:00 and 18:00 UTC, which is 7 AM to 11 AM Pacific, 10 AM to 2 PM Eastern. The April 15 outage started at 14:53 UTC. The April 13 login outage ran 15:31 to 16:19 UTC. The March 31 timeout event started at 17:45 UTC and ran into the next morning. This is the window where US developers open their laptops, every cron-scheduled CI pipeline in North America fires, and every Claude Code session on a Jira ticket kicks off in parallel.

The independent aggregator IsDown confirms what the feed implies. Over rolling 30-day windows in early 2026, Claude's headline availability has sat around 89 to 92 percent. Spread over a month, that is roughly 60 to 80 hours of degradation or downtime, a dismal figure for a platform customers pay $200 a month to use for production work. For comparison, the OpenAI status history shows roughly 130 to 160 incidents in the same window, fewer than Claude in raw count and tilted toward surface issues (login, FedRAMP workspaces, Atlas) rather than multi-hour core-inference timeouts. Google's Vertex AI shows far fewer incidents, maybe 15 to 25 over six months. xAI's Grok is worse than Claude. Meta AI has no public status page, which is a separate problem.

| Provider | Incidents, Oct 2025 to mid-April 2026 | Worst event in window | Severity shape |
| --- | --- | --- | --- |
| Anthropic / Claude | ~180 to 200 | March 31 to April 1, 2026: ~12h Opus and Sonnet timeouts | Frequent, often core-inference, often during US business hours |
| OpenAI | ~130 to 160 | Nov 2025 ChatGPT global; Dec 2025 Responses API | Fewer line items, bigger blast radius when they happen |
| Google Vertex / Gemini | ~15 to 25 | June 12, 2025 multi-product GCP outage | Lowest count, biggest blast when it breaks |
| xAI / Grok | ~30 to 50 | Jan 27, 2026 day-long multi-surface outage | Still pre-enterprise-grade |

Normalize for how granularly each company reports and the gap narrows. But it doesn't vanish. Claude is measurably worse than OpenAI on core-inference availability right now, and it's worse during the hours that matter most to the exact users it's trying to win.

Claude's failures land on developers. A ChatGPT hiccup costs a consumer a chat turn. A Claude Code hiccup costs an engineer an afternoon.

The Three Bugs That Explain a Lot of It

On September 17, 2025, Anthropic published a postmortem of three recent issues. It is the most useful document anyone has put out on this topic, because the lab names the bugs with unusual specificity. Three unrelated failures stacked during August and September 2025, each one a distinct category of problem. Read them together and you can see the shape of the reliability surface.

  1. Sticky misrouting. Between August 5 and September 18, a fraction of short-context Sonnet 4 requests got routed to servers configured for the upcoming one-million-token context variant. A load-balancer change on August 29 pushed the affected share from 0.8 percent of traffic to 16 percent at peak. Sessions were sticky, so once you landed on the wrong pool, the whole conversation stayed there. About 30 percent of Claude Code users hit at least one degraded response during the window. The bug hit the first-party API plus Amazon Bedrock plus Google Vertex simultaneously, because it lived at the routing layer.
  2. TPU output corruption. Between August 25 and September 2, a runtime performance change on Google TPU servers misconfigured token generation. The model started emitting unrelated Thai characters and Chinese syntax mid-response. This hit Opus 4.1, Opus 4, and Sonnet 4.
  3. XLA approximate top-k miscompilation. The nerdy one. Models compute sampling in bfloat16, but TPU vector processors optimize intermediates to float32, and the two disagree about which token has the highest probability. A patch on August 26 removed a precision workaround and exposed a latent bug in the XLA compiler's approximate_top_k op, which occasionally dropped the actual argmax token. Anthropic fixed it by switching to exact top-k and eating the performance cost. (A minimal sketch of the precision mismatch follows this list.)
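
To make the precision mismatch in bug 3 concrete, here's a minimal sketch. The logits are invented toy numbers, not anything from the actual incident; the point is only that two values distinct in float32 can collapse to the same bfloat16 value, flipping the argmax:

```python
import jax.numpy as jnp

# Indices 0 and 1 differ by less than one bfloat16 ulp near 10.3
# (bf16 keeps only 8 bits of mantissa), so casting down collapses them.
logits_f32 = jnp.array([10.3125, 10.3126, 9.5], dtype=jnp.float32)
print(jnp.argmax(logits_f32))   # 1: float32 sees index 1 as largest

logits_bf16 = logits_f32.astype(jnp.bfloat16)
print(jnp.argmax(logits_bf16))  # 0: both candidates round to 10.3125,
                                # ties break toward the first index, and
                                # the "highest probability" token flips
```

The production bug was subtler, an approximate top-k op occasionally dropping the true argmax rather than a simple downcast, but the root disagreement is the same: two numeric formats, two answers to which token is biggest.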

Under those three bugs is the structural line Anthropic buries inside the postmortem: "Claude is deployed across multiple hardware platforms, namely AWS Trainium, Nvidia GPUs, and Google TPUs." That is the smoking gun. Running the same model weights across three silicon vendors means three compilers, three numerics stacks, three separate canary pipelines, and three places a weekend deploy can go sideways. A TPU-only bug is invisible in the Trainium canary. The heterogeneity that makes Anthropic's capacity story possible is the same thing that multiplies the reliability surface.

Call that the Silicon Heterogeneity Tax if you want a third name for it. It's the structural reason Claude eats bugs that OpenAI doesn't. OpenAI runs mostly on Nvidia through Azure. Anthropic runs on three stacks that don't agree on what floating-point numbers mean.

The October 2025 global blackout that took down half the internet was a different beast: a DynamoDB DNS race condition in AWS's us-east-1 that cascaded through every tenant that lived there, per the AWS postmortem series. One of the March 2026 Claude outages traced back to an AWS Middle East datacenter incident. Those are boring single-region SPOFs, not inference saturation. The point is that a lot of what users chalk up to "Claude is out of GPUs" is actually deploys, compilers, DNS, and regional dependencies. That matters for what does and doesn't get fixed by the capacity deals.

Why Demand Outran Supply in Fifteen Months

Now the other half. Claude's reliability wouldn't be this visible if the traffic wasn't this absurd.

Anthropic's annual run-rate revenue went from about $1 billion in December 2024 to roughly $30 billion by April 2026. Checkpoints along the way, sourced where possible to primary reporting: $4 billion in July 2025 per Epoch AI's revenue tracker, $9 billion by end of 2025, $14 billion in February 2026 per SaaStr, and ~$30 billion by April 2026 per Bloomberg. That is roughly 30x in 15 months. There is no obvious historical analog in enterprise software. Slack, Snowflake, and ServiceNow didn't compound at these speeds. OpenAI didn't compound at these speeds.

The market share data is just as striking. Menlo Ventures' year-end 2025 State of Generative AI in the Enterprise put Anthropic at 40 percent of enterprise LLM API spend against OpenAI's 27 percent and Google's 21 percent. Inside coding specifically, Anthropic was 54 percent, OpenAI 21 percent. The April 2026 Ramp AI Index has Anthropic capturing 73 percent of first-time enterprise AI spend. That last number is the signal. It's not that enterprises are switching off OpenAI. It's that every new AI budget that opens in 2026 is going to Claude first.

| Date | Anthropic ARR | Primary source |
| --- | --- | --- |
| Dec 2024 | ~$1B | Epoch AI tracker |
| Jul 2025 | ~$4B | Epoch AI |
| Dec 2025 | ~$9B | Anthropic investor update (cited by SaaStr) |
| Feb 2026 | ~$14B | SaaStr / CNBC Series G coverage |
| Apr 2026 | ~$30B (reported) | Bloomberg, Axios |

Now look at the matching P&L line. Per The Information, Anthropic revised its 2025 gross margin forecast down from 50 percent to about 40 percent, with the drop driven by inference costs coming in 23 percent higher than plan. The 2025 EBITDA loss was around $5.2 billion. Strip the branding off that sentence: the company's own books say they sold 23 percent more inference than they expected to have to serve, at a price that didn't cover the incremental GPU hours.

That's the fingerprint of demand winning faster than provisioned capacity. The status page tells the story once. The P&L tells it a second time. The 23 percent overrun in inference cost is the exact number that makes the rest of this article make sense.

The Agent Tax: Coding Sessions Are the Hardest Shape of Load

Here's the framework worth carrying forward. When you serve a chat workload (ChatGPT on the web, a quick Q&A with an assistant) the average request is short, stateless, and predictable. The model prefills a few hundred to a few thousand tokens and decodes a few hundred more. Batcher fills up fast. Straggler tails are short. GPU utilization is high.

Agentic coding is a different animal. A Claude Code session reads your repo, plans a change, runs tools, waits for test output, reads the failure, plans again, writes more code, runs tests again, and continues like that for ten, twenty, sometimes forty minutes. Every tool-call turn appends to context. A writeup from Sean Goedecke has the cleanest explanation of why this is painful for a GPU:

"If, whenever someone got on a bus, the bus departed immediately, commutes would be much faster for people on it. But overall throughput would be much lower."

That's what priority tiers do to inference. You pay something like six times the compute to cut latency by a factor of two and a half, by shrinking batch size. Long-context agent sessions have the same effect structurally, without asking. A 100,000-token KV cache takes roughly two and a half times more memory bandwidth per generated token than a 40,000-token one. More memory per session means fewer sessions fit in HBM at once. Smaller batch means lower throughput. Lower throughput with bursty 10-minute sessions holding GPU slots means straggler tails that make your p99 latency look like it was written by a random number generator.
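
The memory math is worth seeing once. Here's a back-of-envelope sketch of per-session KV cache size; the model dimensions are hypothetical round numbers, since Claude's actual architecture isn't public:

```python
def kv_cache_gb(tokens, layers=80, kv_heads=8, head_dim=128, bytes_per=2):
    """KV cache for one session: 2 tensors (K and V) per layer,
    kv_heads * head_dim values per token, bfloat16 = 2 bytes each.
    Every dimension here is a hypothetical round number."""
    return 2 * layers * kv_heads * head_dim * bytes_per * tokens / 1e9

chat = kv_cache_gb(4_000)     # ~1.3 GB: a short chat exchange
agent = kv_cache_gb(100_000)  # ~33 GB: a long Claude Code session

# On an 80 GB HBM accelerator (before the weight shard takes its cut),
# one agent session occupies the space of roughly 25 chat sessions:
# fewer sessions per device, smaller batches, lower throughput.
print(f"chat: {chat:.1f} GB, agent: {agent:.1f} GB")
```

Whatever the true numbers inside Anthropic's fleet, the ratio is the point: context length scales KV memory linearly, and KV memory is the thing batches compete for.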

Layer prompt caching on top. Anthropic's economics only work because long contexts get cached: cache reads cost 0.1x the base input token rate per the Anthropic prompt-caching docs. But the cache invalidates in hierarchy order, Tools then System then Messages, and changes higher in the hierarchy nuke everything downstream. Edit one token in your system prompt and a whole session's cache evaporates into a full re-prefill, which is compute-heavy. Claude Code auto-compaction (triggered around 75 to 92 percent context fill) has the same effect: it summarizes, drops originals, keeps going, and every compaction is a fresh prefill on the summary.
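
In API terms, the hierarchy looks like this. A minimal sketch using the documented cache_control breakpoints in the Anthropic Messages API; the model id and prompt text are placeholders:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder model id
    max_tokens=1024,
    # Cache order is hierarchical: tools, then system, then messages.
    # Edit one token of the system block below and every cached prefix
    # downstream of it is invalidated, forcing a full re-prefill.
    system=[
        {
            "type": "text",
            "text": "You are a coding agent. <long repo conventions here>",
            "cache_control": {"type": "ephemeral"},  # cache breakpoint
        }
    ],
    messages=[{"role": "user", "content": "Fix the failing test in auth.py"}],
)
```

The hierarchy is the whole economic story: a stable tools-and-system prefix is what makes cache reads cost 0.1x, and a single upstream edit is what turns them back into full-price prefill.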

There is a real, named, acknowledged bug on this: claude-code issue #29230, "Server-side KV cache stale context regression (P1)." This is not me speculating about failure modes. This is the operator, in their own issue tracker, agreeing that the cache layer has bugs at scale.

So here's the Agent Tax in one line. The workload that made Claude the default (long-running Sonnet coding agents with huge KV caches and heavy cache writes) is structurally the hardest workload to serve. Product-market fit and serving pain live at the same address. You can't have one without the other. OpenAI's business is tilted more toward ChatGPT consumer chat, which is cheaper per token to serve, less bursty, and easier to batch. Claude's business is tilted toward the hardest token you can generate. That is a large part of why a 12-hour Opus/Sonnet timeout event shows up on the Anthropic status page and doesn't show up on OpenAI's.

The Rate Limit Is a Status Page in a Suit

Not everything Anthropic does in response to capacity pressure shows up as an incident. Some of it shows up as a policy change.

On July 28, 2025, TechCrunch reported Anthropic was rolling out weekly rate limits on top of the existing 5-hour rolling windows for Claude Pro and Max. Effective August 28, the Max-$200 tier capped at "240 to 480 hours of Sonnet 4 and 24 to 40 hours of Opus 4" per week, per Anthropic's own announcement. The range itself is the tell. It's not a policy number. It's a capacity float dressed as a policy.

Operators on the Anthropic API also noticed the error codes quietly shift. The old 529 overloaded_error, which explicitly says "we're overloaded," got deprioritized in favor of the more generic 429 rate_limit_error. The 529 is honest. The 429 is political. Same capacity pressure, different wire protocol.
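
For API operators the distinction is concrete, because the two codes want different retry behavior. A hedged sketch assuming standard HTTP semantics and the documented Anthropic error codes; the backoff constants are arbitrary:

```python
import os
import random
import time

import httpx

def post_with_backoff(payload: dict, max_retries: int = 5) -> httpx.Response:
    """Retry a Messages API call, separating the two flavors of
    'capacity' error. Backoff constants are illustrative, not tuned."""
    for attempt in range(max_retries):
        resp = httpx.post(
            "https://api.anthropic.com/v1/messages",
            headers={
                "x-api-key": os.environ["ANTHROPIC_API_KEY"],
                "anthropic-version": "2023-06-01",
                "content-type": "application/json",
            },
            json=payload,
            timeout=60.0,
        )
        if resp.status_code == 429:
            # rate_limit_error: a policy throttle; honor retry-after
            delay = float(resp.headers.get("retry-after", 2 ** attempt))
        elif resp.status_code == 529:
            # overloaded_error: fleet pressure; back off with jitter
            delay = 2 ** attempt + random.random()
        else:
            return resp
        time.sleep(delay)
    raise RuntimeError("retries exhausted; indistinguishable from an outage")
```

Notice the client can't actually tell policy from capacity, which is the article's point: same error-handling path, same user experience, very different fixes on Anthropic's side.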

Then the metering itself broke. In late March 2026, The Register reported that Max subscribers were draining their 5-hour quota in as little as 19 minutes. Anthropic acknowledged the bug publicly. A lot of what dev Twitter called "Claude is down again" that week was actually a billing counter running too fast.

This is the part of the story the "outages equal compute scarcity" thesis gets wrong, and the part the adversarial read of the situation gets right. If the rate limiter is broken, the user experience is indistinguishable from an outage. But the fix is a one-line patch, not a gigawatt of TPUs. The lesson is that "capacity" at Anthropic is a spectrum: raw GPU availability on one end, policy throttles that protect the fleet in the middle, metering bugs that drain quota on the other end. The user experience at all three ends is the same error message.

The Capacity Pile Arrives After the Window That Matters

Anthropic has stacked the largest compute pile of any AI lab that isn't a cloud hyperscaler itself. Here's the inventory.

| Counterparty | Announced | Scale | Go-live | Primary workload |
| --- | --- | --- | --- | --- |
| AWS Trainium, Project Rainier (Indiana) | $4B → $8B, 2024-25 | ~500K Trainium2 live, scaling past 1M; 2.2 GW campus | Activated Oct 29, 2025 | Training |
| Google Cloud TPU (Ironwood v7, Trillium) | Oct 23, 2025 | Up to 1M TPUs, >1 GW in 2026, ~3.5 GW by 2027 | Staged through 2026-27 | Inference (primary) |
| Nvidia + Microsoft Azure | Nov 18, 2025 | $30B Azure commit; Nvidia invests $10B; 1 GW initially | Staged 2026+ | Mixed; Claude distribution on Azure |

The thing to notice is the workload column. Per SemiAnalysis, AWS Rainier is explicitly "for the sole purpose of serving Anthropic's training needs," while Google's TPU fleet carries the bulk of production Claude inference. Training and inference live on different silicon by design. That means Project Rainier switching on in October 2025 didn't add inference headroom. It freed up Google TPU time that had been getting cannibalized by training runs. The fix for user-visible latency arrives only as the Google TPU tranche fills in through 2026, and doesn't fully ramp until the full 3.5-gigawatt TPU cohort lands in 2027.

Three awkward facts about the pile. First, most of the announced capacity is paper. The AWS Indiana campus is real and live, but the Google 3.5 GW scale-up is 2027. The Nvidia-Azure tranche is staged. Second, competitors are scaling faster. OpenAI's Stargate buildout is targeting 10+ gigawatts, Meta's Hyperion is 5 gigawatts, xAI Colossus 2 is a live gigawatt datacenter already. In relative terms, Anthropic's 2025 live compute is still the smallest of the frontier four. Third, a lot of this is training capacity, and inference bottlenecks (KV cache memory per session, queuing, p99 latency under agent load) don't automatically get fixed by more training floor space.

The sharpest version of this: you can't buy your way out of an Agent Tax with training clusters. The 2027 TPUs land after the window in which Claude's reliability perception gets set in enterprise procurement.

The Two-Customer Problem

The uncomfortable part of the demand story is how concentrated it is. Per VentureBeat's reporting, roughly half of Anthropic's API revenue comes from two customers: Cursor and GitHub Copilot. Coding apps as a category drove something like $1.2 billion of the $4 billion ARR checkpoint.

Both of those customers have strong reasons to diversify. Cursor CEO Michael Truell is publicly pushing the product toward model-agnosticism, per Fortune, in part because GPT-5 and Gemini 3 are now benchmark-competitive on SWE-bench (Opus 4.6 at 80.8 percent, GPT-5.4 at roughly 80 percent, Gemini 3.1 Pro at 80.6 percent per the SWE-bench Verified leaderboard). GitHub Copilot is owned by Microsoft, whose strategic partner is OpenAI, and which now pipes multiple frontier models including Claude Sonnet 4.6 (per GitHub's changelog). The optionality is built in.

So the "developer mindshare moat" has to be read carefully. It is real: Anthropic is the preferred coding model today. 73 percent of first-time enterprise AI dollars is a staggering number. But it's also routed through two intermediaries who are structurally incentivized to dilute that routing. If reliability gets bad enough that Cursor starts defaulting new users to Gemini 3, the mindshare story breaks faster than the capacity story fixes. That's the fragility inside the flywheel.

Anthropic's moat is two procurement relationships in business-casual dress. That's a real moat until one of them moves.

The Fail-Whale Era, and What Twitter, S3, and ChatGPT Teach Us

Zoom out. Every platform that mattered has had a 12 to 36 month stretch where the infrastructure couldn't keep up with demand and the power users revolted. Twitter's Fail Whale era ran roughly 2008 to 2010. The service lost six full days of uptime in 2007 and spent most of 2008 and 2009 in "over capacity" mode while the engineering team rewrote the backend off Rails onto JVM languages. Twitter IPO'd at a $31 billion valuation in 2013. The Fail Whale is a footnote.

AWS S3's February 28, 2017 outage took a meaningful chunk of the internet offline for four hours because a fat-fingered command killed the index subsystem. It did not cause customers to flee AWS. It caused multi-region architectures to become table stakes. AWS kept growing.

ChatGPT's own capacity crisis in late 2022 and early 2023 had users staring at "at capacity" pages for weeks because OpenAI literally couldn't provision Nvidia GPUs fast enough. The crisis did not slow adoption. It accelerated the narrative that the service was must-have.

The pattern is consistent. A scale-surge reliability era lasts 12 to 36 months. It feels existential to power users while it's happening. A year after it ends, retention data shows no scar, provided the company actually invested in the rewrite. Anthropic's March and April 2026 incident density puts them squarely in month 4 to 6 of a classic version of this era. The determining question is not whether the outages are real (they are) or whether peers are cleaner (OpenAI is modestly cleaner, Gemini on Vertex is cleaner but lower volume). The question is whether Anthropic treats the next 18 months as a compute-and-architecture rewrite moment the way Twitter treated 2010.

The early evidence is that they do. The Managed Agents architecture and the effective-harnesses-for-long-running-agents post are both, functionally, admissions that the prior serving architecture was burning GPU slots on idle agent sessions. Decoupling the "brain" (the model doing inference) from the "hands" (the sandbox executing tool calls) is not a minor optimization. It's a rewrite of how an agent holds capacity. That's the Twitter-moving-off-Rails move, compressed into engineering-blog English.
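
Mechanically, the decoupling looks something like the loop below. This is a generic sketch of the idea, not Anthropic's actual Managed Agents implementation, which hasn't been published at this level of detail:

```python
import asyncio

async def run_in_sandbox(tool_call: str) -> str:
    """The 'hands': execute a tool call in a sandbox (stubbed here).
    Nothing on the inference fleet waits on this."""
    await asyncio.sleep(60)  # stand-in for a real test run
    return f"results of {tool_call}"

async def agent_loop(task: str, model_call, max_turns: int = 20):
    """The 'brain' touches serving capacity only inside model_call,
    a discrete inference request that releases its GPU slot as soon
    as the response streams back. In a coupled design, that slot and
    its KV cache residency would sit idle through every sandbox minute."""
    context = [task]
    for _ in range(max_turns):
        action = await model_call(context)
        if action == "done":
            return context
        context.append(await run_in_sandbox(action))
    return context
```

The win is exactly the Agent Tax in reverse: a forty-minute session stops costing forty minutes of GPU residency and starts costing only the minutes the model is actually thinking.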

The Take

Putting it all together. Claude is down more than its peers right now, the outages cluster during US working hours, and the causes are a mix that the "out of GPUs" meme gets mostly wrong. The real pattern is this. Anthropic wins the workload that is hardest to serve. It runs that workload on three silicon vendors whose compilers don't agree with each other. Its revenue compounded from $1 billion to roughly $30 billion in 15 months, and the gross margin tells you inference ran 23 percent over plan because demand outran provisioned serving. The 3.5 gigawatt Google TPU expansion and the $8 billion Amazon Trainium campus do real work, but most of the new concrete lands after 2026, and a lot of it is training, not inference. In the meantime the rate limiter picks up the slack, and when the rate limiter breaks it looks exactly like an outage.

Two frameworks to take with you. The Agent Tax: the product decisions that made Claude the preferred coding model (long contexts, Sonnet as the default, aggressive prompt caching, 10-minute agent sessions) also make it the single hardest model to serve reliably at scale. Product-market fit and serving pain live at the same address, and you can't buy your way out of one without losing the other. The Fail-Whale Era: every platform that mattered paid a 12 to 36 month reliability bill during its scale surge, and a year after it ended nobody remembered. Anthropic is in month six. The question is whether the next 18 months look like a rewrite or like excuses.

What I'd watch from here. Claude's p99 latency and core-inference availability on the status page, week over week, because that's the honest receipt. The gross margin trajectory in the next Anthropic disclosure, because if inference cost overruns shrink from 23 percent to single digits, the supply side is catching up. The Cursor and GitHub Copilot routing mix, because a 10-point shift toward Gemini 3 or GPT-5 is the canary for the concentration problem. And the pace of Managed Agents adoption, because that's the architectural lever that decouples serving from agent idle time.

The uncomfortable truth is that Claude being unreliable right now is in part proof Anthropic is winning. You don't get outages during US working hours without US users during US working hours. What breaks from here isn't the model. It's whether the rewrite lands before the procurement cycle does. The AI business in 2026 is being run on infrastructure built for the AI business in 2024, and the receipts show up in the hour-by-hour incident log, the quarterly margin disclosure, and the customer concentration footnote. Same story told three times. If you're betting on which AI lab compounds from here, that's the trade to size correctly. Not whether Claude is down. Whether Anthropic is building the plumbing fast enough to outgrow the fail whale (and the adjacent question of whether the rest of the compute stack can scale with it, which we covered in our AI infrastructure roadmap).

Written by

Tech Talk News Editorial

Tech Talk News covers engineering, AI, and tech investing for people who build and invest in technology.
