63% of AI Chip Costs Go to Memory: The Real Bottleneck Has Shifted

Karify98 & Amy 🌸

For the past two years, the AI race centered on GPU compute. Epoch AI just published data that tells a different story: 63% of AI chip component costs now go to memory, not the processor. This is the clearest signal yet about where the real bottleneck in AI infrastructure has moved.

The Numbers That Changed the Narrative

Epoch AI's analysis shows that HBM's (High-Bandwidth Memory) share of AI chip component costs rose from 52% to 63% between Q1 2024 and Q4 2025. Over the same period, packaging's share fell from 19% to 15% and auxiliary components' share fell from 15% to 9%. The logic die's share barely moved, staying at 13–14%.

Total AI chip component spending grew from $22 billion to $52 billion, and HBM alone accounted for roughly $20 billion of that $30 billion increase.
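The spending figures can be cross-checked against the cost shares. The sketch below uses only the article's numbers (52% of $22B, 63% of $52B); the helper name `hbm_spend` is hypothetical and the results are derived, not quoted from Epoch AI.

```python
def hbm_spend(total_billion: float, hbm_share: float) -> float:
    """HBM spend in $B, given total component spend and HBM's cost share."""
    return total_billion * hbm_share

start = hbm_spend(22.0, 0.52)  # Q1 2024
end = hbm_spend(52.0, 0.63)    # Q4 2025
total_increase = 52.0 - 22.0
hbm_increase = end - start

print(f"HBM spend: ${start:.1f}B -> ${end:.1f}B")
print(f"HBM increase: ${hbm_increase:.1f}B of ${total_increase:.0f}B total "
      f"({hbm_increase / total_increase:.0%})")
```

This gives HBM spend of roughly $11.4B rising to $32.8B, an increase of about $21.3B, or around 71% of the total increase, which is consistent with the "roughly $20 billion" figure.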

The NVIDIA B200 makes this concrete. Its manufacturing cost runs about $6,400, of which about $3,200 goes to HBM alone: 192 GB of HBM3E at roughly $15 per gigabyte. Half the physical cost is memory. When buyers pay $30,000–$40,000 for a B200, most of that price sits above manufacturing cost, but of the money that pays for physical components, the largest share goes to memory, not compute.
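A quick bill-of-materials check using the article's approximate figures. Note that the quoted numbers are rounded: 192 GB at $15/GB comes to about $2,880, while $3,200 implies roughly $16.70/GB. The $30,000–$40,000 range is the article's price figure; the shares below are derived from it, not sourced.

```python
# Article figures (approximate): manufacturing cost, HBM cost, HBM capacity.
capacity_gb = 192
hbm_cost = 3_200
manufacturing_cost = 6_400

implied_price_per_gb = hbm_cost / capacity_gb  # price per GB implied by $3,200
cost_at_15_per_gb = capacity_gb * 15           # cost implied by "roughly $15/GB"
memory_share = hbm_cost / manufacturing_cost   # memory's share of physical cost

print(f"Implied HBM3E price: ${implied_price_per_gb:.2f}/GB")
print(f"192 GB at $15/GB: ${cost_at_15_per_gb:,}")
print(f"Memory share of manufacturing cost: {memory_share:.0%}")

# HBM as a fraction of the end price a buyer pays.
for price in (30_000, 40_000):
    print(f"HBM share of a ${price:,} price: {hbm_cost / price:.0%}")
```

Memory is about 50% of manufacturing cost but only around 8–11% of the end price, which is why the cost argument is about the component budget rather than the sticker price.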

The traditional model (a processor with memory attached) is inverting. AI chips are becoming memory devices with a processor attached.

Supply: Sold Out Through 2026

Three companies make HBM at volume: SK Hynix (62% market share), Samsung, and Micron. All three are sold out through the end of 2026.

SK Hynix confirmed its entire 2026 supply is committed. Micron said the same. Samsung is raising HBM contract prices by 15–20% for 2026 agreements. The top four AI chip designers (Nvidia, Google, AMD, Amazon) consumed more than 90% of global HBM supply in 2025, according to Epoch AI.

OpenAI COO Brad Lightcap stated it directly at the Hill and Valley Forum in March: "Right now, it's memory." The binding constraint used to be energy for data centers, then GPU compute. Now it's the physical availability of HBM chips.

New capacity from SK Hynix (Korea), Micron (Singapore), and Samsung (Pyeongtaek) won't reach meaningful volume before late 2027. That 18-month gap puts pressure on the entire supply chain.

How It Hits Developers

The hyperscalers (AWS, Google Cloud, Azure, Meta) signed multi-year HBM contracts before the shortage. They locked in 2024-era pricing while the rest of the market pays spot rates or can't get supply at all.

The impact ripples outward: DDR5 96GB kits jumped from $280 to over $1,000 as manufacturers shifted capacity toward higher-margin HBM. OVHcloud announced 5–10% GPU price increases for April–September 2026.

For developers, the HBM shortage turns software architecture into a financial decision.

What Developers Can Do

Quantization (INT8 or INT4) shrinks the HBM footprint of model weights: relative to FP16, INT8 roughly halves it and INT4 roughly quarters it, with minimal quality loss on most production workloads.
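A minimal sketch of both halves of that claim: the weight-only memory arithmetic, and a simple symmetric per-tensor INT8 quantizer. The 70B parameter count and the sample weights are hypothetical, and per-tensor scaling is the simplest possible scheme; production INT8/INT4 methods typically use per-channel or per-group scales, whose metadata makes real savings slightly smaller than the ideal ratios.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Memory to hold the weights only (no KV cache or activations), 1 GB = 1e9 bytes."""
    return num_params * bits_per_weight / 8 / 1e9

def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: one float scale, int codes in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize_int8(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]

params = 70e9  # hypothetical 70B-parameter model
for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: {weight_memory_gb(params, bits):.0f} GB of weights")

weights = [0.12, -0.5, 0.033, 0.91, -0.27]  # hypothetical weight values
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, f"scale={scale:.5f}", f"max_error={max_err:.5f}")
```

Round-to-nearest keeps each weight's error within half a quantization step, which is why INT8 usually costs little quality; INT4 has 16x fewer levels and needs more careful schemes.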

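Mixture-of-Experts (MoE) models, such as DeepSeek V4 and Mixtral, activate only a fraction of their parameters per token, which sharply reduces the weight bandwidth each token consumes. They are bandwidth-efficient by design, not by coincidence, though every expert still has to be resident in memory, so MoE does not shrink total HBM capacity needs.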
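The capacity-versus-bandwidth distinction can be sketched with a toy MoE. The configuration (2B shared parameters, 8 experts of 5B each, top-2 routing) is hypothetical and does not describe DeepSeek or Mixtral; the router shown is one common top-k variant (softmax over the selected logits), not any specific model's exact implementation.

```python
import math

def moe_param_counts(shared: float, per_expert: float, num_experts: int, top_k: int):
    """Return (resident, active_per_token) parameter counts for a simple MoE.

    shared: parameters every token uses (attention, embeddings, ...)
    per_expert: parameters in one expert's feed-forward blocks
    """
    resident = shared + per_expert * num_experts  # must all sit in HBM
    active = shared + per_expert * top_k          # read for each token
    return resident, active

def top_k_route(router_logits, k):
    """Select the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}

# Hypothetical configuration: 8 experts, 2 active per token.
resident, active = moe_param_counts(shared=2e9, per_expert=5e9, num_experts=8, top_k=2)
print(f"resident: {resident / 1e9:.0f}B params, active per token: {active / 1e9:.0f}B")

routing = top_k_route([0.1, 2.3, -1.0, 0.7, 1.9, -0.4, 0.0, 0.5], k=2)
print(routing)  # experts 1 and 4, with weights summing to 1
```

In this toy setup each token touches 12B parameters, but all 42B must be loaded, so the savings show up in bandwidth and compute per token rather than in HBM capacity.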

Running smaller models on older GPUs with HBM2E is a viable bridge while HBM3E supply stays constrained.
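Whether a model fits on an older card comes down to weights plus KV cache. The sketch below uses the standard KV-cache size formula (keys and values, per layer, per KV head, per token); the 8B model shape, the 80 GB capacity (an A100 80GB-class HBM2e card), and the 10% overhead margin are all assumptions for illustration.

```python
def kv_cache_gb(layers, kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """KV cache size: keys + values for every layer, KV head, and token (FP16 by default)."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem / 1e9

def fits(hbm_gb, params, bits_per_weight, kv_gb, overhead_frac=0.10):
    """Rough check: (weights + KV cache) plus an assumed overhead margin <= HBM capacity."""
    weights_gb = params * bits_per_weight / 8 / 1e9
    need = (weights_gb + kv_gb) * (1 + overhead_frac)
    return need <= hbm_gb, need

# Hypothetical 8B-parameter model with grouped-query attention (8 KV heads).
kv = kv_cache_gb(layers=32, kv_heads=8, head_dim=128, seq_len=8192, batch=8)
ok, need = fits(hbm_gb=80, params=8e9, bits_per_weight=16, kv_gb=kv)
print(f"KV cache: {kv:.1f} GB, total need: {need:.1f} GB, fits in 80 GB: {ok}")

# The same KV budget with a hypothetical 70B FP16 model does not fit.
print(fits(hbm_gb=80, params=70e9, bits_per_weight=16, kv_gb=kv))
```

Combining a smaller model with quantization widens the range of workloads an HBM2E card can serve; long contexts and large batches grow the KV cache linearly, so they are the first thing to budget.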

API-first: consume AI through cloud endpoints and let hyperscalers absorb the infrastructure cost. In a market where HBM is rationed, building your own GPU cluster is a bet against the supply chain.
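Since the shortage turns architecture into a financial decision, one way to frame API-first versus self-hosting is a break-even on utilization. Every input below (a $4.00 GPU-hour, 1,000 tokens/s, a $2.00 per million token API price) is a hypothetical placeholder, and the model ignores staffing, power, networking, and reserved-capacity discounts.

```python
def self_hosted_cost_per_million(gpu_hour_cost, tokens_per_second, utilization):
    """Cost of generating 1M tokens on a GPU billed by the hour at a given utilization."""
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_hour_cost / tokens_per_hour * 1e6

def breakeven_utilization(gpu_hour_cost, tokens_per_second, api_price_per_million):
    """Utilization at which self-hosting matches the API price per token."""
    return gpu_hour_cost * 1e6 / (api_price_per_million * tokens_per_second * 3600)

GPU_HOUR = 4.00    # hypothetical $ per GPU-hour
TPS = 1_000        # hypothetical sustained tokens/s per GPU
API_PRICE = 2.00   # hypothetical $ per 1M tokens

for util in (0.2, 0.5, 0.9):
    cost = self_hosted_cost_per_million(GPU_HOUR, TPS, util)
    cheaper = "self-host" if cost < API_PRICE else "API"
    print(f"utilization {util:.0%}: ${cost:.2f} per 1M tokens -> {cheaper}")

be = breakeven_utilization(GPU_HOUR, TPS, API_PRICE)
print(f"break-even utilization: {be:.0%}")
```

Under these assumptions self-hosting only wins above roughly 56% sustained utilization; rising GPU-hour prices push that threshold higher, which is the article's point about betting against the supply chain.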

The Strategic Shift

The GPU compute race is largely over: supply is loosening, costs are dropping, and alternatives are multiplying. Memory is where the next two years get decided.

The current situation echoes the 2021–2022 GPU crisis, but this time the scarce component is memory (HBM) rather than the GPU itself. The 18-month gap before new capacity comes online means developers need to plan now.

A maxim often attributed to Seymour Cray goes: "If you're forced to choose, pair fast memory with a slow processor." That philosophy is coming back, at a much larger scale.
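The roofline model is one standard way to make that trade-off concrete: attainable throughput is the lower of peak compute and memory bandwidth times arithmetic intensity (FLOPs per byte moved). The accelerator numbers below (1 PFLOP/s peak, 4 TB/s bandwidth) are hypothetical, not any real chip's spec. Batch-1 LLM decoding with FP16 weights sits near 1 FLOP/byte (one multiply-add, 2 FLOPs, per 2-byte weight), far below the ridge point, so it is bandwidth-bound.

```python
def attainable_flops(peak_flops, bandwidth_bytes_per_s, arithmetic_intensity):
    """Roofline: throughput is capped by compute or by memory traffic, whichever is lower."""
    return min(peak_flops, bandwidth_bytes_per_s * arithmetic_intensity)

def decode_tokens_per_s_upper_bound(weight_bytes, bandwidth_bytes_per_s):
    """Batch-1 decoding reads every weight once per token (KV cache ignored)."""
    return bandwidth_bytes_per_s / weight_bytes

PEAK, BW = 1e15, 4e12  # hypothetical: 1 PFLOP/s peak, 4 TB/s memory bandwidth
ridge = PEAK / BW      # intensity (FLOP/byte) where the two limits meet

for intensity in (1, 50, 250, 1000):
    frac = attainable_flops(PEAK, BW, intensity) / PEAK
    print(f"intensity {intensity:>4} FLOP/byte -> {frac:.1%} of peak compute")

# A hypothetical 8B-parameter FP16 model has 16e9 bytes of weights.
print(f"ridge point: {ridge:.0f} FLOP/byte")
print(f"batch-1 decode bound: {decode_tokens_per_s_upper_bound(16e9, BW):.0f} tokens/s")
```

At 1 FLOP/byte the chip uses under 1% of its compute, and doubling memory bandwidth would double the decode bound while doubling peak FLOPs would change nothing. Batching raises intensity by reusing each weight read across more tokens.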

The strategic question is no longer "which GPU?" but "how do you optimize memory?" Teams that understand this early will have a real advantage in cost and scalability.


References: