Google debuts eighth-gen TPUs 8t and 8i to rival Nvidia
Google introduced TPU 8t for training, scaling to 9,600 chips and 121 ExaFlops, and TPU 8i for inference with 384 MB on-chip SRAM and 288 GB HBM.
At the Google Cloud Next conference in Las Vegas on Wednesday, Google unveiled the eighth generation of its custom Tensor Processing Units: the TPU 8t for model training and the TPU 8i for inference.
The TPU 8t targets large-scale model training. Google said a single TPU 8t superpod can scale to 9,600 chips and deliver 121 ExaFlops of compute, with per-pod compute nearly three times that of the prior generation and a reported 2.8x improvement in price-performance.
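Google did not publish a per-chip figure alongside the pod numbers, but the two reported values imply one. A quick back-of-envelope check, using only the figures above (the result is an implied average, not an official per-chip specification):

```python
# Implied per-chip compute from the reported TPU 8t pod figures.
# Both inputs are the numbers Google reported; the per-chip value is derived, not official.
pod_exaflops = 121        # reported compute per TPU 8t superpod
chips_per_pod = 9_600     # reported chips per superpod

# 1 exaflop = 1,000 petaflops
per_chip_petaflops = pod_exaflops * 1_000 / chips_per_pod
print(f"~{per_chip_petaflops:.1f} PFLOPs per chip")  # ~12.6 PFLOPs per chip
```

Note this says nothing about the numeric precision (FP8, BF16, etc.) behind the headline figure, which Google's announcement does not break out here.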
The TPU 8i is built for inference and for memory-intensive, latency-sensitive workloads. On-chip SRAM is listed at 384 MB, three times the previous amount, and the design pairs that with 288 GB of high-bandwidth memory. Google reported the 8i delivers up to 80% better performance per dollar and twice the performance per watt compared with the prior inference TPU.
Both processors use a new Boardfly interconnect that reduces network diameter. Google’s technical documentation states Boardfly can cut latency by as much as 50% on communication-intensive workloads, a gain the company says would benefit distributed training and high-concurrency inference.
CEO Sundar Pichai framed the chips as part of an infrastructure push for what he called the ‘agentic era’ and announced plans for $175 billion to $185 billion in capital expenditures this year, compared with $31 billion in 2022.
Google noted an expanded partnership with Anthropic that will provide the startup with multiple gigawatts of next-generation TPU capacity. The company also named Citadel Securities among early commercial users that have selected TPUs for AI workloads.
Google has iterated its TPU designs over several generations to support internal projects and cloud customers. The eighth-generation parts are positioned as an alternative to GPU-based cloud compute, aimed at scaling both training clusters and inference fleets for agent-style applications.