DeepSeek, Xiaomi slash AI API prices by up to 99%
DeepSeek made its 75% V4‑Pro discount permanent; Xiaomi cut MiMo‑V2.5 cached‑input prices by up to 99%. OpenAI raised GPT‑5.5 output to $30 per million tokens.
Chinese AI labs DeepSeek and Xiaomi reduced API prices this week while several U.S. providers raised their rates. DeepSeek confirmed on May 22 that a temporary 75% discount on its V4‑Pro model will become permanent, fixing output at $0.87 per million tokens and input at $0.435 per million tokens. Xiaomi announced on May 26 that MiMo‑V2.5 API rates will fall, with cached inputs in the Pro tier priced as low as $0.0036 per million tokens. Xiaomi’s $100 Max plan now provides 82 billion tokens, up from 1.6 billion.
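The figures above imply some simple arithmetic, sketched here in Python. The helper names are illustrative, and the per-million figure for the Max plan assumes the full token allowance is actually consumed, which the announcement does not guarantee.

```python
# Figures quoted in the article; function and variable names are illustrative.
DISCOUNT = 0.75          # DeepSeek's now-permanent V4-Pro discount
V4_PRO_OUTPUT = 0.87     # USD per million output tokens, after discount
V4_PRO_INPUT = 0.435     # USD per million input tokens, after discount


def pre_discount_price(discounted: float, discount: float) -> float:
    """Recover the undiscounted price from a discounted price."""
    return discounted / (1.0 - discount)


def plan_price_per_million(plan_usd: float, tokens: float) -> float:
    """Implied USD per million tokens if the whole allowance is used."""
    return plan_usd / (tokens / 1_000_000)


old_output = pre_discount_price(V4_PRO_OUTPUT, DISCOUNT)
old_input = pre_discount_price(V4_PRO_INPUT, DISCOUNT)

max_before = plan_price_per_million(100, 1.6e9)   # allowance before the change
max_after = plan_price_per_million(100, 82e9)     # allowance after the change
allowance_growth = 82e9 / 1.6e9

print(f"V4-Pro before discount: ${old_input:.2f} in / ${old_output:.2f} out")
print(f"Max plan: ${max_before:.4f} -> ${max_after:.5f} per million tokens")
print(f"Allowance grew about {allowance_growth:.2f}x")
```

On these numbers, the V4‑Pro prices before the discount work out to about $1.74 input and $3.48 output per million tokens, and the Max plan allowance grows roughly 51‑fold.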
Xiaomi’s MiMo team published technical notes and a thread on X explaining that a hierarchical key‑value cache optimization for sliding‑window attention increases cached token reuse roughly fivefold and reduces storage and compute requirements by about 80%. Fuli Luo, head of Xiaomi’s MiMo team, wrote on X: “Operating at these newly reduced API prices, our production inference engine is running at near full capacity, and we can still essentially break even.” Xiaomi said the changes raise token allowances across plans and lower per‑token operating costs for workloads that hit the cache.
DeepSeek described V4‑Pro’s architecture as using two interleaved attention mechanisms: one that compresses groups of four tokens for selective attention and another that collapses every 128 tokens to capture global context. DeepSeek stated that V4‑Pro’s key‑value cache at one million tokens is about 10% of the size of its predecessor’s and that single‑token inference requires roughly 27% of the prior compute. The company describes the model as a 1.6 trillion‑parameter system.
U.S. providers moved in the opposite direction on list pricing. OpenAI doubled GPT‑5.5’s output price to $30 per million tokens. Anthropic kept Opus 4.7’s list prices at $5 per million input tokens and $25 per million output tokens but shipped an updated tokenizer that can increase token counts by up to 35% for the same input text. Google’s Gemini 2.5 Pro lists $1.25 per million input tokens and $10 per million output tokens.
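A tokenizer change can raise the bill even when list prices stay flat, because the same text is split into more billable tokens. The sketch below works through the upper bound quoted above. It assumes, for illustration only, that the 35% increase applies equally to input and output text; the article states the figure only for input text.

```python
def effective_price(list_price: float, token_inflation: float) -> float:
    """USD per million old-tokenizer tokens' worth of text.

    If the new tokenizer emits (1 + token_inflation) times as many tokens
    for the same text, the cost of that text scales by the same factor.
    """
    return list_price * (1.0 + token_inflation)


OPUS_INPUT, OPUS_OUTPUT = 5.0, 25.0   # list prices quoted in the article
MAX_INFLATION = 0.35                  # "up to 35%" upper bound

opus_input_worst = effective_price(OPUS_INPUT, MAX_INFLATION)
# Assumption: output text is inflated by the same factor as input text.
opus_output_worst = effective_price(OPUS_OUTPUT, MAX_INFLATION)

# "Doubled to $30" implies the previous GPT-5.5 output price.
gpt55_previous_output = 30.0 / 2

print(f"Opus 4.7 worst case: ${opus_input_worst:.2f} in / ${opus_output_worst:.2f} out")
print(f"GPT-5.5 output before the increase: ${gpt55_previous_output:.2f}")
```

In the worst case, Opus 4.7's unchanged $5 input price behaves like $6.75 per million tokens measured under the old tokenizer.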
Other Chinese models remain priced lower. MiniMax M2.7 lists $0.30 input and $1.20 output per million tokens. Kimi K2.5 lists $0.60 input and $2.50 output per million tokens. Analysts tracking per‑token cost and performance estimate the Q2 2026 baseline price‑to‑quality gap between Chinese and American frontier models at roughly 15× to 30×, depending on the models compared.
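For a rough sense of the spread, the output list prices quoted in this article can be compared pairwise. This is a minimal sketch of raw price ratios only. It does not adjust for model quality, so it is not a reproduction of the analysts' price‑to‑quality estimate.

```python
# Output list prices in USD per million tokens, as quoted in the article.
US_OUTPUT = {"GPT-5.5": 30.0, "Opus 4.7": 25.0, "Gemini 2.5 Pro": 10.0}
CN_OUTPUT = {"DeepSeek V4-Pro": 0.87, "MiniMax M2.7": 1.20, "Kimi K2.5": 2.50}


def output_price_ratios(us: dict, cn: dict) -> dict:
    """Map each (US model, Chinese model) pair to its output price ratio."""
    return {(u, c): up / cp for u, up in us.items() for c, cp in cn.items()}


ratios = output_price_ratios(US_OUTPUT, CN_OUTPUT)

# Print the pairs from the widest gap to the narrowest.
for (u, c), r in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{u} vs {c}: {r:.1f}x")
```

The raw ratios range from 4× (Gemini 2.5 Pro against Kimi K2.5) to about 34× (GPT‑5.5 against DeepSeek V4‑Pro), which shows how strongly any headline multiple depends on the pair chosen.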
The reduced rates most directly affect production workloads that frequently reuse the same context, such as agent pipelines with stable system prompts, document processing, and retrieval tools, where cache hits move tokens from the standard input rate to the much lower cached rate. For cached inputs, the new rates cut the marginal cost of repeated context to fractions of a cent per million tokens. Companies building AI services are billed per token; a token is roughly three‑quarters of a word, so every user message, model reply, or processed document adds chargeable tokens. The pricing gap leaves developers with different cost equations: some providers have raised list prices, while others are lowering per‑token costs through caching and inference efficiency improvements.
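The effect of cache hits on a single request can be sketched as follows. The `Rates` class and function names are hypothetical, and the example rate card deliberately mixes figures from the article (DeepSeek's input and output prices with Xiaomi's lowest cached‑input price) purely as placeholders. It is not any one provider's actual price list, and real billing rules for cache hits vary by provider.

```python
from dataclasses import dataclass

WORDS_PER_TOKEN = 0.75  # rule of thumb: a token is about 3/4 of a word


@dataclass(frozen=True)
class Rates:
    """USD per million tokens (hypothetical container for illustration)."""
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


def words_to_tokens(words: int) -> int:
    """Rough token estimate from a word count."""
    return round(words / WORDS_PER_TOKEN)


def request_cost(rates: Rates, fresh_in: int, cached_in: int, out: int) -> float:
    """Cost in USD of one request, splitting input into fresh and cached tokens."""
    total = (fresh_in * rates.input_per_m
             + cached_in * rates.cached_input_per_m
             + out * rates.output_per_m)
    return total / 1_000_000


# Placeholder rate card mixing article figures; an assumption, not a real price list.
rates = Rates(input_per_m=0.435, cached_input_per_m=0.0036, output_per_m=0.87)

system_prompt = words_to_tokens(6000)   # stable context, reused on every call
user_message = words_to_tokens(300)
reply = words_to_tokens(750)

cold = request_cost(rates, system_prompt + user_message, 0, reply)  # no cache hit
warm = request_cost(rates, user_message, system_prompt, reply)      # prompt cached

print(f"cold: ${cold:.6f}  warm: ${warm:.6f}  saving: {1 - warm / cold:.0%}")
```

Under these placeholder rates, most of the warm request's cost comes from the reply rather than the context, which is why stable‑prompt workloads benefit the most from cached‑input pricing.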







