DeepSeek and Xiaomi Slash API Pricing While American Labs Increase Rates
DeepSeek made its 75% discount on DeepSeek V4-Pro permanent on May 22, 2026, locking output pricing at $0.87 per million tokens. Xiaomi followed on May 26 with cuts to MiMo-V2.5 API prices, including a reduction of up to 99% on cached input, which now costs $0.0036 per million tokens on the Pro model. The reductions stem from technical optimizations in inference frameworks and KV cache architecture. The cuts arrived as American labs moved in the opposite direction. OpenAI launched GPT-5.5 in late April at $30 per million output tokens, double its predecessor's rate. Anthropic shipped Claude Opus 4.7 with a new tokenizer that produces up to 35% more tokens for identical input text, which can raise actual costs even though the rate card is unchanged.
Permanent Pricing Changes Announced
DeepSeek V4-Pro now runs at $0.435 input and $0.87 output per million tokens. The 75% discount, previously set to expire, became permanent on May 22. Xiaomi's MiMo-V2.5-Pro matches those rates, at $0.435 input and $0.87 output per million tokens, after the May 26 cuts. Cache hits for MiMo-V2.5 dropped to $0.0036 per million tokens. Xiaomi's billing upgrade gives users 5 to 8 times more tokens at the same price. The Max plan at $100 now provides 82 billion tokens, up from 1.6 billion.
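To make the rate card concrete, the following is a minimal sketch of how a single request might be priced at the MiMo-V2.5-Pro rates quoted above. The function names, the `cached_fraction` parameter, and the assumption that cached and uncached input are billed linearly and separately are illustrative, not taken from either provider's billing documentation.

```python
# USD per million tokens, as quoted in this article for MiMo-V2.5-Pro.
PRICES_PER_MILLION = {"input": 0.435, "cached_input": 0.0036, "output": 0.87}


def request_cost(input_tokens, output_tokens, cached_fraction=0.0,
                 prices=PRICES_PER_MILLION):
    """Estimate the USD cost of one request.

    Assumption: a `cached_fraction` share of the input is billed at the
    cache-hit rate and the remainder at the normal input rate.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    total = (fresh * prices["input"]
             + cached * prices["cached_input"]
             + output_tokens * prices["output"])
    return total / 1_000_000


def pre_discount_price(discounted_price, discount=0.75):
    """Back out the list price implied by a percentage discount."""
    return discounted_price / (1.0 - discount)


if __name__ == "__main__":
    # One million input tokens and one million output tokens, no cache hits.
    print(f"uncached: ${request_cost(1_000_000, 1_000_000):.4f}")
    # Same request with every input token served from cache.
    print(f"cached:   ${request_cost(1_000_000, 1_000_000, 1.0):.4f}")
    # List price implied by a 75% discount landing at $0.87.
    print(f"implied list output price: ${pre_discount_price(0.87):.2f}")
```

Under these assumptions, a fully cached prompt leaves output tokens as almost the entire bill, and a 75% discount that lands at $0.87 implies a pre-discount output price of about $3.48 per million tokens.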
Technical Implementation Behind Price Reductions
Fuli Luo, head of Xiaomi's MiMo team and a former core DeepSeek developer who co-built DeepSeek-V2, published a technical explanation on X on May 27. The inference framework now supports hierarchical KV cache optimization for sliding window attention (SWA). In tests on the production inference engine, this optimization increased cached token capacity roughly fivefold and reduced storage and processing costs by around 80%. "Operating at these newly reduced API prices, our production inference engine is running at near full capacity, and we can still essentially break even," Luo wrote.
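The two figures in Luo's explanation are consistent with simple arithmetic: if the same cache hardware holds about five times as many tokens, the cost attributable to each cached token falls to about one fifth. The sketch below shows that relationship. It is one plausible reading of how the numbers connect, not Xiaomi's published cost accounting, and the function name is hypothetical.

```python
def per_token_cost_reduction(capacity_multiplier):
    """Fractional drop in per-token cache cost when capacity grows.

    Assumption: total cache cost is fixed, so cost per cached token
    scales as 1 / capacity_multiplier.
    """
    if capacity_multiplier <= 0:
        raise ValueError("capacity_multiplier must be positive")
    return 1.0 - 1.0 / capacity_multiplier


if __name__ == "__main__":
    reduction = per_token_cost_reduction(5)
    print(f"{reduction:.0%} lower cost per cached token")
```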
DeepSeek V4 uses two interleaved attention types: one compressing every four tokens for selective attention, another collapsing every 128 tokens for global context. At one million tokens of context, V4-Pro's KV cache is 10% the size of its predecessor's. Single-token inference runs at 27% of the previous compute cost.
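As a rough illustration of why compression ratios like these shrink the cache, the sketch below counts how many cache entries a layer would hold if it stored one entry per block of tokens. This is a toy model built on that single assumption. It is not DeepSeek's architecture, ignores per-entry size and layer counts, and does not attempt to reproduce the 10% figure.

```python
def compressed_entries(context_tokens, tokens_per_entry):
    """Entries held by a layer that stores one entry per block of tokens.

    Assumption: a partial trailing block still occupies one entry.
    """
    if context_tokens < 0 or tokens_per_entry <= 0:
        raise ValueError("context_tokens must be >= 0 and tokens_per_entry > 0")
    return (context_tokens + tokens_per_entry - 1) // tokens_per_entry


if __name__ == "__main__":
    context = 1_000_000
    for ratio in (1, 4, 128):
        entries = compressed_entries(context, ratio)
        print(f"1 entry per {ratio:>3} tokens -> {entries:,} entries")
```

At one million tokens of context, a layer compressing every four tokens would hold 250,000 entries under this model, and one collapsing every 128 tokens would hold 7,813.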
Performance Benchmarks and Comparative Pricing
DeepSeek V4-Pro, a 1.6 trillion parameter model, scored 80.6% on SWE-bench Verified, a benchmark that measures real GitHub issue resolution. Claude Opus 4.6 scored 80.8% on the same benchmark. The pricing gap between the two models is 34x on output.
Claude Opus 4.7 costs $5 per million input tokens and $25 per million output tokens. GPT-5.5 runs at $30 per million output tokens, double its predecessor's rate. Gemini 2.5 Pro sits at $1.25 input and $10 output per million tokens.
MiniMax M2.7 costs $0.30 input and $1.20 output per million tokens. Kimi K2.5 from Moonshot AI, which scored 76.8% on SWE-bench Verified, costs $0.60 input and $2.50 output per million tokens. GLM-5.1 from Z.AI beat Claude Opus 4.6 on a coding benchmark in Q2 2026. Four Chinese frontier models shipped in a 12-day window in early May, all priced under one-third of Opus 4.7's per-token cost. Cached input on DeepSeek V4-Pro costs $0.003625 per million tokens.
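The output prices quoted in this section can be compared directly. The sketch below is a minimal example that uses only the per-million output rates stated in this article. The dictionary and function names are illustrative, and the comparison ignores cache discounts and tokenizer differences.

```python
# USD per million output tokens, as quoted in this article.
OUTPUT_PRICE_PER_MILLION = {
    "DeepSeek V4-Pro": 0.87,
    "MiMo-V2.5-Pro": 0.87,
    "MiniMax M2.7": 1.20,
    "Kimi K2.5": 2.50,
    "Gemini 2.5 Pro": 10.00,
    "Claude Opus 4.7": 25.00,
    "GPT-5.5": 30.00,
}


def output_price_ratio(model, baseline, prices=OUTPUT_PRICE_PER_MILLION):
    """How many times more `model` charges for output than `baseline`."""
    return prices[model] / prices[baseline]


if __name__ == "__main__":
    baseline = "DeepSeek V4-Pro"
    for model in OUTPUT_PRICE_PER_MILLION:
        ratio = output_price_ratio(model, baseline)
        print(f"{model:<16} {ratio:5.1f}x {baseline} on output")
```

With these inputs, Claude Opus 4.7 comes out at about 28.7 times DeepSeek V4-Pro's output price, and GPT-5.5 at about 34.5 times.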
Market Positioning Across Providers
The Q2 2026 pricing gap between Chinese and American frontier models ranges from 15x to 30x, depending on which models are compared. That gap is measured before cache discounts are applied. Anthropic kept Claude Opus 4.7's rate card flat, but because the model's new tokenizer can produce up to 35% more tokens for the same input text, the effective cost of a given workload can rise.
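The effect of a tokenizer change on a flat rate card can be estimated with one multiplication. The sketch below assumes the worst case quoted above, 35% more tokens for the same text, and expresses the result as an effective price per million tokens counted with the old tokenizer. The function name is hypothetical, and actual inflation will vary with the text being processed.

```python
def effective_rate(rate_per_million, token_inflation):
    """Effective price per million old-tokenizer tokens.

    Assumption: the same text now yields (1 + token_inflation) times as
    many billable tokens at an unchanged per-token rate.
    """
    if token_inflation < 0:
        raise ValueError("token_inflation must be non-negative")
    return rate_per_million * (1.0 + token_inflation)


if __name__ == "__main__":
    # Claude Opus 4.7 input rate from this article, worst-case inflation.
    print(f"effective input rate: ${effective_rate(5.0, 0.35):.2f} per million")
```

At the full 35%, the $5 input rate behaves like $6.75 per million tokens measured against the previous tokenizer.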