According to Beating, Google's Gemini 3.1 Flash-Lite moved from preview to general availability (GA) on May 8, becoming the cheapest and fastest model in the Gemini 3 series. Input is priced at $0.25 per million tokens and output at $1.50 per million tokens, which is 75% below Claude 4.5 Haiku's input price ($1.00) and 70% below its output price ($5.00). The model has a 1 million token context window and reaches a throughput of 363 tokens per second, 45% faster than its predecessor, Gemini 2.5 Flash.
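To make the pricing comparison concrete, the following is a minimal Python sketch that recomputes the percentage differences from the per-million-token prices quoted above and estimates the bill for a sample workload. The function names and the workload size (10 million input tokens, 2 million output tokens) are hypothetical and chosen only for illustration; the prices are the only figures taken from the report.

```python
# Prices in USD per million tokens, as quoted in the report above.
PRICES = {
    "Gemini 3.1 Flash-Lite": {"input": 0.25, "output": 1.50},
    "Claude 4.5 Haiku": {"input": 1.00, "output": 5.00},
}


def percent_cheaper(cheap: float, expensive: float) -> float:
    """Return how much lower `cheap` is than `expensive`, in percent."""
    return (1 - cheap / expensive) * 100


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a workload from token counts (illustrative helper)."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    lite = PRICES["Gemini 3.1 Flash-Lite"]
    haiku = PRICES["Claude 4.5 Haiku"]
    print(f"Input:  {percent_cheaper(lite['input'], haiku['input']):.0f}% cheaper")
    print(f"Output: {percent_cheaper(lite['output'], haiku['output']):.0f}% cheaper")

    # Hypothetical workload: 10M input tokens, 2M output tokens.
    for model in PRICES:
        cost = workload_cost(model, 10_000_000, 2_000_000)
        print(f"{model}: ${cost:.2f}")
```

Because output tokens cost several times more than input tokens for both models, the overall saving on a real workload depends on its input-to-output ratio and will fall somewhere between the two percentages.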
On benchmarks, the model scores 86.9% on GPQA Diamond (graduate-level science reasoning), ahead of Claude 4.5 Haiku's 73.0% and GPT-5 mini's 82.3%, and 76.8% on MMMU-Pro (multimodal reasoning). Early adopters include customer service platform Gladly, which reports a 60% cost reduction and a 99.6% success rate on production workloads, and JetBrains, which is integrating Flash-Lite into its IDE assistance tools.