According to Beating, OpenAI has released three voice models in its Realtime API: GPT-Realtime-2 for voice conversation with reasoning, GPT-Realtime-Translate for real-time translation, and GPT-Realtime-Whisper for streaming transcription. GPT-Realtime-2 is OpenAI’s first voice model with GPT-5-level reasoning. It expands the context window from 32K to 128K tokens, enough to support roughly one to two hours of dense conversation.
Compared with GPT-Realtime-1.5, GPT-Realtime-2 improves by 15.2% on the Big Bench Audio benchmark and by 13.8% on Audio MultiChallenge. GPT-Realtime-Translate accepts more than 70 input languages and translates into 13 output languages. Pricing: GPT-Realtime-2 costs $32 per million input tokens and $64 per million output tokens; GPT-Realtime-Translate costs $0.034 per minute; and GPT-Realtime-Whisper costs $0.017 per minute.
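To make the quoted prices concrete, here is a minimal cost-estimation sketch. It assumes the per-token rates apply uniformly to every input and output token in a GPT-Realtime-2 session and that the per-minute rates are flat. Actual billing may separate audio, text, and cached tokens, so treat the output as a rough estimate. The function names (`realtime2_cost`, `per_minute_cost`) are hypothetical helpers, not part of any OpenAI SDK.

```python
# Hypothetical cost estimator built from the list prices quoted in the article.
# Assumption: flat rates, with no cached-input discounts, volume tiers, or
# separate audio/text token pricing.

GPT_REALTIME_2_INPUT_PER_M = 32.00   # USD per 1M input tokens
GPT_REALTIME_2_OUTPUT_PER_M = 64.00  # USD per 1M output tokens
TRANSLATE_PER_MIN = 0.034            # USD per minute (GPT-Realtime-Translate)
WHISPER_PER_MIN = 0.017              # USD per minute (GPT-Realtime-Whisper)


def realtime2_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a GPT-Realtime-2 session from its token counts."""
    return (input_tokens * GPT_REALTIME_2_INPUT_PER_M
            + output_tokens * GPT_REALTIME_2_OUTPUT_PER_M) / 1_000_000


def per_minute_cost(minutes: float, rate_per_min: float) -> float:
    """Estimated USD cost of a per-minute product (Translate or Whisper)."""
    return minutes * rate_per_min


if __name__ == "__main__":
    # Example: a session with 100k input tokens and 20k output tokens,
    # plus one hour each of translation and transcription.
    print(f"Realtime-2 (100k in / 20k out): ${realtime2_cost(100_000, 20_000):.2f}")
    print(f"Translate (60 min): ${per_minute_cost(60, TRANSLATE_PER_MIN):.2f}")
    print(f"Whisper (60 min): ${per_minute_cost(60, WHISPER_PER_MIN):.2f}")
```

With these assumptions, the example session comes to $4.48 for GPT-Realtime-2, $2.04 for an hour of translation, and $1.02 for an hour of transcription. Output tokens cost twice as much as input tokens, so long spoken replies raise cost faster than long user turns.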