OpenAI launches GPT-Realtime-2: brings GPT-5 reasoning into voice agents, with context up to 128K

ChainNews ABMedia

On May 7 (U.S. time), OpenAI unveiled three new Realtime speech models at its developer conference: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper, all available to developers through the Realtime API. In its official announcement, OpenAI said GPT-Realtime-2 is its first speech model with GPT-5–level reasoning, able to reason in real time during voice conversations, call tools, handle corrections, and keep a natural conversational rhythm.

GPT-Realtime-2: context grows from 32K to 128K, with five adjustable levels of reasoning strength

Core upgrades for GPT-Realtime-2:

Context window: expanded from 32K to 128K tokens

Adjustable reasoning strength: minimal, low, medium, high, xhigh (five stages)

Big Bench Audio: 96.6% at high reasoning, versus 81.4% for the previous GPT-Realtime-1.5

Audio MultiChallenge (instruction following): 48.5% at xhigh reasoning, versus 34.7% for the previous model

A larger context and adjustable reasoning strength let developers switch between “cheap and fast” and “deep thinking” based on the scenario—simple customer service can use minimal mode to control costs, while complex tasks can switch to xhigh for GPT-5–level reasoning quality.
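The trade-off described above can be sketched as a small selection routine. This is only an illustration: the five level names come from the announcement as reported here, but the function `pick_reasoning_level`, the idea of a numeric complexity score, and the even bucketing are assumptions made for the example and are not part of any OpenAI API.

```python
# The five level names as listed in the article, ordered from cheapest/fastest
# to deepest reasoning.
REASONING_LEVELS = ["minimal", "low", "medium", "high", "xhigh"]


def pick_reasoning_level(complexity: float) -> str:
    """Map a hypothetical task-complexity score in [0, 1] to a reasoning level.

    The score and the even five-way split are assumptions for illustration;
    a real application would tune this against its own latency and cost data.
    """
    if not 0.0 <= complexity <= 1.0:
        raise ValueError("complexity must be between 0 and 1")
    # Split [0, 1] into five equal buckets; a score of exactly 1.0 is clamped
    # into the last bucket.
    index = min(int(complexity * len(REASONING_LEVELS)), len(REASONING_LEVELS) - 1)
    return REASONING_LEVELS[index]


if __name__ == "__main__":
    # A simple customer-service query versus a complex multi-step task.
    print(pick_reasoning_level(0.05))
    print(pick_reasoning_level(0.95))
```

In practice the score might come from the type of request (a routine FAQ versus a multi-step troubleshooting session), so that only the hard cases pay for deeper reasoning.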

Two specialized models are released alongside it: Translate for cross-language, and Whisper for real-time transcription.

This round’s three new models are split by function:

GPT-Realtime-Translate: real-time multilingual voice translation, supports 70 input languages and 13 output languages

GPT-Realtime-Whisper: low-latency streaming transcription, outputs text as people speak, suitable for live captions, meeting notes, and classroom verbatim transcripts

GPT-Realtime-2: a full dialogue agent—able to reason, use tools, and carry out actions

Translate and Whisper are specialized for specific speech applications—translation and transcription are more latency- and cost-sensitive than general dialogue, so using separate models can optimize their respective metrics.

Pricing: GPT-Realtime-2 costs $32 per million input tokens and $64 per million output tokens

Pricing structure for the three models:

GPT-Realtime-2: $32 per million audio input tokens, $0.40 per million cached input tokens, $64 per million audio output tokens

GPT-Realtime-Translate: $0.034 per minute

GPT-Realtime-Whisper: $0.017 per minute
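To make the two billing schemes easier to compare, here is a rough cost estimator using the prices quoted above. It is a sketch under stated assumptions: the rates are read as per million tokens for GPT-Realtime-2 (including the cached-input rate), cached input tokens are counted separately from regular input tokens, and the function names and sample workloads are hypothetical.

```python
# Prices in USD as quoted in the article. Assumption: the GPT-Realtime-2 rates,
# including the cached-input rate, are per one million tokens.
REALTIME2_INPUT_PER_M = 32.00
REALTIME2_CACHED_INPUT_PER_M = 0.40
REALTIME2_OUTPUT_PER_M = 64.00
TRANSLATE_PER_MINUTE = 0.034
WHISPER_PER_MINUTE = 0.017

TOKENS_PER_MILLION = 1_000_000


def realtime2_cost(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> float:
    """Estimate GPT-Realtime-2 cost; cached tokens are billed at the cached rate."""
    total = (
        input_tokens * REALTIME2_INPUT_PER_M
        + cached_input_tokens * REALTIME2_CACHED_INPUT_PER_M
        + output_tokens * REALTIME2_OUTPUT_PER_M
    )
    return total / TOKENS_PER_MILLION


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Estimate cost for the per-minute models (Translate and Whisper)."""
    return minutes * rate_per_minute


if __name__ == "__main__":
    # Hypothetical workloads, chosen only to show how the functions are called.
    print(f"Realtime-2, 100K in / 50K out: ${realtime2_cost(100_000, 50_000):.2f}")
    print(f"Translate, 60 minutes: ${per_minute_cost(60, TRANSLATE_PER_MINUTE):.2f}")
    print(f"Whisper, 60 minutes: ${per_minute_cost(60, WHISPER_PER_MINUTE):.2f}")
```

Because GPT-Realtime-2 is billed by token and the other two models by minute, a direct comparison needs an estimate of how many tokens a minute of audio consumes, which the article does not provide.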

Developments to watch: how widely GPT-Realtime-2 speech agents are adopted in production environments, how much the new model cannibalizes the existing GPT-4o speech models, and how competitors such as Anthropic and Google respond.

This article, “OpenAI launches GPT-Realtime-2: brings GPT-5 reasoning into voice agents, with context up to 128K,” first appeared on ChainNews ABMedia.
