OpenAI introduces new voice intelligence capabilities in its API

On Thursday, OpenAI announced several new voice intelligence capabilities in its API, aimed at helping developers build applications that can converse with users and transcribe and translate their conversations.

The firm’s latest model, GPT‑Realtime‑2, is a voice model designed to hold lifelike spoken conversations with users. Unlike its predecessor (GPT-Realtime-1.5), this version uses GPT‑5‑class reasoning, which OpenAI says was designed to handle more complex user requests.

The company is also introducing GPT‑Realtime‑Translate, which, as its name suggests, aims to deliver real-time translation that can “keep pace” with users in conversation. It supports over 70 input languages (languages it can understand) and 13 output languages (languages it can translate into).

Additionally, the company has introduced a new transcription feature, GPT-Realtime-Whisper, providing users with live speech-to-text functionality that captures interactions as they happen.

“Together, the models we are launching shift real-time audio from mere interactive exchanges to voice interfaces capable of performing tasks: listening, reasoning, translating, transcribing, and acting as conversations progress,” stated the company.

Who will benefit from these updates? Clearly, businesses aiming to enhance customer service functions are a primary audience. However, OpenAI also emphasizes that its new features will be valuable across various sectors, including education, media, events, and creative platforms, among others.

While these tools appear advantageous from a business standpoint, there are concerns about potential misuse. The company mentioned that it has implemented safeguards to prevent these new features from being exploited for spam, fraud, or other types of online misconduct. Specific triggers have been integrated into the system so that “conversations can be interrupted if identified as breaching our harmful content guidelines,” as OpenAI stated.

All of the new voice models are part of OpenAI’s Realtime API. GPT‑Realtime‑Translate and GPT-Realtime-Whisper are billed per minute, while GPT-Realtime-2 is billed based on token usage.
