DeepL, recognized for its text translation capabilities, is now aiming to translate your voice.

DeepL, a translation company best known for its text-based tools, launched a suite of voice-to-voice translation products today. The suite covers meetings, mobile and web conversations, and group conversations for frontline workers through custom apps. The company is also launching an API that lets outside developers and businesses build DeepL’s technology into their own use cases, such as call centers.

“Having dedicated so many years to text translation, moving to voice felt like a logical evolution for us,” DeepL CEO Jarek Kutylowski stated in an interview with TechCrunch. “We have made significant progress in both text translation and document translation. However, we sensed there was a gap in the market for effective real-time voice translation.”

Kutylowski said the main challenge in building a real-time translation tool is balancing latency (the delay between a person speaking and the translated audio playing) against the accuracy of the output.

DeepL is introducing extensions for platforms such as Zoom and Microsoft Teams, letting participants either hear a real-time translation while others speak in their native languages or read translated text on screen as it is generated. The integrations are currently in early access, and the company is asking organizations to sign up for a waitlist. DeepL also offers a product for mobile and web conversations, whether the participants are in the same room or remote.

DeepL also supports group conversations in settings such as training sessions or workshops, where attendees can join by scanning a QR code.

DeepL said its voice-to-voice technology can learn and adapt to specific vocabulary, including industry terms and the names of companies and people.

Kutylowski said AI is reshaping customer service. He noted that a translation layer helps companies offer support in languages for which qualified staff are scarce and expensive to hire.

The company said it controls the entire voice-to-voice stack. For now, though, the system works in three steps: it transcribes speech to text, translates that text, and then converts the translation back to speech. DeepL believes its long focus on text translation gives it an edge in translation quality. Next, the company aims to build an end-to-end voice translation model that skips the text stage entirely.

DeepL faces several well-funded startups working in adjacent areas. Sanas, which raised $65 million last year from Quadrille Capital and Teleperformance, uses AI to change a speaker’s accent in real time, mainly for call center agents.

Dubai-based Camb.AI specializes in speech synthesis and translation for clients including media and entertainment companies and Amazon Web Services, helping them dub and localize video content at scale.

Palabra, supported by Reddit co-founder Alexis Ohanian’s firm Seven Seven Six, is developing a real-time speech translation system designed to maintain both the meaning and the speaker’s original voice, placing it in closer competition with what DeepL is now creating.

Leave a Reply