Along with new agentic AI workflows, the new model is already powering faster localization for dozens of Deepdub GO enterprise customers
Deepdub, a foundational voice AI company pioneering expressive voice technologies, announced the launch of its latest AI speech model, Phantom X 3.2, designed to redefine the standards of dubbing and real-time voice agents. With enhanced voice quality, multilingual capabilities, and ultra-low latency, Phantom X 3.2 is built to meet the growing demands of global enterprises for scalable, high-quality AI voice and dubbing solutions. Additionally, Deepdub’s new agentic AI workflows will be demoed at the upcoming NVIDIA GTC, showcasing the future of AI-powered localization.
Deepdub GO, the company’s enterprise localization platform purpose-built for localization at scale, is now powered by Phantom X 3.2. GO continues to serve as the backbone of Deepdub’s enterprise offering, enabling production teams to generate, review, and deploy AI dubbing across dozens of languages within high-volume localization pipelines. With GO, Deepdub’s strategic partners have uninterrupted and complete access to the world’s most advanced AI-powered localization platform, including Phantom X 3.2 and all new foundation models and agentic capabilities as they are introduced.
Phantom X 3.2 introduces a new generation of dubbing capabilities engineered for studio-grade quality at enterprise scale. The model produces professional-quality voice output with human-like delivery across extreme pitch, speed, and prosody ranges, and supports zero-shot voice cloning from as little as one second of reference audio, even from noisy or degraded source material. Expanded emotion styles, including Joy, Giggle, and Laughter, can be layered within a single line, and a new Key Names and Phrases (KNP) system ensures consistent pronunciation and translation of recurring character names and technical terms across full episodes and series.
The model’s precision phonetics for stress-timed languages is designed to ensure correct pronunciation in languages such as Russian and Hebrew, where misplaced stress changes the meaning of a word. For example, the Russian “zamok” means castle or lock depending on which syllable is stressed, and the Hebrew “bira” means beer or capital city. Handling these cases correctly makes the model an essential tool for global enterprises aiming to localize content accurately.
Phantom X 3.2 enables streaming platforms and studios to localize series into 10–20 languages simultaneously while maintaining consistent character voices, accurate pronunciation of names and terms, and natural performance across episodes. The model also supports animation and franchise localization, large catalogue dubbing of films and television libraries, fast-turnaround localization for trailers, promos, and global releases, and natural narration for documentaries and unscripted programming.
For real-time voice agents, Phantom X 3.2 delivers approximately 125 ms end-to-end latency, making it well suited to demanding voice agent use cases such as customer support, virtual assistants, and interactive AI pipelines. Speech generation begins as text arrives, and the remainder of each sentence is processed in parallel to enable natural, uninterrupted real-time conversations. The model also maintains consistent voice identity, emotional control, and audio quality across extended interactions, with automatic speaker gender detection that persists throughout a session.
“The demands on voice AI have never been more complex or more consequential,” said Ofir Krakowski, CEO and co-founder of Deepdub. “Content owners and global enterprises need every language to feel native, and every conversation to feel human. But beyond quality, the economics of localization are being rewritten — streaming platforms can now make on-demand localization decisions as content breaks through in a new market, without pre-committing budgets to languages that may never be needed. With Phantom X 3.2, we’ve built a model that meets every bar simultaneously – Hollywood-grade expressiveness, real-time responsiveness, and the unit economics that make agile, language-by-language expansion a real business decision rather than a gamble. And this is just the beginning. We’re continuing to push the boundaries of what’s possible in dubbing and localization, with agentic AI workflows that will further automate and orchestrate pipelines end-to-end, making world-class localization faster, smarter, and more accessible than ever before.”










