News

VibeVoice is a new open-source AI tool that can generate a full 90-minute audio podcast recording with multiple speakers from ...
"VibeVoice is a novel framework designed for generating expressive, long-form, multi-speaker conversational audio, such as ...
Text-to-speech models from ElevenLabs, Hume AI, and Descript are all pushing the limits of AI-generated voice technology.
Microsoft has launched VibeVoice, a new open-source AI model capable of generating up to 90 minutes of multi-speaker audio ...
Text-to-speech with feeling: this new AI model does everything but shed a tear. ElevenLabs' "most expressive" v3 model can speak with a huge range of emotions in more than 70 languages.
In its latest blog post, Microsoft announced the launch of a new speech-generation AI model, MAI-Voice-1, for its Copilot and Podcasts features.
Capturing natural conversations: Rime’s model generates audio tokens that are decoded into speech using a codec-based approach, which Rime says provides for “faster-than-real-time synthesis.” ...
OpenAI's Realtime API is now generally available, featuring the new gpt-realtime model for more natural voice agents at 20% lower cost for developers.
Google Docs introduces Gemini audio, a text-to-speech feature allowing users to listen to documents. Available on the web for AI Pro and Ultra subscribers, it includes natural-sounding voices, ...
From August 11 to August 15, a new model will be unveiled each day, covering cutting-edge multimodal AI scenarios. On August 11, Skywork officially launched the SkyReels-A3 model.
So why do today’s AI systems struggle with natural speech? The problem isn’t just about technology; it’s about the way these ...