
Microsoft's VALL-E can imitate any voice in just 3 seconds

Shortpedia

Content Team
Image Credit: firstpost

Three seconds: that's all it takes for Microsoft's newly developed text-to-speech AI model to mimic a person's voice. Dubbed VALL-E, it can generate audio of a person saying anything once it has heard a three-second sample of that person's voice. Its ability to mimic voices from so little audio has caught many observers by surprise. It was trained on over 60,000 hours of English speech, far more than other text-to-speech models.