Microsoft's VALL-E can imitate any voice from just a 3-second sample
Shortpedia
Content Team
Image Credit: firstpost
Three seconds of audio is all it takes for Microsoft's newly developed text-to-speech AI model to mimic a person's voice. Dubbed VALL-E, it can generate audio of a person saying anything once it has learned that voice. The model's ability to mimic voices has caught many by surprise. It was trained on over 60,000 hours of English speech, far more than other text-to-speech models.