Tech giant Microsoft recently unveiled a new artificial intelligence (AI) technology.
VALL-E
Named VALL-E, the AI model is said to be able to analyze and replicate a person's voice from just a 3-second recording. Once a voice has been successfully simulated, VALL-E can also adapt its intonation to match that person's emotional tone.
Its creators explain that VALL-E was trained on 60,000 hours of English speech from over 7,000 different speakers in an effort to get it to produce natural, human-sounding speech.
Copy any voice in 3 seconds
The researchers who developed VALL-E explain that the technology could be used in text-to-speech applications to produce speech from prepared transcripts. A user only needs to write a script containing whatever they want to convey, without having to record the audio manually, which makes it useful for tasks such as editing recorded speech and creating audio content.
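Microsoft has not released any code or API for VALL-E (see below), so the interface in the following Python sketch is entirely hypothetical: the VallEStyleTTS class, the synthesize method, and the file paths are names invented here purely to illustrate the transcript-driven workflow just described.

```python
# A rough, hypothetical sketch of the transcript-driven workflow described
# above. Microsoft has not released VALL-E's code, so every name here
# (VallEStyleTTS, synthesize, the file paths) is invented for illustration.

from dataclasses import dataclass

@dataclass
class VallEStyleTTS:
    """Stand-in for a zero-shot TTS model that clones a voice from a prompt."""
    checkpoint: str

    def synthesize(self, transcript: str, voice_prompt_wav: str) -> bytes:
        # A real VALL-E-style system would roughly:
        #  1. encode the ~3-second prompt into discrete audio-codec tokens,
        #  2. condition a language model on those tokens plus the phonemes
        #     of the transcript,
        #  3. decode the predicted codec tokens back into a waveform.
        # Returned as an empty placeholder so the sketch stays runnable.
        return b""

# The user prepares a script instead of recording the speech manually:
tts = VallEStyleTTS(checkpoint="valle.ckpt")       # hypothetical checkpoint
audio = tts.synthesize(
    transcript="Welcome back to the show.",        # prepared script
    voice_prompt_wav="speaker_3s.wav",             # 3-second voice sample
)
```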
VALL-E can be dangerous
Although opinions on artificial intelligence's potential vary, many people are concerned about the negative impact VALL-E could have. The technology carries a risk of criminal abuse, for example if it is used to spoof voice identification systems or to impersonate particular speakers.
The AI could, for instance, be used to fake the voice of a famous person saying something they never said; such cases have already appeared as deepfakes in video form.
In that respect it is similar to Lensa AI, which was recently criticized on art-ethics grounds amid fears that AI-generated digital art could replace human artists.
Music producers could use it to copy singers' voices
VALL-E also raises ethical questions, for example if a music production company used the AI to "sing" new songs without the consent of the singer whose voice it copies.
Aware of these concerns and potential risks, Microsoft, as the developer, has so far only published the capabilities and workings of the AI model without sharing the source code.
VALL-E's researchers say they will try to build safeguards against such risks, for example a detection model that can discriminate whether an audio clip was synthesized by VALL-E or recorded from a real speaker.
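The researchers only propose such a detector without describing one. Purely as a sketch of what that kind of safeguard could look like, the following Python example trains a toy real-vs-synthetic classifier on crude spectral statistics; the feature choice, the scikit-learn model, and the random stand-in data are all assumptions made here, not anything Microsoft has published.

```python
# Toy sketch of a real-vs-synthetic audio detector. The features, model,
# and data below are illustrative assumptions, not the researchers' method.

import numpy as np
from sklearn.linear_model import LogisticRegression

def spectral_features(waveform: np.ndarray) -> np.ndarray:
    """Crude per-clip features: summary statistics of the log-magnitude spectrum."""
    log_mag = np.log1p(np.abs(np.fft.rfft(waveform)))
    return np.array([log_mag.mean(), log_mag.std(), log_mag.max()])

# Random stand-ins for a labeled corpus of real (0) and synthesized (1) clips.
rng = np.random.default_rng(0)
clips = [rng.standard_normal(16_000) for _ in range(40)]   # 1 s at 16 kHz each
labels = np.array([0, 1] * 20)

X = np.stack([spectral_features(clip) for clip in clips])
detector = LogisticRegression().fit(X, labels)

# Score an unseen clip: probability that it was machine-generated.
suspect = rng.standard_normal(16_000)
print("synthetic probability:", detector.predict_proba([spectral_features(suspect)])[0, 1])
```

A production detector would need far richer features and genuine labeled recordings, but the interface idea is the same: map a clip to a score indicating how likely it is to have been machine-generated.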