A neural network listened to people's voices and drew their portraits

Recently, neural networks have been surprising us with their skills: could you have believed ten years ago that a computer would be able to “revive” portraits of Dostoevsky and Marilyn Monroe? Get ready to be surprised again, because researchers from the Massachusetts Institute of Technology have created Speech2Face, a neural network that can draw portraits of people just by listening to their voices. The technology is still far from ideal, but its ability to determine a person's gender, ethnicity and age is impressive.

To train the neural network, the researchers used the AVSpeech dataset, which contains millions of short videos of thousands of talking people. The video and audio tracks were separated, so the system could study each type of material in as much detail as possible. In the first stage, the VGG-Face algorithm studied the video fragments and created portraits of the people in them, facing forward and with a neutral expression. Another part of the algorithm studied the spectrogram of the voice and applied additional changes to the resulting portraits. The result was an approximate portrait of each speaker.

The neural network for creating voice-based portraits is already a reality

If you compare a person's face from the video with the version proposed by the algorithm, you can find many differences. However, the researchers say that they never set out to create an exact likeness: many factors influence the tone and intonation of the human voice, so a perfect result was out of reach. But the neural network does handle what matters to the researchers, namely determining gender, ethnicity and age.

The authors noted that, at the moment, the algorithm is weak at determining age, but they believe they can improve its accuracy. They also found that the algorithm is better at recreating people of European and Asian appearance, but this is only because people of different ethnicities were not equally represented in the training videos.

Why is such a neural network needed?

How could this technology be useful in the future? One option is a service that automatically creates a user's virtual avatar from their voice. The research also has scientific value: by studying the data, scientists can look for relationships between a person's appearance and their voice. You can listen to the voices and look at the portraits recreated from them on the project website.

What applications of such a neural network can you think of? Share your boldest guesses in the comments and join the conversation in our Telegram chat.