Russian specialists from the Samsung AI Center-Moscow, in collaboration with engineers from the Skolkovo Institute of Science and Technology, have developed a system capable of creating realistic animated images of people's faces from just a few static frames of a person. Usually this requires large databases of images, but in the example presented by the developers, the system was trained to create an animated image of a person's face from just eight static frames, and in some cases one was enough. More details on the development are reported in an article published in the online repository arXiv.org.
As a rule, reproducing a photorealistic personalized model of a person's face is rather difficult due to the high photometric, geometric, and kinematic complexity of the human head. This is explained not only by the difficulty of modeling the face as a whole (there are many modeling approaches for this), but also by the difficulty of modeling certain features: the oral cavity, hair, and so on. The second complicating factor is our predisposition to notice even minor flaws in a finished model of a human head. This low tolerance for modeling errors explains the current prevalence of non-photorealistic avatars used in video calls and messaging apps.
According to the authors, the system, which relies on few-shot learning, is capable of creating very realistic models of talking heads and even works on portrait paintings. The algorithms synthesize an image of the same person's head using facial landmark lines taken from another video fragment, or using the landmarks of another person's face. The developers used an extensive celebrity video database as a source of training material for the system. To get the most accurate "talking head", the system needs to use more than 32 images.
To create more realistic animated face images, the developers drew on previous work in generative adversarial networks (GANs, where one neural network learns to produce image details, effectively becoming an artist), as well as a machine meta-learning approach, where each element of the system is trained and designed to solve a particular problem.
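The adversarial idea behind a GAN can be sketched in miniature. The snippet below is a toy numpy stand-in, not the authors' networks: a "generator" and a "discriminator" are reduced to linear maps, and only the opposing binary cross-entropy objectives are shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (illustrative placeholders, not the paper's networks):
G = rng.standard_normal((16, 8)) * 0.1   # "generator": maps an 8-d code to a 16-d "image"
D = rng.standard_normal(16) * 0.1        # "discriminator": scores a 16-d "image"

def discriminator_score(x, D):
    # Logistic score: probability the discriminator assigns to "real".
    return 1.0 / (1.0 + np.exp(-D @ x))

z = rng.standard_normal(8)          # latent code
fake = G @ z                        # generated sample
real = rng.standard_normal(16)      # a "real" sample

# Opposing objectives (binary cross-entropy form):
# the discriminator minimizes d_loss, i.e. learns to tell real from fake;
# the generator minimizes g_loss, i.e. learns to fool the discriminator.
d_loss = -np.log(discriminator_score(real, D)) - np.log(1.0 - discriminator_score(fake, D))
g_loss = -np.log(discriminator_score(fake, D))
print(float(d_loss), float(g_loss))
```

In a real system both maps are deep convolutional networks and the two losses are minimized in alternation by gradient descent; the competition is what pushes the generator toward photorealistic output.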
To handle static images of people's heads and turn them into animated ones, three neural networks were used: Embedder (embedding network), Generator (generation network) and Discriminator (discriminator network). The first maps head images (with approximate facial landmarks) to embedding vectors that contain pose-independent information; the second takes the facial landmarks and generates new data from them through a set of convolutional layers that provide robustness to changes in scale, shifts, rotations, changes of angle and other distortions of the original face image. The discriminator network is used to assess the quality and authenticity of the output of the other two networks. As a result, the system turns the landmarks of a person's face into realistic-looking personalized photographs.
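The data flow through the three networks can be sketched as follows. This is a minimal numpy sketch with linear maps standing in for the real convolutional networks; all dimensions and weights here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
EMB, LMK, IMG = 32, 10, 64  # toy sizes: embedding, landmark, and "image" vectors

# Toy linear stand-ins for the three networks described in the article:
W_embed = rng.standard_normal((EMB, IMG)) * 0.05        # Embedder
W_gen   = rng.standard_normal((IMG, LMK + EMB)) * 0.05  # Generator
W_disc  = rng.standard_normal(IMG) * 0.05               # Discriminator

def embedder(frames):
    # Map each source frame to a pose-independent vector and average them,
    # giving one embedding per person.
    return np.mean([W_embed @ f for f in frames], axis=0)

def generator(landmarks, embedding):
    # Synthesize a frame conditioned on target landmarks plus the
    # person-specific embedding.
    return W_gen @ np.concatenate([landmarks, embedding])

def discriminator(frame):
    # Realism score in (0, 1) for a synthesized frame.
    return 1.0 / (1.0 + np.exp(-W_disc @ frame))

# Few-shot setup: eight source frames of one person.
frames = [rng.standard_normal(IMG) for _ in range(8)]
e = embedder(frames)
target_landmarks = rng.standard_normal(LMK)  # taken from another video fragment
synthesized = generator(target_landmarks, e)
print(synthesized.shape, float(discriminator(synthesized)))
```

The key point the sketch preserves is the split of responsibilities: identity is carried by the embedding, pose by the landmarks, and the discriminator only judges the result.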
The developers emphasize that their system is able to initialize the parameters of both the generator network and the discriminator network individually for each person in the picture, so the learning process can be based on just a few images, which increases its speed despite the need to tune tens of millions of parameters.
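One way to picture this person-specific initialization, under the assumption that the per-person parameters are predicted directly from the embedding: start from a projection of the embedding rather than from random weights, then fine-tune on the handful of available frames. The projection matrix, sizes, and the quadratic stand-in objective below are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
EMB, N_PARAMS = 32, 20

# Hypothetical projection: person-specific parameters are predicted
# from the embedding, so training on a new person starts from an
# informed initialization instead of from scratch.
P = rng.standard_normal((N_PARAMS, EMB)) * 0.05
e = rng.standard_normal(EMB)   # embedding of the new person
psi = P @ e                    # initial person-specific parameters

# A few fine-tuning steps on the available frames; a toy quadratic
# objective stands in for the real reconstruction/adversarial loss.
target = rng.standard_normal(N_PARAMS)
for _ in range(50):
    grad = 2.0 * (psi - target)   # gradient of ||psi - target||^2
    psi -= 0.1 * grad             # plain gradient descent step

print(float(np.linalg.norm(psi - target)))
```

Because the starting point already encodes the person's identity, only a short fine-tuning run is needed, which is what makes the few-shot regime practical.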