Google has introduced Translatotron, a new experimental neural network that translates speech directly into another language without going through a text representation, while preserving the speaker's voice and pace of speech, according to the company's blog. The system, built on long short-term memory (LSTM) networks, takes voice input, processes it as a spectrogram, and generates from it a new spectrogram in the target language. Under certain conditions, this can improve not only the speed of translation but also its accuracy. A fuller description of the work is available in a paper published in the arXiv.org online repository of scientific articles.
“Translatotron is the first end-to-end model that can directly translate speech from one language into speech in another language, while retaining the original features of the source speaker’s voice,” the company said in its official blog.
Google notes that most modern machine speech translation systems are built on the cascade principle, in which the task is divided into several simpler subtasks. First, automatic speech recognition converts the speech into text. Then machine translation converts that text from one language into another. Finally, the translated text is turned back into speech, which almost always sounds different from the voice of the original speaker.
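The three cascade stages described above can be sketched as a chain of function calls. This is a toy illustration only: the `Audio` type, the stage functions, and the tiny dictionary are hypothetical stand-ins, not part of any Google system. It is meant to show why the speaker's voice is lost once the speech passes through text.

```python
from dataclasses import dataclass


@dataclass
class Audio:
    """Hypothetical stand-in for a waveform: the words spoken plus a speaker label."""
    words: str
    speaker: str


# Toy Spanish-to-English dictionary, assumed purely for illustration.
TOY_DICTIONARY = {"hola": "hello", "mundo": "world"}


def recognize_speech(audio: Audio) -> str:
    """Stage 1 (toy ASR): speech to text. The speaker label is dropped here."""
    return audio.words


def translate_text(text: str) -> str:
    """Stage 2 (toy MT): word-by-word lookup, unknown words pass through."""
    return " ".join(TOY_DICTIONARY.get(word, word) for word in text.split())


def synthesize_speech(text: str) -> Audio:
    """Stage 3 (toy TTS): text to speech in a generic synthetic voice."""
    return Audio(words=text, speaker="synthetic")


def cascade_translate(audio: Audio) -> Audio:
    """Run the three stages in sequence, as in a cascade system."""
    text = recognize_speech(audio)
    translated = translate_text(text)
    return synthesize_speech(translated)


result = cascade_translate(Audio(words="hola mundo", speaker="alice"))
print(result)
```

Because only plain text travels between the stages, nothing about the original speaker reaches the synthesis step, so the output voice in this sketch is always the synthetic one.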
The cascade approach has proven effective and practical, and it is used in most translation systems, including Google's. However, Google's AI researchers believe it is not perfect. Errors can occur at each stage, and they accumulate, lowering the quality of the final result. Google is confident that an end-to-end translation model can outperform the cascade by removing the intermediate step in which speech is first converted into text.
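A rough way to see how per-stage errors add up is to multiply stage accuracies together. The sketch below assumes, for simplicity, that the stages fail independently, and the 90% figures are invented for illustration; they are not measurements of any real system.

```python
def pipeline_accuracy(stage_accuracies):
    """Probability that every stage is correct, assuming independent stages."""
    total = 1.0
    for accuracy in stage_accuracies:
        total *= accuracy
    return total


# Illustrative numbers only: recognition, translation, and synthesis at 90% each.
cascade = pipeline_accuracy([0.9, 0.9, 0.9])
print(f"cascade accuracy: {cascade:.3f}")
```

Under these assumptions, three stages that are each right 90% of the time produce a fully correct result only about 73% of the time, which is the kind of compounding the researchers want to avoid.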
As Google explains, the cascade principle is not at all like the way people who know several languages mentally translate speech from one language into another. Exactly how that works is hard to describe, but translators are unlikely to agree that they first transcribe the speech into text in their heads, then visualize that text, translate it into the target language, and finally just read out the finished translation.
Spectrograms of the source-language speech and the translated speech. The quality of the translation itself is admittedly not the best, but it sounds more natural.
Imitating human cognitive abilities is one of the principles of machine learning. The Translatotron developers decided to use spectrograms of the source speech (images showing how the spectral power density of a signal varies over time) as the input for translation, and to generate new spectrograms in the target language from them. This approach is very different from the cascade method. The researchers note that, like any other approach, the new system has its own advantages and disadvantages.
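To make the notion of a spectrogram concrete, here is a minimal sketch that computes one for a synthetic tone using only the Python standard library. The frame size, hop length, sample rate, and test tone are arbitrary choices for illustration, and this is a generic short-time Fourier transform, not the feature pipeline Translatotron itself uses.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames; each frame lists signal power per frequency bin."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # A Hann window reduces leakage caused by cutting the signal into frames.
        windowed = [
            s * (0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1)))
            for n, s in enumerate(frame)
        ]
        powers = []
        # Naive discrete Fourier transform over the non-negative frequency bins.
        for k in range(frame_size // 2 + 1):
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            powers.append(abs(coeff) ** 2)
        frames.append(powers)
    return frames


# Assumed test signal: a 1000 Hz sine wave sampled at 8000 Hz.
sample_rate = 8000
tone_hz = 1000
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(512)]

spec = spectrogram(signal)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
print(len(spec), "frames; strongest frequency in first frame:", peak_hz, "Hz")
```

Each row of `spec` is one time slice and each column one frequency band, so the result can be drawn as an image with time on one axis and frequency on the other. A speech-to-speech model of the kind described here takes such a grid as input and produces another grid as output.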
One advantage of the end-to-end method is that, despite its complexity, the process is a single step rather than several. With sufficient computing power, Translatotron can therefore translate faster. More importantly, the system preserves the character of the original speech in the translation, including the speaker's voice and pace, rather than reproducing the translation in a neutral synthetic voice.
Those who understand linguistics, as well as those who work on speech synthesis, will most likely agree that in translation it matters not only what a person says but how they say it. Changing the expression of the original speech in the translated speech can radically change the meaning of what was said. Google has published audio samples of Translatotron's output. When listening to them, focus on how the intonation is carried over rather than on the quality of the translation itself.
The Translatotron developers admit that, in terms of accuracy, the system has not yet overtaken traditional cascade systems, but like any machine learning model it can improve over time. Given its advantage of retaining the original speaker's voice in the translated speech, further research in this area may prove useful for Google's future AI-based translation systems.