Research

Google has created a dataset of thousands of synthesized speech recordings.

It seems that before long, live human participation in communication may be kept to a minimum. At least, that is where things are heading: Google, for example, has assembled a dataset of thousands of recordings of synthesized English speech, according to a company blog post. This brings researchers one step closer to a system whose spoken output is indistinguishable from a human voice.

Speech synthesis converts printed text into a speech signal, and the person whose voice is recorded for the system does not have to read every possible phrase. A representative sample of recordings is enough to build the final model; the system then determines for itself how many phonemes it needs for further synthesis.

Why were these recordings collected into a single large-scale dataset? The concern is that if high-quality synthesis falls into the hands of malicious actors, they could use someone else's voice for their own ends. To counter this, Google researchers have published a dataset of several thousand excerpts from newspaper articles, read by 68 different synthesized voices. For now, however, the dataset is available only to participants in the Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof), who are building systems that automatically distinguish synthesized speech from real speech.

Two years ago, the Montreal company Lyrebird created an AI-based speech synthesizer capable of reproducing any voice. To train the system, a recording of just a few seconds of the target person's voice is enough; new audio is then generated on that basis. Accurate voice imitation is made possible by artificial neural networks, which are loosely inspired by the way neurons in the human brain work. The AI learns to recognize the characteristics of a person's speech and then uses this data to synthesize an artificial voice.

True, there are still flaws: the synthesized speech is not always intelligible, there are "voice artifacts," and there are other signs that the words are being spoken by a machine.
