Virtually all existing virtual assistants, however advanced they may be, have a distinctly strange “machine” voice that is impossible to mistake for anything else. This is because speech synthesizers build speech from a set of pre-recorded phrases, sounds, and their combinations. According to some experts in the field, sound quality can be improved with artificial intelligence, as the Montreal company Lyrebird has demonstrated. Its system can reproduce the voice of any person.
To imitate a voice, the system needs only a few seconds of an audio recording of the target person, from which it creates the synthesized sound fragment. This precise imitation is made possible by artificial neural networks, which are loosely modeled on the networks of neurons in the human brain. The AI learns to recognize the characteristic features of human speech, and this information is then used to synthesize an artificial voice. For now, the new system is not without its shortcomings: the generated speech is not always intelligible, there are “voice artifacts,” and there are other signs that the words are being spoken by a machine. However, these problems may well be eliminated in the future, since the system already works in real time. According to Jose Sotelo, one of the project's authors:
“Our program was trained on a large number of audio clips of speech by thousands of different people. The information obtained is compressed into a kind of ‘voice DNA,’ a digital key. Based on this key, the system can then reproduce any words, even ones that were not used during training.”
The project's authors are well aware that, once this technology matures, security problems will be unavoidable: for example, it could be used to bypass voice-based user identification systems. Lyrebird's representatives compare their invention to Photoshop: after Adobe released its software package, it became difficult to trust images on a screen. Now voices cannot be trusted either.
“We understand that, given how advanced modern technology has become, a voice synthesizer like this would have appeared sooner or later. We encourage everyone to stop accepting audio recordings as evidence and to stop relying on voice-based security measures.”
In any case, it is too early to worry for now: the system is still very rough, and its “synthetic” voices still have noticeably “robotic notes.”