Researchers from Oxford University have developed a new artificial-intelligence system for lip reading that far exceeds human performance.
Lip reading can hardly be called an exact science. However, thanks to the power of modern computers, and to neural networks in particular, it is now possible to build artificial intelligence that reads lips far better than a person can. Watch, Attend and Spell (WAS) is a new AI software system developed by scientists at Oxford in collaboration with Google DeepMind. WAS uses computer vision and machine-learning techniques to learn lip reading by watching more than 5,000 hours of television broadcasts.
The research team compared how well the machine and a human expert could understand what was said in a video, judging only from the lip movements of the people in the frame. The new software proved more accurate than the professional: the human correctly recognized only 12 percent of the words, while the WAS system recognized more than 50 percent. The machine's errors were minor, mostly dropping the letter "s" at the ends of words.
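The "percent of words recognized" figures above come down to a word-level accuracy metric. As a rough illustration only (the researchers' actual evaluation protocol is not described here, and a real benchmark would use an edit-distance-based word error rate), a minimal position-by-position sketch might look like this:

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Illustrative metric: fraction of reference words matched position-by-position.

    This is a simplification; real speech/lip-reading benchmarks typically use
    word error rate computed via edit distance over aligned transcripts.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    correct = sum(r == h for r, h in zip(ref, hyp))
    return correct / len(ref)


# Dropping a trailing "s" (the kind of error attributed to WAS above)
# counts as a missed word under this metric:
score = word_accuracy("the cats sat down", "the cat sat down")
```

Here `word_accuracy` and the sample sentences are hypothetical, chosen only to show how a dropped plural "s" registers as a word-level error.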
The new software could enable a range of applications, including aids for the hearing impaired. In the future, such a system could also be used to generate video subtitles in real time. Similar technology could likewise improve the accuracy and speed of speech-to-text conversion, especially in noisy places where microphones simply cannot hear the user.