Robots can learn to perform everyday tasks by watching videos of people. This is a significant advance in robotics that could greatly improve the functionality of robots in the home. The new capabilities allow them to assist with tasks such as cooking and cleaning.
How the team taught robots to learn by watching videos
Researchers successfully trained two robots to perform 12 different tasks. These include opening a drawer, an oven door or lid, removing a pot from the stove, and picking up items such as a phone or a can of soup.
Deepak Pathak, assistant professor at CMU’s Robotics Institute, explained, “The robot can learn where and how people interact with different objects by watching videos.” He added that the knowledge gained from these videos played an important role in teaching a model that allows robots to perform similar tasks in different environments.
Training robots typically involves people manually demonstrating tasks or performing extensive training in a simulated environment. These are not only time-consuming methods, but also prone to failure.
WHIRL method vs. VRB method
Previously, Pathak and his students proposed a method by which robots could learn by watching humans perform tasks. This method, called In-the-Wild Human Imitating Robot Learning (WHIRL), required a human to perform the task in the same environment as the robot.
Pathak’s latest research, called the Vision-Robotics Bridge (VRB), expands and refines the WHIRL concept. This new model bypasses the need for human demonstrations. It also eliminates the need for the robot to work in an identical environment.
However, as with WHIRL, the robot still needs practice to perfect the task. The researchers found that the robot can master a new task in as little as 25 minutes.
Shikhar Bahl, a Ph.D. student in robotics, said: “We were able to move the robot around campus and do all sorts of tasks.” He added: “Robots can use this model to curiously explore the world around them. Instead of just waving its arms around, a robot can be more direct in how it interacts.”
The secret to teaching robots to learn through observation
The key to teaching the robots was applying the concept of affordance, an idea rooted in psychology that refers to what an environment offers an individual, that is, the actions it makes possible.
In the case of VRB, affordances were used to determine where and how a robot could interact with an object based on observed human behavior.
For example, when the robot watches a human open a drawer, it identifies the contact point, the handle, and the direction of motion, straight out from the starting position. After watching several videos of people opening drawers, the robot can work out how to open any drawer.
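To make the idea concrete, here is a minimal sketch of what such an affordance might look like as data, and how estimates from several videos of the same action could be pooled into one refined estimate. The class and function names are hypothetical illustrations, not the VRB implementation, which learns these quantities with a neural network rather than by simple averaging.

```python
import math
from dataclasses import dataclass

@dataclass
class Affordance:
    """Hypothetical affordance record: where to make contact and which way to move."""
    contact_point: tuple  # (x, y) image location of the grasp, e.g. a drawer handle
    direction: tuple      # unit vector of the post-contact motion

def aggregate_observations(observations):
    """Pool affordances extracted from several videos of the same action.

    Averages the contact points and motion directions, then re-normalizes
    the direction to unit length -- a toy stand-in for how repeated
    demonstrations refine a learned estimate.
    """
    n = len(observations)
    cx = sum(o.contact_point[0] for o in observations) / n
    cy = sum(o.contact_point[1] for o in observations) / n
    dx = sum(o.direction[0] for o in observations) / n
    dy = sum(o.direction[1] for o in observations) / n
    norm = math.hypot(dx, dy) or 1.0  # guard against a zero-length average
    return Affordance((cx, cy), (dx / norm, dy / norm))

# Three (made-up) drawer-opening observations: grasp near the handle,
# pull roughly straight out along the x-axis.
videos = [
    Affordance((100.0, 200.0), (1.0, 0.0)),
    Affordance((110.0, 205.0), (0.9, 0.1)),
    Affordance((105.0, 195.0), (1.0, 0.0)),
]
refined = aggregate_observations(videos)
```

The robot would then use the pooled contact point to decide where to grasp and the pooled direction to decide how to move once it has grasped.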
The team used large video datasets such as Ego4D and Epic Kitchens to train the robots. The Ego4D dataset includes nearly 4,000 hours of first-person videos of daily activities from around the world. Some of these were collected by CMU researchers.
Similarly, Epic Kitchens offers videos showing cooking, cleaning, and other kitchen tasks. These data sets are commonly used to train computer vision models.
Robots trained using the VRB method can be a huge help in household tasks. For example, they can cook a meal or clean the house. This can make life a lot easier for people, especially those with mobility or time constraints.
Research in robotics is ongoing, and we can expect even more development in this area in the near future.