Researchers at Carnegie Mellon University have developed WHIRL, short for In-the-Wild Human Imitating Robot Learning: a fast, one-shot visual imitation algorithm that lets a robot learn household chores by watching a person perform them and then practicing on its own.
Shikhar Bahl opened the refrigerator door, and the robot watched. It recorded his movements, the swing of the door, the location of the fridge, and other information, analyzing the data and preparing to mimic what Bahl had done. At first it failed, sometimes missing the handle completely, grabbing it in the wrong spot or pulling it incorrectly. But after a few hours of practice, the robot succeeded and opened the door.
“Imitation is a great way to learn,” said Bahl, a Ph.D. student at the Robotics Institute (RI) in Carnegie Mellon University’s School of Computer Science. “Having robots actually learn from directly watching humans remains an unsolved problem in the field, but this work takes a significant step in enabling that ability.”
Bahl collaborated with Deepak Pathak and Abhinav Gupta, both RI faculty members, to create WHIRL, an acronym for In-the-Wild Human Imitating Robot Learning. WHIRL is a fast, one-shot visual imitation algorithm: it can learn directly from videos of human interaction and generalize that information to new tasks, which makes robots well suited to learning household chores. People are constantly performing a variety of tasks in their homes, and a robot using WHIRL can observe those tasks and collect the video data it needs to eventually work out how to complete the job itself.
The team attached a camera and their software to an off-the-shelf robot, and it learned to do more than 20 tasks, including opening and closing appliances, cabinet doors, and drawers, putting a lid on a pot, pushing in a chair, and even taking a garbage bag out of the bin. Each time, the robot observed a human perform the task once and then went about practicing and learning to perform the task on its own. The team presented their findings this month at the Robotics: Science and Systems conference in New York.
“This work presents a way to bring robots into the home,” said Pathak, an assistant professor at RI and team member. “Instead of waiting for robots to be programmed or trained to successfully complete different tasks before deploying them into people’s homes, this technology allows us to deploy the robots and have them learn how to complete tasks, all the while adapting to their environments and improving solely by watching.”
Current methods for teaching a robot a task typically rely on imitation or reinforcement learning. In imitation learning, humans manually operate a robot to teach it how to complete a task. This process must be done several times for a single task before the robot learns. In reinforcement learning, the robot is typically trained on millions of examples in simulation and then asked to adapt that training to the real world.
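To make the contrast concrete, here is a toy, self-contained Python sketch of the two conventional pipelines. Everything in it, from the one-dimensional "door" environment to the linear policy and its update rules, is an illustrative stand-in chosen for brevity, not anyone's actual training code.

```python
import random


class ToyDoorEnv:
    """A 1-D stand-in for a manipulation task: the 'door' opens only if the
    action lands close enough to the handle position."""

    def __init__(self, handle=0.7):
        self.handle = handle

    def reset(self):
        return 1.0  # a single constant observation keeps the toy example trivial

    def step(self, action):
        return 1.0 if abs(action - self.handle) < 0.05 else 0.0  # sparse reward


def imitation_learning(demos, lr=0.5):
    """Behavior cloning: fit a linear policy to (observation, action) pairs that
    a human collected by manually operating the robot, many times per task."""
    w = 0.0
    for obs, action in demos:
        w += lr * (action - w * obs) * obs  # least-squares-style update
    return w


def reinforcement_learning(env, episodes=5_000, lr=0.5):
    """Trial and error: improve the policy from reward alone, which typically
    takes a huge number of episodes and is therefore usually run in simulation."""
    w = 0.0
    for _ in range(episodes):
        obs = env.reset()
        action = w * obs + random.gauss(0.0, 0.3)  # exploration noise
        if env.step(action) > 0.0:                 # keep only rewarded behavior
            w += lr * (action - w * obs) * obs
    return w


if __name__ == "__main__":
    demos = [(1.0, 0.7)] * 20  # twenty teleoperated demonstrations of one task
    print("policy from imitation:", imitation_learning(demos))
    print("policy from RL:", reinforcement_learning(ToyDoorEnv()))
```

The point of the sketch is the cost structure rather than the math: the imitation loop needs a human in the loop for every demonstration, while the reinforcement loop needs an enormous number of trials to stumble onto a sparse reward.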
Both approaches are effective for teaching a robot a single task in a structured environment, but they are difficult to scale and deploy. WHIRL, by contrast, can learn from any video of a human performing a task: it scales easily, is not limited to a single task, and can operate in realistic home environments. The team is even working on a version of WHIRL trained by watching videos of human interaction on YouTube and Flickr.
The work was made possible by advances in computer vision: models trained on internet data can now understand and model movement in 3D. The team used such models to better understand the human movement in demonstration videos, which helped train WHIRL.
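As a rough illustration of what those off-the-shelf vision models make possible, the sketch below tracks a hand through a demonstration video and returns a coarse waypoint trajectory. MediaPipe Hands and OpenCV are used here purely as examples of pretrained, internet-scale tooling; the article does not specify which models the CMU team actually used, and the video filename is hypothetical.

```python
import cv2
import mediapipe as mp


def extract_hand_trajectory(video_path):
    """Return a list of (x, y, z) wrist positions, one per frame where a hand is detected."""
    trajectory = []
    hands = mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=1)
    capture = cv2.VideoCapture(video_path)
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        # MediaPipe expects RGB images; OpenCV decodes frames as BGR.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            wrist = results.multi_hand_landmarks[0].landmark[0]  # landmark 0 is the wrist
            trajectory.append((wrist.x, wrist.y, wrist.z))
    capture.release()
    hands.close()
    return trajectory


if __name__ == "__main__":
    waypoints = extract_hand_trajectory("human_opens_fridge.mp4")  # hypothetical video file
    print(f"recovered {len(waypoints)} hand waypoints from the demonstration video")
```

A trajectory like this could serve as the robot's starting guess, to be refined through its own practice rather than treated as a finished motion plan.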
Using WHIRL, a robot can complete tasks in its natural environment. The appliances, doors, drawers, lids, chairs, and garbage bag were not altered or manipulated in any way to accommodate the robot. The robot's first several attempts at a task failed, but after a few successes, it quickly figured out how to do the job and mastered it. The robot may not perform the task the same way a human would, but that is not the goal: humans and robots have different body parts and move in different ways. What matters is that the outcome is the same. The door has been opened. The switch has been turned off. The faucet has been turned on.
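A minimal sketch of that outcome-centric check appears below: an attempt counts as a success if the scene the robot leaves behind resembles the scene the human left behind, no matter how the robot moved. The feature encoder and the images here are stand-ins; in practice a pretrained visual model and real camera frames would fill those roles, and nothing in the sketch is claimed to be WHIRL's actual success criterion.

```python
import numpy as np


def cosine_similarity(a, b):
    a, b = a.ravel(), b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


def outcome_achieved(robot_final_image, human_final_image, encode, threshold=0.9):
    """Success means the robot's final scene looks like the human's final scene
    (door open, lid on the pot), regardless of the motion that produced it."""
    return cosine_similarity(encode(robot_final_image),
                             encode(human_final_image)) >= threshold


def toy_encoder(image):
    """Stand-in for a pretrained visual model: per-channel mean intensity."""
    return image.astype(np.float32).mean(axis=(0, 1))


if __name__ == "__main__":
    # Fake "after" frames: a mostly red scene from the human video and a slightly
    # dimmer, otherwise matching scene produced by the robot.
    human_after = np.zeros((64, 64, 3), dtype=np.uint8)
    human_after[:, :, 0] = 200
    robot_after = np.zeros((64, 64, 3), dtype=np.uint8)
    robot_after[:, :, 0] = 190
    print(outcome_achieved(robot_after, human_after, toy_encoder))  # True
```

Judging attempts by their outcome is what lets a robot with a different body succeed without copying the human's motion exactly.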
“To scale robotics in the wild, the data must be reliable and stable, and the robots should become better in their environment by practicing on their own,” Pathak said.