{{#customtitle:Spatial Language at MIT|Spatial Language at MIT}}
Natural language is an intuitive and flexible modality for human-robot interaction. A robot designed to interact naturally with humans must be able to understand instructions without requiring the person to speak in any special way. Understanding language from an untrained user is challenging because we are not asking the human to adapt to the limitations of the system, i.e., to limit their instructions to a small vocabulary or grammar. Rather, we want a system that understands naturalistic language directly as produced by people. For example, we want to build systems that engage in dialogue such as:
Person: Wait by the elevator for Steven and bring him to my office.
Robot: How do I get to the elevators?
Person: You get there by walking down and turning left, going through the grey security door. Continue straight and the elevators should be on your right.
Robot: Will do.
This problem involves converting natural language into a semantic representation, grounding symbols in the environment, and carrying out dialogue. In the current work, we have focused mostly on the first two aspects.
We recently got the system working on a quadrotor helicopter. We extended the topological map to 3D, added some orientation commands like "Face the windows," and shot some videos of it flying around the Stata Center!
The system localizes itself in the map using a laser range finder. For this demo, we've annotated the map with the locations of objects like "the windows" and "room 124," but we have a prototype version that uses the camera to detect objects automatically. It uses co-occurrence statistics from Flickr to infer the locations of objects that it can't detect directly. An onboard PC running Ubuntu handles control and the sensors, and an offboard laptop runs almost everything else.
We have also demonstrated our approach on a robotic wheelchair. Below is an example query along with the path that the robot inferred to get to its destination. The full version can be seen here.
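As a rough illustration of how co-occurrence statistics could be used to guess where an undetected object is, here is a minimal Python sketch. It is not our actual implementation: the function names, the smoothing constant `alpha`, the max-over-detections scoring rule, and the toy tag sets standing in for Flickr photo tags are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tag_sets):
    """Count how often each tag, and each unordered pair of tags, appears."""
    single = Counter()
    pair = Counter()
    for tags in tag_sets:
        unique = set(tags)
        single.update(unique)
        for a, b in combinations(sorted(unique), 2):
            pair[(a, b)] += 1
    return single, pair


def conditional_prob(target, observed, single, pair, alpha=1.0):
    """Smoothed estimate of P(target tag present | observed tag present)."""
    if target == observed:
        return 1.0
    key = tuple(sorted((target, observed)))
    # Add-alpha smoothing over the two outcomes (target present / absent).
    return (pair[key] + alpha) / (single[observed] + 2 * alpha)


def rank_locations(target, detections, single, pair):
    """Rank locations by the best co-occurrence score among detected objects."""
    scores = {}
    for location, objects in detections.items():
        scores[location] = max(
            (conditional_prob(target, obj, single, pair) for obj in objects),
            default=0.0,
        )
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical tag sets, standing in for tags attached to photos.
tag_sets = [
    ["kitchen", "refrigerator", "sink"],
    ["kitchen", "microwave"],
    ["office", "monitor", "desk"],
    ["office", "monitor"],
    ["kitchen", "sink"],
]
# Hypothetical detections: objects the camera found at each map location.
detections = {"room_a": ["monitor"], "room_b": ["sink", "microwave"]}

single, pair = cooccurrence_counts(tag_sets)
ranking = rank_locations("kitchen", detections, single, pair)
print(ranking)
```

In this toy data, "kitchen" is never detected directly, but it co-occurs with "sink" and "microwave," so the location where those objects were seen ranks first.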
We have also demonstrated an approach similar to the local inference algorithm [Kollar 10], but one that allows backtracking. A video of this is below (original here).
Along with the system we have developed, we have collected an extensive corpus of natural language directions along with maps and data from the environment. In experiments, subjects were asked to give directions from one label to another in this map as if they were giving directions to a friend. We used two different environments: an office environment spanning two adjoining office buildings, and the lobby/atrium on the first floor of the Stata Center. These environments were large and complex. Maps of the environments appear below.
| Eighth Floor (Office Environment) | First Floor (Lobby/Atrium) |
| | |
Here are sample directions from the corpus.