
{{#customtitle:Spatial Language at MIT|Spatial Language at MIT}}

Natural language is an intuitive and flexible modality for human-robot interaction. A robot designed to interact naturally with humans must be able to understand instructions without requiring the person to speak in any special way. Understanding language from an untrained user is challenging because we are not asking the human to adapt to the limitations of the system, i.e., to limit their instructions to a small vocabulary or grammar. Rather, we want a system that understands naturalistic language directly as produced by people. For example, we want to build systems that engage in dialog:

Person: Wait by the elevator for Steven and bring him to my office.

Robot: How do I get to the elevators?

Person: You get there by walking down and turning left, going through the grey security door. Continue straight and the elevators should be on your right.

Robot: Will do.

This problem involves converting natural language into a semantic representation, grounding symbols in the environment, and carrying out a dialogue. In the current work, we have focused mostly on the first two.
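A common way to build such a semantic representation, as in [Kollar 10], is to decompose each instruction into clauses that each name an action, a spatial relation, and a landmark to ground in the map. The sketch below is illustrative: the class and field names are stand-ins, not the exact structures used in our system.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpatialDescriptionClause:
    """One clause of a parsed natural-language direction.

    The figure/verb/spatial-relation/landmark decomposition follows
    [Kollar 10]; names here are illustrative, not our exact code.
    """
    figure: str                      # what moves (usually the robot)
    verb: Optional[str]              # action, e.g. "walk", "turn"
    spatial_relation: Optional[str]  # e.g. "down", "through", "past"
    landmark: Optional[str]          # object to ground, e.g. "the hall"

# "Walk down the hall past the mailboxes" decomposes into two clauses:
clauses = [
    SpatialDescriptionClause("you", "walk", "down", "the hall"),
    SpatialDescriptionClause("you", None, "past", "the mailboxes"),
]
```

Grounding then amounts to matching each clause's landmark against objects and places in the robot's map.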

We recently got the system working on a quadrotor helicopter. We extended the topological map to 3D, added orientation commands like "Face the windows," and shot some videos of it flying around the Stata Center!

The system localizes itself in the map using a laser range finder. For this demo, we annotated the map with the locations of objects like "the windows" and "room 124," but we have a prototype version that uses the camera to detect objects automatically; it uses co-occurrence statistics from Flickr to infer the locations of objects it cannot detect directly. An onboard PC running Ubuntu handles control and the sensors, and an offboard laptop runs almost everything else. We have also demonstrated our approach on a robotic wheelchair. Below is an example query along with the path the robot inferred to reach its destination. The full version can be seen here.
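The Flickr-based inference can be sketched as follows: score each region for an undetected target object by summing its co-occurrence with the objects the camera did detect there. The probabilities and region names below are made-up illustrations, not real estimates from our system.

```python
# Hypothetical co-occurrence scores of the kind that can be
# estimated from Flickr tag statistics. Numbers are illustrative.
COOCCURRENCE = {
    ("microwave", "refrigerator"): 0.7,
    ("microwave", "coffee maker"): 0.6,
    ("microwave", "whiteboard"): 0.05,
}

def score_region(target, detected_objects):
    """Score how likely `target` is in a region, given the objects
    actually detected there."""
    return sum(COOCCURRENCE.get((target, obj), 0.0)
               for obj in detected_objects)

def most_likely_region(target, regions):
    """regions: dict mapping region name -> list of detected objects."""
    return max(regions, key=lambda r: score_region(target, regions[r]))

regions = {
    "kitchenette": ["refrigerator", "coffee maker"],
    "lab": ["whiteboard"],
}
# The kitchenette outscores the lab for "microwave", so the robot
# would search there first even though no microwave was detected.
```

A real system would normalize these scores into a probability distribution and combine them with the detector's output, but the ranking idea is the same.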

We have also demonstrated an approach similar to the local inference algorithm [Kollar 10], but one that allows backtracking. A video of this is below (original here).
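Backtracking over a topological map can be sketched as a depth-first search that grounds one clause per map edge and undoes a choice when no neighbor matches. This is a minimal sketch, assuming a graph of named places and a pluggable `match` function standing in for the real probabilistic grounding model.

```python
def follow_directions(graph, start, clauses, match):
    """Depth-first search with backtracking.

    graph:   dict node -> list of neighbor nodes (topological map)
    clauses: one direction clause per desired path step
    match:   function (clause, node_from, node_to) -> bool; a stand-in
             for the real probabilistic grounding model.
    Returns a node path grounding all clauses, or None.
    """
    def search(node, i, path):
        if i == len(clauses):
            return path
        for nxt in graph.get(node, []):
            if nxt not in path and match(clauses[i], node, nxt):
                result = search(nxt, i + 1, path + [nxt])
                if result is not None:
                    return result
        return None  # no neighbor works: backtrack to the caller

    return search(start, 0, [start])

# Toy map and a trivial matcher that just checks whether the
# destination node's name appears in the clause text.
graph = {"A": ["B", "C"], "B": [], "C": ["E"], "E": []}
match = lambda clause, a, b: b in clause
path = follow_directions(graph, "A", ["toward C", "toward E"], match)
```

Unlike purely local inference, a dead end (here, reaching "B" with nowhere left to go) does not strand the search: the recursion simply returns and tries the next neighbor.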


Along with the system, we have collected an extensive corpus of natural language directions, together with maps and sensor data from the environment. In experiments, subjects were asked to give directions from one labeled location to another as if they were giving directions to a friend. We used two environments: an office environment spanning two adjoining office buildings, and the first-floor lobby/atrium of the Stata Center. Both environments were large and complex. Maps of the environments appear below.

Eighth Floor (office environment)
First Floor (lobby/atrium)

Here are sample directions from the corpus.

Eighth Floor (office)

  • R17 to R5 - With your back to the windows, walk straight through the door near the elevators. Continue to walk straight, going through one door until you come to an intersection just past a whiteboard. Turn left, turn right, and enter the second door on your right (sign says "Administrative Assistant").
  • R19 to R22 - From the lounge with the computers on your left, walk through either short hall and take a right down the hall past the mailboxes on your right. Walk all the way through the atrium and down the hall past the spiral staircase. Turn left at the second door past the glass doors to the elevators. Enter into the Department of Linguistics and Philosophy.
  • R25 to R26 - Walk straight down the hall and turn left at the Department of Linguistics and Philosophy headquarters. Continue walking past the stair case, but look right. After the glass windows (looking at elevators) turn right around the corner into a small corner kitchen.
  • R1 to R19 - You are currently in a large room with windows. When you leave the room, turn left (first left) and go to the end of the hall. Turn right and go to the end of the hall. Turn right and go to the end of the hall (where the gray double doors labeled 36-884Z and 36-884E). Turn left and go through the gray door. Keep going straight until you reach the closed double doors. Go through the double doors and turn right. Walk past a row of colored chairs until you reach a set of wooden cabinets with M&M toys and a boston T map taped on the cabinet. Turn left at the cabinet and walk until you reach the wall of mailboxes (cubbies). At the mailboxes turn right, then take your first left. After you turn left go straight, this room with the orange sofa and chairs is your destination.

First Floor (lobby/atrium)

  • R1 to R8 - With your back towards the glass doors facing the street, walk forward towards the white column labeled, "Fourth Floor Common". When you reach the column, turn left and walk towards two large, angled, grey pillars. After passing the first pillar, turn right and walk forward past the room labeled "123", towards a set of glass doors. Stop when you are in a room with windows looking into a gym.
  • R19 to R20 - Walk straight and take a left towards the MIT Libraries. The Forbes family cafe should be on your left. Take another left towards the water fountains. On your left you should see a bunch of tables and chairs section. Walk to it.
  • R16 to R9 - Go right at the computer terminals. Up a little on the left will be an entrance to some stairway. Use the door just past it.