Introductions-- I'm Heather, obviously.

And I'm Candace Sidner.

And you're working right now at--

At Mitsubishi Labs in Cambridge, across from Tech Square.

Awesome. Across from Tech Square, great. And how did you first get into artificial intelligence or robotics?

Well, I started off in-- as an undergraduate, I was interested in how people communicate and use language. And I did my master's degree at the University of Pittsburgh, and while I was there I took courses with Herb Simon.

Cool.

And when I started doing the master's thesis, Herb said, well, go see this former student of mine whose name is [INAUDIBLE] who's somewhere at the University of Pittsburgh, and talk to him. And Harry Pople was at the [INAUDIBLE] business school. Harry was doing what was called the INTERNIST medical system, which was a very complicated and sophisticated AI program to mimic the common decision-making that doctors did when they were diagnosing patients. And so I did my master's thesis with him. And then after my thesis was finished, I was admitted to the doctoral program at MIT at the AI lab.

And when did you get into emotional-- what are you mainly researching? So I don't misstate it.

So my work has largely been in the area of human language and human conversation. So over the years, I've looked a lot at how we model the intentions of people when they're communicating with one another, as well as what they do with their language. I've done a lot of work on how we understand pronouns, how we understand all kinds of referring expressions, and how we embed our intentions in our utterances, and then convey that over successive parts of the conversation.

A few years ago, I got interested in the problem of how we use gestures and language. So obviously, one of them is what I do with my hands. But equally important is the fact that I'm looking at you right now, and I'm looking at the camera, for example.
And that when I'm talking to you, I actually talk to you. I don't talk to the wall like this, because if I did, there's something very importantly wrong with that interaction. I call that the process of engagement. That is how we come to perceive connection through nonverbal means and interactions. There's also a verbal means. We do it, of course, as long as we're having the conversation. Even if I'm doing something else, you know that I intend to be connected to you because I continue to play my part in the conversation. But if I violate certain aspects of our nonverbal connection, that can be very odd.

Some of it's OK, however. So for example, if I'm washing the dishes, I have to pay attention to the dishes while I'm talking to you. I will pay attention to this, and I'm only occasionally looking up as I have something to say. And if you're talking, instead I'll be doing "mhmm," and when I'm doing the "mhmm," I may not look at you at that point in time. I may be able to do this, but you'll understand that I'm now multitasking as we are and--

So can they learn those things? At this stage, do they have to be preprogrammed?

No. Those are the sorts of things we still have to program. Just getting clear about what the [INAUDIBLE]. And the way I've been testing a lot of those ideas is to have a robot that can converse with the user and can both produce proper behavior and then interpret the behavior of the person. So that means that for the robot, it's not so important that it's mobile. Up until now, we haven't really done work on mobile robotics. But more importantly, it has to move in the right sorts of ways, and it also has to have the ability to interpret people's movements, as much as you can do that with current vision systems.

So is your main goal more to interpret human expression or to have humanoid robots that can replicate it, or both?

Well, you have to do both. Conversation is a two-way thing.
You both have to produce it properly, and you have to--

I was just thinking of an intelligent computer that was just trying to interpret your work or something like that, where you're getting frustrated. That might be useful.

I'm interested both in the interpretation phase and also the ability to generate or produce that kind of behavior.

What do you think the biggest challenges in AI historically have been, and how has that changed now?

Well, one of the challenges that really hasn't changed is something that I talked about yesterday, which is the need for a deep model of the semantics of language. So we now are able to say things about how a verb might have certain arguments, and you can fill them in, and you can do somewhat adequate language understanding. But the much deeper problems of how we know certain things about how sentences go together in terms of their meaning, how we make deeper inferences about what's going on between utterances or even between particular sentences, and how that's connected to what we know in general are still relatively poorly understood. And so I think that's a pretty important problem.

Obviously, the work I've started doing on gesture is only in its beginning phases up until now. Until very recently-- and I'm about the only person even working on this seriously-- but up until very recently, like maybe five years ago, nobody thought to. And it's not that nobody thought about it. Nobody really had a good way to go about it, and that's gradually changed.

What do you think of Cynthia Breazeal's work?

Oh! I like Cynthia's work, obviously. In fact, some of her students have been my summer interns. So we've happily shared back and forth and I hope learned from each other in the process.

So another area, of course, that I think is largely unexplored is things that have to do with social and emotional interactions. A number of years ago, there were a few people in the language community who explored emotional stuff in texts and things like that.
But it kind of died off, and looking at this stuff any further or any deeper has really not happened. So the reason I think it's important is that when we communicate together in a human-to-human situation, our emotional content is expressed both in our language and in our gestural behavior-- mostly in our faces, I think. Although if we're really upset, obviously it comes out in that way. And it's useful to be able to take advantage of that information on the part of the machine. It may also turn out to be useful to have the machine be able to express emotions.

You can affect the [INAUDIBLE].

More subtle but equally important are our social interactions. And recent research has shown that when we have social conversations with machines-- not just conversations about putting things together-- people have a markedly different sense of what that entity is. You can measure that on standard psychological measures, which is pretty interesting. So it's interesting, and I always write it up as a kind of practical thing you want to do with a computer. But from the point of view of artificial intelligence, it's really kind of interesting and important [INAUDIBLE].

What do you think the most important skills for an artificial intelligence researcher to have are today?

Researchers? Wow. Really good observational skills. So I think it's really important to look at what the human-human situation is and to be able to observe that carefully. Very good analytic skills in terms of being able to quantify and think about things and really analyze what's going on. The ability to listen carefully and read other people's work. And then, to be creative and kind of be able to step off the edge. You can't make progress if you're not willing to take risks.

Well, we're sort of short on time.
But do you have any anecdotes or stories that you remember from your time at MIT or along the way?

Oh, wow. Well, a lot of my anecdotes have to do with being a woman in a highly male profession. And it has not always been easy, to say the least. So I don't know if I want to convey all of them at this particular point in time, but there's a certain challenge to being a woman in an all-male field. It has certainly colored my professional life and my experiences as a scientist.

Cool. Well, a pleasure talking to you.

Let's go.