In order to act intelligently, there are a lot of things you have to know about the world. One approach is to try to tell an artificial intelligence program everything: write it all out in great detail and tell it all the facts. By building a robot, we're trying to build a system which can act in the world, interact with people, and learn for itself, and our hope is that this will lead to a quicker accumulation of the sort of knowledge of what it is to act in the world, so that we can have true artificial intelligence.

To encourage people to interact with the robot naturally, we've built the robot to look like a human and to act like a human. Cog has two eyes, microphones for ears, and a set of gyroscopes that give it a sense of balance. Each of Cog's eyes has two cameras: one with a very wide-angle, peripheral field of view, and one with a very narrow field of view but much higher resolution. Cog has a total of 21 degrees of freedom, including two six-degree-of-freedom arms, three degrees of freedom in the torso, three in the neck, and three in the eyes.
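The degree-of-freedom accounting above can be tallied in a few lines. Splitting "two six-degree-of-freedom arms" into separate left/right entries is just bookkeeping on my part, not something the description specifies:

```python
# Degrees of freedom as described for Cog.
dof = {
    "left arm": 6,   # one of the two six-DOF arms
    "right arm": 6,  # the other six-DOF arm
    "torso": 3,
    "neck": 3,
    "eyes": 3,
}
total = sum(dof.values())  # matches the stated total of 21
```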
My area of specialty is the arms of Cog, and in keeping with the rest of the project, instead of trying to program the arms explicitly, so that I tell the arms exactly what they should do, I've been trying to program the arms so that they respond to and interact with their environment. I have oscillators at these two joints here; they're getting feedback about how the weight of the slinky shifts from arm to arm, and they're using that to coordinate the two joints. And because the system is reactive, I can stop it, and when I let go it will straightaway start again into the right slinky action. Using the same program, and relying on this feedback coupling between the program and its environment, I can perform a lot of very different tasks. This pendulum swinging is one, and the robot can also turn cranks. There, the robot turns the crank a little bit; the movement isn't smooth, it's somewhat jerky. But then I switch on the system so that at each joint it senses the motion imposed by the crank and tunes into that.
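That "tuning into" behavior can be sketched as a phase oscillator whose feedback term pulls it toward an externally imposed motion. This is only an illustrative model, not Cog's actual arm controller: the single-phase state, the gains, and the simulated crank are all assumptions of the sketch.

```python
import math

def entrain(phase, freq, sensed_phase, gain, dt=0.01):
    """One step of a phase oscillator that 'tunes into' an externally
    imposed motion via a feedback coupling term."""
    phase += dt * (freq + gain * math.sin(sensed_phase - phase))
    return phase % (2 * math.pi)

# Drive the oscillator from a simulated crank rotating at 2 rad/s while
# the oscillator's natural frequency is 1.5 rad/s; the feedback term
# locks the two together despite the mismatch.
phase, crank = 0.0, 1.0
for _ in range(20000):
    crank = (crank + 0.01 * 2.0) % (2 * math.pi)
    phase = entrain(phase, 1.5, crank, gain=2.0)

# After settling, the phase difference is roughly constant (phase lock):
# sin(difference) approaches mismatch/gain = 0.5 / 2.0 = 0.25.
lock_error = math.sin(crank - phase)
```

The point of the sketch is the one the demo makes: the same oscillator, with the feedback left on, adapts to whatever motion the load imposes instead of needing an explicit trajectory.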
Here's a demonstration of the [INAUDIBLE] to motion routine. What you see is Brian moving his hand around, and Cog moving its eyes to look directly at his moving hand. The way that works is that we have a routine, shown here, and its motion information is sent to a second routine, which has previously learned how to move the eyes to focus on any target. When it gets that data, it moves the eyes to focus on Brian's moving hand. We've built a number of additional visual-motor routines, such as orienting to a salient stimulus, tracking a moving object, and stabilizing the visual field using a vestibulo-ocular reflex. Without the vestibulo-ocular reflex, as I move the head around, the eyes follow the position of the head; they don't actually follow the position of the target out in the world. With the vestibulo-ocular reflex, the feedback from the gyroscopes controls the position of the eyes. So as I move the head back and forth now, the eyes stay locked towards me; they keep tracking the target regardless of how I move the head, even when I move it quite violently.
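The vestibulo-ocular reflex is essentially a counter-rotation: the gyroscope reports head velocity, and the eyes are driven at the opposite velocity so the gaze direction stays fixed on the target. A minimal one-dimensional sketch, where the unit gain and the simulated head motion are assumptions of mine rather than Cog's actual parameters:

```python
import math

def vor_step(eye_angle, head_velocity, gain=1.0, dt=0.01):
    """Counter-rotate the eye by the head velocity reported by the
    gyroscope, so gaze (head + eye) stays on the target."""
    return eye_angle - gain * head_velocity * dt

# Shake the simulated head back and forth; the reflex keeps the gaze
# direction (head angle plus eye-in-head angle) pinned at zero.
head, eye = 0.0, 0.0
for step in range(1000):
    head_velocity = 2.0 * math.cos(step * 0.01)  # imposed head motion
    head += head_velocity * 0.01
    eye = vor_step(eye, head_velocity)

gaze = head + eye  # stays at the target despite the shaking
```

Without the reflex (gain zero), `eye` never moves and the gaze just follows the head, which is exactly the failure shown in the demonstration.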
People recognize a number of different social cues that are very important for the way they interact. Things like eye contact are very important to normal human social interactions. We'd like to have our robot be able to recognize these same sorts of social cues and be able to respond to them appropriately. Using models from developmental psychology and from studies of autism, we've been building systems that can utilize joint attention. The first step is finding faces. Once a face has been located, the robot [INAUDIBLE] to the face in order to get a high-resolution image of the eye. One of the mechanisms for learning in a social environment is imitation. You'd like the robot, perhaps, to imitate you in what you're doing, and this is something that you see very early in children. By tracking the motion of the face, the robot can imitate head nods. The robot is only sensitive to faces, not to any moving object. The system also detects toys with faces and imitates them in the same way.
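Nod imitation of this sort can be caricatured as a decision over the tracked face trajectory: if the dominant motion of the face is vertical, mirror a nod; if horizontal, a shake. The function below is a toy sketch under that assumption; the threshold and pixel units are invented for illustration and are not the system's actual tracker:

```python
def classify_gesture(face_xs, face_ys, threshold=5.0):
    """Classify a tracked face trajectory (pixel coordinates over time)
    as a nod (mostly vertical), a shake (mostly horizontal), or still."""
    h = max(face_xs) - min(face_xs)  # horizontal extent of the motion
    v = max(face_ys) - min(face_ys)  # vertical extent of the motion
    if max(h, v) < threshold:
        return "still"
    return "nod" if v > h else "shake"

# A face bobbing up and down reads as a nod. Because only face
# detections reach this stage, arbitrary moving objects never trigger
# imitation -- which is why a toy with a face works but a bare hand
# does not.
gesture = classify_gesture([50, 51, 50, 51], [100, 112, 100, 112])
```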
I'm interested in learning in a social context, where the interaction I try to exploit is something like an infant-caretaker interaction: I'm essentially the caretaker, and the robot is like an infant. I want to exploit the kinds of interactions that parents have always been giving their children, in terms of constraining the environment, making the environment suitable for learning, and helping the infant learn over time. And motivations play a critical role in that, as far as the infant telling the mother: am I being overwhelmed, am I being bored, how should you be interacting with me to optimize my learning ability? And I, being the caretaker, am very receptive to reading these emotional cues and responding to them in the way I interact with the robot, to promote its learning. This is Kismet. Kismet is my infant robot. It gives me facial expressions, which tell me what its motivational state is.
This first one is anger, extreme anger; disgust; excitement; this is fear; this is happiness; this one is interest; this one is sadness; surprise; this one is tired; and this one is sleep.

In a suitable learning environment, Kismet's drives are in homeostatic balance. This means that the robot is neither under-stimulated nor overwhelmed by its interaction with the caretaker. Stimulation intensity is computed by the perceptual system: a moving face is a social stimulus, whose intensity is proportional to the amount of motion, and any other motion is treated as a nonsocial stimulus. Kismet works with the caretaker to keep the perceptual stimuli within an acceptable range. Kismet's emotions and expressions reflect its motivational state. By reading Kismet's facial expressions, the caretaker can respond to the robot's needs and stimulate the robot appropriately. One of Kismet's drives is to be social. If Kismet does not receive any social stimulation, it becomes lonely and looks sad. The caretaker responds by making face-to-face contact with the robot. This satiates the social drive, and Kismet displays happiness.
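The homeostatic loop described here can be sketched as a scalar drive that drifts toward "lonely" without stimulation and overshoots with too much. The numbers, thresholds, and expression labels below are invented for illustration; they are not Kismet's actual parameters or emotion model:

```python
def update_drive(drive, stimulus_intensity, decay=1.0, dt=0.1,
                 lo=-10.0, hi=10.0):
    """One step of a toy homeostatic drive: it drifts negative when
    stimulation falls below the decay rate and climbs when stimulation
    exceeds it, clamped to [lo, hi]."""
    drive += dt * (stimulus_intensity - decay)
    return max(lo, min(hi, drive))

def expression(drive, lo=-10.0, hi=10.0):
    """Map the drive to the facial display the caretaker reads."""
    if drive <= lo:
        return "sad"          # under-stimulated: lonely or bored
    if drive >= hi:
        return "overwhelmed"  # too intense: disgust or fear
    return "interested"       # within the homeostatic range
```

The caretaker's half of the loop is then behavioral: raise the stimulation when the face reads sad, and back off when it reads overwhelmed, keeping the drive in the middle band where learning works best.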
However, if the social stimulus is too intense, Kismet becomes asocial and shows disgust. This is a cue for the caretaker to back off and restore the interaction to a suitable intensity level. Kismet's face detector picks up the face of a toy cow; as a result, Kismet uses the cow stimulus to satiate its social drive. Kismet's loneliness is satiated by the appearance of the toy cow. However, if the caretaker overstimulates Kismet with the toy cow, Kismet shows displeasure. Once the caretaker backs off, the social drive is restored and Kismet returns to an interested and happy state.

Another drive is to be stimulated with toys and other objects. When left unstimulated, Kismet grows bored and appears sad. When the caretaker uses a slinky to play with the robot, its stimulation drive is satiated and the robot appears interested. However, if the caretaker begins to overwhelm the robot, a look of fear appears on its face. Large slinky motions are confusing and intimidating. Kismet's expression of fear tells the caretaker that she is frightening the robot and should back off.
Kismet's mental stimulation drive is also satiated by a toy block. Similarly, Kismet can be overstimulated by the block; again, Kismet displays an expression of fear. Extreme overstimulation causes the robot to block out external stimuli by closing its eyes and going to sleep.