1 00:00:00,000 --> 00:00:52,050 2 00:00:52,050 --> 00:00:54,090 Good morning, ladies and gentlemen. 3 00:00:54,090 --> 00:00:57,690 I'd like, today, to speak about zero-knowledge. 4 00:00:57,690 --> 00:01:01,980 Or, I'd like to speak about nothing but the truth. 5 00:01:01,980 --> 00:01:04,860 6 00:01:04,860 --> 00:01:08,660 So let me ask the question that you may already 7 00:01:08,660 --> 00:01:14,830 wondering about, namely, what is zero-knowledge? 8 00:01:14,830 --> 00:01:20,330 Well, zero-knowledge is a new and powerful theory 9 00:01:20,330 --> 00:01:25,040 of both correctness and privacy of information 10 00:01:25,040 --> 00:01:28,970 with big impact both from a theoretical point of view, 11 00:01:28,970 --> 00:01:32,120 and a practical point of view. 12 00:01:32,120 --> 00:01:35,240 And this theory has been developed quite quickly 13 00:01:35,240 --> 00:01:37,610 in recent years. 14 00:01:37,610 --> 00:01:40,685 And now has already a very rich history. 15 00:01:40,685 --> 00:01:45,880 16 00:01:45,880 --> 00:01:53,830 These are, according to me, really the papers that made it. 17 00:01:53,830 --> 00:01:59,200 And if you knew already that MIT was a center of knowledge, 18 00:01:59,200 --> 00:02:01,510 you'd be happy to know that MIT is also 19 00:02:01,510 --> 00:02:03,243 center of zero knowledge. 20 00:02:03,243 --> 00:02:08,449 21 00:02:08,449 --> 00:02:13,280 Not only the starred people here all belong to our lab, 22 00:02:13,280 --> 00:02:16,240 but also most of the other ones were associated to our lab 23 00:02:16,240 --> 00:02:17,240 at one point or another. 24 00:02:17,240 --> 00:02:19,970 25 00:02:19,970 --> 00:02:30,260 Well, we were saying correctness and privacy of information. 26 00:02:30,260 --> 00:02:35,420 Now traditionally, checking the correctness of some information 27 00:02:35,420 --> 00:02:38,030 is the domain of proofs. 28 00:02:38,030 --> 00:02:41,270 So the first thing that we are going to ask today 29 00:02:41,270 --> 00:02:44,390 is, what is a proof? 30 00:02:44,390 --> 00:02:48,380 And once we have answered this big question 31 00:02:48,380 --> 00:02:50,970 with a sufficient level of generality, 32 00:02:50,970 --> 00:02:54,560 we will proceed to tackle more ambitious questions. 33 00:02:54,560 --> 00:03:00,260 Such as, how much knowledge should the proof contain? 34 00:03:00,260 --> 00:03:02,640 Let me start with the first question. 35 00:03:02,640 --> 00:03:07,140 36 00:03:07,140 --> 00:03:10,320 What is a proof? 37 00:03:10,320 --> 00:03:14,220 No matter how you put it, a pool is an interaction 38 00:03:14,220 --> 00:03:16,140 between two parties. 39 00:03:16,140 --> 00:03:22,950 One party better works hard, and we'll call him the prover. 40 00:03:22,950 --> 00:03:26,340 And the other party better be lazy. 41 00:03:26,340 --> 00:03:27,675 And we call him the verifier. 42 00:03:27,675 --> 00:03:30,990 43 00:03:30,990 --> 00:03:35,640 Should the time necessary to find the proof of theorem 44 00:03:35,640 --> 00:03:39,420 roughly coincide with the time necessary to check 45 00:03:39,420 --> 00:03:40,950 whether the proof is correct? 46 00:03:40,950 --> 00:03:43,440 Then all mathematicians would be out of jobs. 47 00:03:43,440 --> 00:03:46,300 48 00:03:46,300 --> 00:03:47,820 So that is what a proof is. 49 00:03:47,820 --> 00:03:53,640 And let me give you a simple example, which technically, 50 00:03:53,640 --> 00:03:56,220 within a framework called NP on which 51 00:03:56,220 --> 00:03:59,380 Mike Sipser spoke about today. 52 00:03:59,380 --> 00:04:01,960 So here we have our input statement the theorem 53 00:04:01,960 --> 00:04:04,140 we would like to prove. 54 00:04:04,140 --> 00:04:09,060 We have two graphs, A and B. And they are the same graphs, 55 00:04:09,060 --> 00:04:12,780 except they are drawn differently, 56 00:04:12,780 --> 00:04:15,750 and they changed the names of their nodes. 57 00:04:15,750 --> 00:04:16,769 One has numbers. 58 00:04:16,769 --> 00:04:17,640 One has letters. 59 00:04:17,640 --> 00:04:20,329 60 00:04:20,329 --> 00:04:22,730 Mathematically, when this is the case, 61 00:04:22,730 --> 00:04:25,910 you just say that A and B are isomorphic graphs. 62 00:04:25,910 --> 00:04:30,080 The same graph but drawn differently. 63 00:04:30,080 --> 00:04:34,835 Now, in this case, it is immediate 64 00:04:34,835 --> 00:04:36,760 that these two graphs are the same. 65 00:04:36,760 --> 00:04:40,870 But should this graph have had thousands of nodes, 66 00:04:40,870 --> 00:04:43,240 we wouldn't be able to distinguish 67 00:04:43,240 --> 00:04:45,200 or to detect this fact at all. 68 00:04:45,200 --> 00:04:49,660 Therefore, this is a plausible hard theorem. 69 00:04:49,660 --> 00:04:52,360 But the prover can't convince us that the two graphs are 70 00:04:52,360 --> 00:04:55,450 the same in this simple case, like in a very complicated 71 00:04:55,450 --> 00:05:00,190 case, by just sending over the correspondence, a 1 to 1 72 00:05:00,190 --> 00:05:02,740 correspondence between the vertices of A 73 00:05:02,740 --> 00:05:06,640 and the vertices of B that preserve the edges. 74 00:05:06,640 --> 00:05:12,880 That is to say, A over here, really is vertex 5 over there. 75 00:05:12,880 --> 00:05:15,790 B is free, and so on. 76 00:05:15,790 --> 00:05:18,880 And the verifier has a very easy job. 77 00:05:18,880 --> 00:05:24,970 Only to detect that each vertex, each edge in the first graph, 78 00:05:24,970 --> 00:05:26,680 is also an edge in the second graph. 79 00:05:26,680 --> 00:05:32,890 For instance, 1, 2 is an edge in A, But 1 is d, and 2 is c. 80 00:05:32,890 --> 00:05:36,250 So vertice d be an edge of it in the second graph. 81 00:05:36,250 --> 00:05:38,350 And indeed, there it is. 82 00:05:38,350 --> 00:05:41,800 And also should check with vise versa, but every edge over here 83 00:05:41,800 --> 00:05:44,400 is an edge over there. 84 00:05:44,400 --> 00:05:47,740 And you can see this may be important, because if these 85 00:05:47,740 --> 00:05:50,740 are the graphs of two large molecules, 86 00:05:50,740 --> 00:05:52,900 and one has been discovered in Europe and one 87 00:05:52,900 --> 00:05:54,430 in the United States, and they want 88 00:05:54,430 --> 00:05:57,095 to know if the two molecules are the same. 89 00:05:57,095 --> 00:05:59,470 Each researcher constructs the molecule the way he wants, 90 00:05:59,470 --> 00:06:03,910 and cannot by eye inspection find that out. 91 00:06:03,910 --> 00:06:07,760 But once that you look at this, which is a specific example, 92 00:06:07,760 --> 00:06:12,130 but is a canonical way in which proofs are given. 93 00:06:12,130 --> 00:06:14,410 There is a statement and somebody writes the proof, 94 00:06:14,410 --> 00:06:17,620 publishes in a book, namely, in our case, The Correspondence, 95 00:06:17,620 --> 00:06:20,470 and then the verifier reads the paper. 96 00:06:20,470 --> 00:06:24,190 It will be immediate that this way of proving things 97 00:06:24,190 --> 00:06:28,930 betrays much more knowledge than we are asking for. 98 00:06:28,930 --> 00:06:33,280 Here we are asking about the single bit question, 99 00:06:33,280 --> 00:06:36,160 are A and B the same graph? 100 00:06:36,160 --> 00:06:37,930 Yes or no? 101 00:06:37,930 --> 00:06:41,620 But the only way for a prover to publish it on a piece of paper 102 00:06:41,620 --> 00:06:45,580 with proof, is to give away the entire correspondence. 103 00:06:45,580 --> 00:06:47,990 Not only the single bit we are asking for, 104 00:06:47,990 --> 00:06:50,050 but the entire thing. 105 00:06:50,050 --> 00:06:52,960 And that is intuitively the case with this. 106 00:06:52,960 --> 00:06:57,340 Proofs betray much more knowledge than necessary, 107 00:06:57,340 --> 00:06:59,770 perhaps. 108 00:06:59,770 --> 00:07:03,880 But this raises a very good question. 109 00:07:03,880 --> 00:07:07,450 Namely, what is knowledge? 110 00:07:07,450 --> 00:07:10,990 And that raises even a better question. 111 00:07:10,990 --> 00:07:12,325 Is there life-- 112 00:07:12,325 --> 00:07:13,810 [LAUGHING] 113 00:07:13,810 --> 00:07:15,300 114 00:07:15,300 --> 00:07:20,130 So today, we want to answer this question, 115 00:07:20,130 --> 00:07:23,370 but from a computer science point of view. 116 00:07:23,370 --> 00:07:25,254 [LAUGHING] 117 00:07:25,254 --> 00:07:26,200 118 00:07:26,200 --> 00:07:29,820 So let me to refine our intuition, [INAUDIBLE] all 119 00:07:29,820 --> 00:07:32,910 can be made [INAUDIBLE] given more time. 120 00:07:32,910 --> 00:07:36,840 Let me tell you what knowledge is by just giving what I'd 121 00:07:36,840 --> 00:07:39,150 like to call the Knowledge Act. 122 00:07:39,150 --> 00:07:45,550 The Knowledge Act is composed if you want, by 2 axioms. 123 00:07:45,550 --> 00:07:50,760 First axiom says, efficient computation is for free. 124 00:07:50,760 --> 00:07:52,180 What does that mean? 125 00:07:52,180 --> 00:07:54,180 Let me tell you in a day-- 126 00:07:54,180 --> 00:07:55,560 everyday example. 127 00:07:55,560 --> 00:07:58,410 You are walking on the street with an integer 128 00:07:58,410 --> 00:08:02,310 on your left hand and an integer on your right hand. 129 00:08:02,310 --> 00:08:04,330 And all of a sudden you meet some guy who says, 130 00:08:04,330 --> 00:08:07,110 hey, hey you with a two integers. 131 00:08:07,110 --> 00:08:11,970 $1,000 and I will sell you the product of these two integers. 132 00:08:11,970 --> 00:08:14,430 Then I don't know what you are going to answer him. 133 00:08:14,430 --> 00:08:17,520 But what I will say is you must be crazy, 134 00:08:17,520 --> 00:08:19,200 because multiplication is easy. 135 00:08:19,200 --> 00:08:22,650 I know how to multiply and adjust and I'll keep the $1,000 136 00:08:22,650 --> 00:08:25,800 and get the product by myself. 137 00:08:25,800 --> 00:08:29,520 So we don't pay for efficient computation. 138 00:08:29,520 --> 00:08:33,960 The second axiom, random strings for free too. 139 00:08:33,960 --> 00:08:35,159 What does that mean? 140 00:08:35,159 --> 00:08:37,860 You usually, before you are walking on the street 141 00:08:37,860 --> 00:08:40,049 and some other crazy-- the same crazy guy stops you 142 00:08:40,049 --> 00:08:43,049 again and says, hey $1,000 and I'm 143 00:08:43,049 --> 00:08:46,960 going to give you 100 random bit string. 144 00:08:46,960 --> 00:08:48,570 He says, look. 145 00:08:48,570 --> 00:08:49,470 You must be crazy. 146 00:08:49,470 --> 00:08:51,610 I have a coin the same way that you do. 147 00:08:51,610 --> 00:08:53,490 And I can flip coins as well as you do. 148 00:08:53,490 --> 00:08:58,980 So I flip 100 times my coin and I save $1,000. 149 00:08:58,980 --> 00:09:04,170 So in such a case, what does convey knowledge instead? 150 00:09:04,170 --> 00:09:06,780 Well, let's assume now that you are again 151 00:09:06,780 --> 00:09:09,960 walking on the street with a graph on your left hand 152 00:09:09,960 --> 00:09:12,180 and a graph on your right hand. 153 00:09:12,180 --> 00:09:14,700 And the same guys asks you, George, you 154 00:09:14,700 --> 00:09:16,680 are having to isomorphic graphs. 155 00:09:16,680 --> 00:09:21,300 $1,000 and I sell you the correspondence. 156 00:09:21,300 --> 00:09:23,070 Then you bargain for price. 157 00:09:23,070 --> 00:09:31,110 158 00:09:31,110 --> 00:09:36,930 Because essentially, that information has knowledge. 159 00:09:36,930 --> 00:09:38,160 And what is knowledge? 160 00:09:38,160 --> 00:09:40,200 What has a chance to convey knowledge 161 00:09:40,200 --> 00:09:43,710 is the output of an feasible computation. 162 00:09:43,710 --> 00:09:48,120 A computation that would require me too long, because the chance 163 00:09:48,120 --> 00:09:49,560 of conveying knowledge. 164 00:09:49,560 --> 00:09:51,810 But once you define knowledge of a sway, 165 00:09:51,810 --> 00:09:54,120 you immediately see the traditional proofs 166 00:09:54,120 --> 00:09:55,530 is an overkill. 167 00:09:55,530 --> 00:09:57,270 It's not only the problem of the graph 168 00:09:57,270 --> 00:10:00,630 I saw my face on problem like we saw before, is always the case. 169 00:10:00,630 --> 00:10:02,680 For the proofs that you publish regularly, 170 00:10:02,680 --> 00:10:07,470 the proof-- the good old proofs, they convey too much knowledge 171 00:10:07,470 --> 00:10:08,280 when they need. 172 00:10:08,280 --> 00:10:12,190 It's like shooting at the fly with a gun. 173 00:10:12,190 --> 00:10:15,990 Now, given this is a problem, a traditional proofs, 174 00:10:15,990 --> 00:10:17,730 well, let's see. 175 00:10:17,730 --> 00:10:21,840 These proofs are a mental construct 176 00:10:21,840 --> 00:10:26,040 roughly 2,500 years ago, which started in ancient Greece. 177 00:10:26,040 --> 00:10:29,070 178 00:10:29,070 --> 00:10:34,560 And we have been living very happily in the house 179 00:10:34,560 --> 00:10:38,080 with Euclid and others built for us, 180 00:10:38,080 --> 00:10:41,580 except that in recent years, this house is becoming 181 00:10:41,580 --> 00:10:43,560 a little bit small for us. 182 00:10:43,560 --> 00:10:46,710 So when you tackle questions about knowledge, 183 00:10:46,710 --> 00:10:49,350 we don't quite fit in there anymore. 184 00:10:49,350 --> 00:10:54,000 And we want to build a new house, 185 00:10:54,000 --> 00:10:57,040 retaining a wave of the flavor of the old one. 186 00:10:57,040 --> 00:11:01,020 So we don't want any more to deal with traditional proofs, 187 00:11:01,020 --> 00:11:04,230 but a different type of object, which 188 00:11:04,230 --> 00:11:05,550 I'm going to present right now. 189 00:11:05,550 --> 00:11:07,170 And hopefully you'll be convinced 190 00:11:07,170 --> 00:11:12,690 of it is enough for the same thing but we can be happy. 191 00:11:12,690 --> 00:11:17,100 So I'd like therefore, to give you 192 00:11:17,100 --> 00:11:22,950 a flavor of these new proofs via an abridged example. 193 00:11:22,950 --> 00:11:26,820 And you should notice that these new proofs differ 194 00:11:26,820 --> 00:11:29,160 from the classical notion of a proof in two 195 00:11:29,160 --> 00:11:30,820 important respects. 196 00:11:30,820 --> 00:11:33,180 First, they are probabilistic. 197 00:11:33,180 --> 00:11:38,140 Prover and verifier flip coins doing the proofing process. 198 00:11:38,140 --> 00:11:40,530 Second, they are instructed. 199 00:11:40,530 --> 00:11:43,320 They cannot be published in a book, 200 00:11:43,320 --> 00:11:47,670 but the prover and the verifier will talk back and forth, 201 00:11:47,670 --> 00:11:51,615 very much like in classroom teaching. 202 00:11:51,615 --> 00:11:52,990 Let's have a look at the example. 203 00:11:52,990 --> 00:11:55,520 204 00:11:55,520 --> 00:12:00,320 Here is the proof of graph isomorphism. 205 00:12:00,320 --> 00:12:03,050 Input, we have two graphs, A and B. 206 00:12:03,050 --> 00:12:05,990 And one way as we said to prove A and B on the same graph 207 00:12:05,990 --> 00:12:10,430 is ship over to the verifier, the correspondence. 208 00:12:10,430 --> 00:12:12,320 The prover and the verifier instead 209 00:12:12,320 --> 00:12:17,780 will be engaging in a somewhat more complicated game. 210 00:12:17,780 --> 00:12:23,060 They are going to repeat K times the following instructions. 211 00:12:23,060 --> 00:12:30,210 First, the prover flips coins to generate C, a fared graph, 212 00:12:30,210 --> 00:12:34,940 which is a scramble version of A and B. So C is a graph 213 00:12:34,940 --> 00:12:37,850 and is isomorphic to A and isomorphic to B, 214 00:12:37,850 --> 00:12:40,650 because A and B are isomorphic between what each other. 215 00:12:40,650 --> 00:12:43,250 The only point is really to prove it. 216 00:12:43,250 --> 00:12:46,160 So takes this random graph isomorphic to both, 217 00:12:46,160 --> 00:12:47,120 and sends it over. 218 00:12:47,120 --> 00:12:49,830 219 00:12:49,830 --> 00:12:54,300 Claiming C is isomorphic to both A and B. 220 00:12:54,300 --> 00:12:58,110 Now, that the verifier doesn't want to listen to anything. 221 00:12:58,110 --> 00:13:02,550 He flips a coin to decide at random between A and B. 222 00:13:02,550 --> 00:13:04,860 And let us assume he has the going 223 00:13:04,860 --> 00:13:11,910 has come up A. Therefore he will send A request to the prover. 224 00:13:11,910 --> 00:13:14,850 And the prover will satisfy this request 225 00:13:14,850 --> 00:13:20,670 by sending pi, a correspondence between the nodes of A 226 00:13:20,670 --> 00:13:23,400 and the nodes of C with Cervantes, 227 00:13:23,400 --> 00:13:29,130 thereby proving with A and C aren't missing graph. 228 00:13:29,130 --> 00:13:31,440 And verify it given the correspondence, he says, 229 00:13:31,440 --> 00:13:32,100 that's true. 230 00:13:32,100 --> 00:13:33,570 A and C are the same graph. 231 00:13:33,570 --> 00:13:35,430 He says OK. 232 00:13:35,430 --> 00:13:40,170 So much fun, we'll do it over and over again, K times. 233 00:13:40,170 --> 00:13:43,980 And even verifier said OK K times in a row, 234 00:13:43,980 --> 00:13:47,010 then they will say fine, I'm convinced, 235 00:13:47,010 --> 00:13:49,110 A and B are the same graph. 236 00:13:49,110 --> 00:13:52,210 237 00:13:52,210 --> 00:13:53,650 Why is this a proof? 238 00:13:53,650 --> 00:13:56,500 That's an interaction, but way of proof? 239 00:13:56,500 --> 00:14:00,310 Well, first of all, let me tell you that if the film is true, 240 00:14:00,310 --> 00:14:03,250 the verify and will be convinced. 241 00:14:03,250 --> 00:14:05,320 It goes the prover does what I said, 242 00:14:05,320 --> 00:14:07,120 but verifier does what I said and they 243 00:14:07,120 --> 00:14:09,670 will be convinced that at the time, all of it at the end 244 00:14:09,670 --> 00:14:12,500 all the time. 245 00:14:12,500 --> 00:14:14,140 Let's now argue over whether this 246 00:14:14,140 --> 00:14:16,510 is a proof by contradiction as we 247 00:14:16,510 --> 00:14:18,910 prove almost anything anyway. 248 00:14:18,910 --> 00:14:22,240 So let us assume that the film is false. 249 00:14:22,240 --> 00:14:24,510 Can we convince somebody over a false film? 250 00:14:24,510 --> 00:14:28,730 251 00:14:28,730 --> 00:14:32,240 So let's assume now with A and B are different graphs 252 00:14:32,240 --> 00:14:35,270 but we assume you can figure it out. 253 00:14:35,270 --> 00:14:38,720 Then they claim that no matter how the prover choose 254 00:14:38,720 --> 00:14:44,510 a C, if a C is not the same as A or C is not the same as C 255 00:14:44,510 --> 00:14:47,270 or is neither. 256 00:14:47,270 --> 00:14:50,810 So when C is shipped over no matter 257 00:14:50,810 --> 00:14:54,980 how it's chosen, when re-verify a flips a coin between A and B, 258 00:14:54,980 --> 00:14:58,490 he has at least a chance one half 259 00:14:58,490 --> 00:15:01,490 of catching the prover in the wrong. 260 00:15:01,490 --> 00:15:06,380 Because even prover construct C by scrambling A say, 261 00:15:06,380 --> 00:15:09,560 and even verifier asks for A, he can ship over 262 00:15:09,560 --> 00:15:11,870 the correspondence between A and C. 263 00:15:11,870 --> 00:15:15,860 But if a prover constructed C by scrambling B, 264 00:15:15,860 --> 00:15:20,910 and the verifier asks for A, he has nothing to send here. 265 00:15:20,910 --> 00:15:22,790 So then the verifier would say no, 266 00:15:22,790 --> 00:15:27,050 this is not the correspondence because there's an A. 267 00:15:27,050 --> 00:15:31,930 So we chance a half, whatever graph you send here the prover 268 00:15:31,930 --> 00:15:35,440 cannot satisfy the verifier's request. 269 00:15:35,440 --> 00:15:40,150 And if they do this 2 times, the prover has only a chance 270 00:15:40,150 --> 00:15:43,310 1 and 1/4 of satisfying to verifier. 271 00:15:43,310 --> 00:15:46,360 And if you do it the K times, its chances are 1 and 2 272 00:15:46,360 --> 00:15:49,190 to the K. And that's a very small number. 273 00:15:49,190 --> 00:15:52,600 If K is 300 2 to the 300 is larger than the number 274 00:15:52,600 --> 00:15:56,930 of electrons in our universe. 275 00:15:56,930 --> 00:15:57,430 OK. 276 00:15:57,430 --> 00:16:00,320 277 00:16:00,320 --> 00:16:04,670 And now they say, why do we go to the trouble of such 278 00:16:04,670 --> 00:16:06,380 a complicated truth? 279 00:16:06,380 --> 00:16:09,050 But not too complicated, but nonetheless is not 280 00:16:09,050 --> 00:16:11,240 a stretch for understanding the correspondence. 281 00:16:11,240 --> 00:16:14,420 Because the proof I just talk about is a zero knowledge 282 00:16:14,420 --> 00:16:15,770 proof. 283 00:16:15,770 --> 00:16:18,780 And why is it zero knowledge? 284 00:16:18,780 --> 00:16:22,250 Well, because at the end of the proof, 285 00:16:22,250 --> 00:16:26,280 the verifier knows that A and B are the same graph. 286 00:16:26,280 --> 00:16:30,110 That's what the proof want to give it away. 287 00:16:30,110 --> 00:16:31,970 But what else does he know at the end? 288 00:16:31,970 --> 00:16:35,090 He also knows which words he and the prover 289 00:16:35,090 --> 00:16:36,500 exchange with one another. 290 00:16:36,500 --> 00:16:37,820 He was there. 291 00:16:37,820 --> 00:16:42,270 Listen, now I claim but from these words, 292 00:16:42,270 --> 00:16:45,320 you can retrieve that A and B are the same graph but you 293 00:16:45,320 --> 00:16:48,500 cannot retrieve anything else. 294 00:16:48,500 --> 00:16:50,990 And how am I going to say that? 295 00:16:50,990 --> 00:16:53,480 Actually, how am I going to prove that? 296 00:16:53,480 --> 00:16:57,280 297 00:16:57,280 --> 00:16:59,690 Well, here it is how. 298 00:16:59,690 --> 00:17:02,420 Let's assume now that the verifier 299 00:17:02,420 --> 00:17:07,609 knows priori with A and B are isomorphic graphs. 300 00:17:07,609 --> 00:17:08,510 This is single beat. 301 00:17:08,510 --> 00:17:11,510 He only knows that A and B are missing graph. 302 00:17:11,510 --> 00:17:15,079 Now, he claimed that the proof is relevant for him 303 00:17:15,079 --> 00:17:18,290 because he can come up with the proof himself. 304 00:17:18,290 --> 00:17:19,069 Here is how. 305 00:17:19,069 --> 00:17:22,740 306 00:17:22,740 --> 00:17:27,780 So the verifier now first flips a coin 307 00:17:27,780 --> 00:17:33,870 to decide between A or B. And let's assume that the coin has 308 00:17:33,870 --> 00:17:38,550 come up A. Therefore, given the coin has come up A, 309 00:17:38,550 --> 00:17:43,300 he scrambles by some correspondence 310 00:17:43,300 --> 00:17:48,640 pi, the vertices of A and creates a graph C. 311 00:17:48,640 --> 00:17:52,070 So he comes up with a renaming of the vertices of A, 312 00:17:52,070 --> 00:17:55,390 renames the vertices and comes up with a graph C. 313 00:17:55,390 --> 00:17:59,260 Then is going to lay down with the following words. 314 00:17:59,260 --> 00:18:01,570 C with an arrow from left to right. 315 00:18:01,570 --> 00:18:05,260 A with an arrow from right to left. 316 00:18:05,260 --> 00:18:09,020 And pi with an arrow from left to right. 317 00:18:09,020 --> 00:18:10,880 End of the quotes. 318 00:18:10,880 --> 00:18:14,140 Now, I claim that this conversation 319 00:18:14,140 --> 00:18:16,250 is equally likely is essentially the same 320 00:18:16,250 --> 00:18:18,230 with this conversation. 321 00:18:18,230 --> 00:18:19,040 Why? 322 00:18:19,040 --> 00:18:22,930 Well, the first entry, the first message here, 323 00:18:22,930 --> 00:18:28,990 is a randomly scrambled version of A and B. And so is here. 324 00:18:28,990 --> 00:18:32,700 The second message here is the result of a coin toss. 325 00:18:32,700 --> 00:18:34,240 And so is here. 326 00:18:34,240 --> 00:18:37,470 Here, we noticed that the coin we flipped after seeing C, 327 00:18:37,470 --> 00:18:39,240 and here we flipped up before. 328 00:18:39,240 --> 00:18:41,880 But none of it is a random coin toss. 329 00:18:41,880 --> 00:18:43,290 And finally, what is here? 330 00:18:43,290 --> 00:18:47,032 Here is a correspondence between vertices of A and vertices of C 331 00:18:47,032 --> 00:18:48,360 that preserve the edges. 332 00:18:48,360 --> 00:18:50,680 And so is here. 333 00:18:50,680 --> 00:18:55,560 So you see why this mechanism proves to you that A and B are 334 00:18:55,560 --> 00:18:58,350 isomorphic graphs, if you knew this beforehand, 335 00:18:58,350 --> 00:19:01,440 this mechanism doesn't add anything else. 336 00:19:01,440 --> 00:19:05,070 Because you could reconstruct the conversations yourself 337 00:19:05,070 --> 00:19:08,770 and whatever you can do by yourself is your knowledge. 338 00:19:08,770 --> 00:19:11,610 So this proof is a zero knowledge on top of the fact 339 00:19:11,610 --> 00:19:14,950 that A and B are the same graph. 340 00:19:14,950 --> 00:19:19,230 So I've showed you basically a way for the graph isomorphism 341 00:19:19,230 --> 00:19:25,310 problem that convinces you that the proof exists 342 00:19:25,310 --> 00:19:28,550 even though you don't even know how the proof starts. 343 00:19:28,550 --> 00:19:31,970 And this is just a small piece of this theory 344 00:19:31,970 --> 00:19:35,870 of zero knowledge which goes on to answer 345 00:19:35,870 --> 00:19:41,810 many other questions not fundamental to our mathematical 346 00:19:41,810 --> 00:19:42,620 thinking. 347 00:19:42,620 --> 00:19:46,520 Namely, what does it mean to prove something. 348 00:19:46,520 --> 00:19:50,430 What does it mean that something is false? 349 00:19:50,430 --> 00:19:53,040 What does it mean to convey knowledge? 350 00:19:53,040 --> 00:19:56,170 What does it mean to be totally unintelligible? 351 00:19:56,170 --> 00:19:59,700 352 00:19:59,700 --> 00:20:04,020 Now, a lot of leverage that you can 353 00:20:04,020 --> 00:20:08,130 do if you answer correctly such a question. 354 00:20:08,130 --> 00:20:12,390 And in fact, look, you don't go to bovver the classical notion 355 00:20:12,390 --> 00:20:16,110 of a proof which has been there for 500 years from nothing. 356 00:20:16,110 --> 00:20:18,750 Because once you are large such powerful notions 357 00:20:18,750 --> 00:20:19,950 but very powerful. 358 00:20:19,950 --> 00:20:21,850 You just walk outside even crazier 359 00:20:21,850 --> 00:20:25,290 because it was built using mathematical formulas 360 00:20:25,290 --> 00:20:28,377 and prove that it stood before people really put together. 361 00:20:28,377 --> 00:20:30,210 So once if you tackle this notion of proofs, 362 00:20:30,210 --> 00:20:31,930 you gain power. 363 00:20:31,930 --> 00:20:36,690 And in fact, this theory has applications in practice. 364 00:20:36,690 --> 00:20:40,350 And these applications are nothing less 365 00:20:40,350 --> 00:20:44,160 of privacy of communication. 366 00:20:44,160 --> 00:20:50,400 Ability of checking programs, checking program or designing 367 00:20:50,400 --> 00:20:53,460 programs so that they are much more reliable than we 368 00:20:53,460 --> 00:20:54,700 could do before. 369 00:20:54,700 --> 00:20:55,930 And I don't have the time. 370 00:20:55,930 --> 00:20:59,490 So if it's a question that lies at the heart of our computer 371 00:20:59,490 --> 00:21:03,420 age, but I don't have a question to illustrate this. 372 00:21:03,420 --> 00:21:07,290 But in the last two minutes left to me, 373 00:21:07,290 --> 00:21:12,600 I'm going perhaps to take just a small piece and one 374 00:21:12,600 --> 00:21:13,390 small piece. 375 00:21:13,390 --> 00:21:22,830 I chose is the problem of passwords. 376 00:21:22,830 --> 00:21:26,150 Passwords is an ancient problem. 377 00:21:26,150 --> 00:21:29,070 Or rather a problem is an ancient tool. 378 00:21:29,070 --> 00:21:32,580 And I was rereading to see it is a while ago 379 00:21:32,580 --> 00:21:36,330 and with the continuous source of inspiration for me. 380 00:21:36,330 --> 00:21:39,810 381 00:21:39,810 --> 00:21:45,600 And he was describing a famous battle between the Athenians 382 00:21:45,600 --> 00:21:49,817 and the Syracusans, roughly around 400 BC. 383 00:21:49,817 --> 00:21:51,400 And what is strange about this battle, 384 00:21:51,400 --> 00:21:52,770 the battle was a nighttime. 385 00:21:52,770 --> 00:21:55,350 So they couldn't-- the two armies couldn't see each other 386 00:21:55,350 --> 00:21:57,030 and they couldn't recognize each other. 387 00:21:57,030 --> 00:22:00,090 Therefore, the Athenians had the password. 388 00:22:00,090 --> 00:22:02,860 But they were shouting with passwords back and forth 389 00:22:02,860 --> 00:22:07,890 so much that the Syracusans got hold of the password 390 00:22:07,890 --> 00:22:10,110 and they started shouting also, and they 391 00:22:10,110 --> 00:22:13,110 passed through their lines and then attacked them on the back 392 00:22:13,110 --> 00:22:15,430 and defeated them. 393 00:22:15,430 --> 00:22:21,210 Now, this problem does not arise just 394 00:22:21,210 --> 00:22:23,410 because we are so back in time. 395 00:22:23,410 --> 00:22:25,950 This is a problem and today's also. 396 00:22:25,950 --> 00:22:27,900 If you use passwords for instance 397 00:22:27,900 --> 00:22:30,870 to login in the morning from your home terminal 398 00:22:30,870 --> 00:22:34,380 to the mainframe, how does this work? 399 00:22:34,380 --> 00:22:35,760 It's the same thing. 400 00:22:35,760 --> 00:22:39,210 Well, the mainframe has a list of users and their passwords 401 00:22:39,210 --> 00:22:42,420 in my case, for president. 402 00:22:42,420 --> 00:22:45,120 Now, which my friends do guests but they 403 00:22:45,120 --> 00:22:47,370 don't guess the number of exclamative points. 404 00:22:47,370 --> 00:22:51,750 405 00:22:51,750 --> 00:22:57,240 So if the right password is introduced, 406 00:22:57,240 --> 00:22:59,360 then the mainframe says, yes this 407 00:22:59,360 --> 00:23:04,070 must be Silvio lets this guy have access to my files. 408 00:23:04,070 --> 00:23:05,930 But what is the problem with this scheme? 409 00:23:05,930 --> 00:23:09,110 Well, the problem is that listening on that 410 00:23:09,110 --> 00:23:10,820 may be a bad guy. 411 00:23:10,820 --> 00:23:13,130 Therefore he captures my passwords very much 412 00:23:13,130 --> 00:23:14,400 like this occlusions. 413 00:23:14,400 --> 00:23:18,650 And he's going next today to impersonate me. 414 00:23:18,650 --> 00:23:22,490 And that is intrinsic in the nature of a password. 415 00:23:22,490 --> 00:23:28,010 If somebody figures out your password, you're dead. 416 00:23:28,010 --> 00:23:34,100 From 400 we see up to now but not anymore. 417 00:23:34,100 --> 00:23:35,750 Let me tell you why. 418 00:23:35,750 --> 00:23:38,540 And here is our zero knowledge comes to the rescue. 419 00:23:38,540 --> 00:23:41,830 420 00:23:41,830 --> 00:23:44,230 You see in the mainframe, just to keep 421 00:23:44,230 --> 00:23:45,910 it simple for the sake of example 422 00:23:45,910 --> 00:23:49,720 I'm going to store close to my name a pair of graphs, 423 00:23:49,720 --> 00:23:52,270 say files on the vertices. 424 00:23:52,270 --> 00:23:55,670 And they're copies, different copies, 425 00:23:55,670 --> 00:23:58,040 scrambled copies of the same graph. 426 00:23:58,040 --> 00:23:59,710 So A and B are isomorphic. 427 00:23:59,710 --> 00:24:03,010 But I do not store in the mainframe the isomorphism so 428 00:24:03,010 --> 00:24:04,690 as the parentheses indicate. 429 00:24:04,690 --> 00:24:06,310 I do not store the correspondence. 430 00:24:06,310 --> 00:24:08,740 I will keep the correspondence. 431 00:24:08,740 --> 00:24:14,080 And now I tell the mainframe who proves 432 00:24:14,080 --> 00:24:17,530 to you that A is the same graph and B, that's me. 433 00:24:17,530 --> 00:24:19,630 Let him have access to my files. 434 00:24:19,630 --> 00:24:22,810 435 00:24:22,810 --> 00:24:28,010 And now, in the morning, which will take a while but not. 436 00:24:28,010 --> 00:24:31,910 437 00:24:31,910 --> 00:24:35,090 I'm going to wake up, scramble-- 438 00:24:35,090 --> 00:24:38,300 give a random scramble version C of A and B 439 00:24:38,300 --> 00:24:40,160 and ship it over to the mainframe. 440 00:24:40,160 --> 00:24:44,570 The mainframe flips a coin to decide between A and B, 441 00:24:44,570 --> 00:24:46,520 say he decides A. And therefore he 442 00:24:46,520 --> 00:24:49,550 request the correspondence between A and C, 443 00:24:49,550 --> 00:24:51,170 and they give him the response. 444 00:24:51,170 --> 00:24:55,140 And the mainframe says, OK, so far so good. 445 00:24:55,140 --> 00:24:57,770 This is a correspondence between A and C. 446 00:24:57,770 --> 00:25:02,690 And now, we are going to do it 20 times. 447 00:25:02,690 --> 00:25:05,240 It will take a while. 448 00:25:05,240 --> 00:25:07,590 But what do I gain by this? 449 00:25:07,590 --> 00:25:12,110 You see what I gain is, if somebody now 450 00:25:12,110 --> 00:25:15,920 is listening to these conversations 451 00:25:15,920 --> 00:25:18,240 he is rather confused. 452 00:25:18,240 --> 00:25:18,740 Why? 453 00:25:18,740 --> 00:25:21,980 Because this proof I already argued 454 00:25:21,980 --> 00:25:24,330 before is a zero knowledge proof. 455 00:25:24,330 --> 00:25:27,890 So only it betrays is the fact that A and B are the same, 456 00:25:27,890 --> 00:25:30,920 but not to what the correspondence is. 457 00:25:30,920 --> 00:25:32,540 So even though somebody is listening 458 00:25:32,540 --> 00:25:34,940 to this identification process, he 459 00:25:34,940 --> 00:25:37,520 has no way to impersonate me. 460 00:25:37,520 --> 00:25:40,980 And just to give you just a simple example, 461 00:25:40,980 --> 00:25:45,860 assume this guy records what graph C I send over. 462 00:25:45,860 --> 00:25:48,600 And then in the previous conversation 463 00:25:48,600 --> 00:25:52,970 the mainframe has asked me for A. So he sends over the same C, 464 00:25:52,970 --> 00:25:55,430 assume that the computer chooses at random between A and B 465 00:25:55,430 --> 00:25:57,200 and he's lucky. 466 00:25:57,200 --> 00:25:59,420 He chooses the same A. Therefore, 467 00:25:59,420 --> 00:26:02,120 he can send the same pi that I sent. 468 00:26:02,120 --> 00:26:05,780 But for the second graph, the computer originally 469 00:26:05,780 --> 00:26:10,640 say you ask me for A. But now, he's asking him for B, 470 00:26:10,640 --> 00:26:13,130 then he doesn't know what cosponsorship say. 471 00:26:13,130 --> 00:26:14,330 You see what it is? 472 00:26:14,330 --> 00:26:17,960 Essentially this is a trick to shrink down 473 00:26:17,960 --> 00:26:23,140 an exponential number of questions to very very few. 474 00:26:23,140 --> 00:26:25,610 Good. 475 00:26:25,610 --> 00:26:30,920 Now, this is not practical by any stretch of imagination. 476 00:26:30,920 --> 00:26:37,030 However, this does not bother us as theoreticians. 477 00:26:37,030 --> 00:26:40,120 But let me tell you why he does not bother us as theoreticians. 478 00:26:40,120 --> 00:26:43,510 Because now, once you know what you are shooting for, 479 00:26:43,510 --> 00:26:46,960 and this is the last slide of my talk, 480 00:26:46,960 --> 00:26:49,490 then you can make it efficient later. 481 00:26:49,490 --> 00:26:52,660 And in fact, not using graph isomorphism 482 00:26:52,660 --> 00:26:57,100 but using number theory, the same identification process 483 00:26:57,100 --> 00:27:00,550 much more efficient has been devised 484 00:27:00,550 --> 00:27:02,860 by Gold Wasser and myself and Rackoff 485 00:27:02,860 --> 00:27:05,950 and then improved by Fiat, Feige and Shamir. 486 00:27:05,950 --> 00:27:10,030 And finally, Shamir and I have a zero knowledge identification 487 00:27:10,030 --> 00:27:13,000 scheme by which in the morning to identify yourself, 488 00:27:13,000 --> 00:27:16,960 all you need to have is two multiplications, which 489 00:27:16,960 --> 00:27:19,175 I consider it practical. 490 00:27:19,175 --> 00:27:22,150 Now, for the future. 491 00:27:22,150 --> 00:27:28,660 Well, we have only explored the tip of the iceberg 492 00:27:28,660 --> 00:27:30,560 and there is a lot of other things 493 00:27:30,560 --> 00:27:34,190 that needs to be discovered in really farther down, 494 00:27:34,190 --> 00:27:35,970 and we want to bring it to light. 495 00:27:35,970 --> 00:27:39,340 Let me tell you a waiver of it, thanks to zero knowledge, 496 00:27:39,340 --> 00:27:42,820 cryptography is undergoing a transition 497 00:27:42,820 --> 00:27:45,430 from an art to a science. 498 00:27:45,430 --> 00:27:45,970 Thank you. 499 00:27:45,970 --> 00:27:48,070 [APPLAUSE] 500 00:27:48,070 --> 00:27:50,820 [MUSIC PLAYING] 501 00:27:50,820 --> 00:28:15,000