Welcome to the Distinguished Lecture Series. I'm Karen Matthews of University Video Communications. Our program is designed to enrich graduate and upper-division curricula with academic presentations by top computer scientists from industry. Its speakers and topics were nominated by computer science faculty of more than 50 universities. This presentation is sponsored by Symbolics Inc. Tom Knight, a major contributor to computer architectures for artificial intelligence applications, received his PhD in electrical engineering from the Massachusetts Institute of Technology in 1982. In 1980, he co-founded Symbolics, and is now their technical director. Presently on leave from the MIT faculty, he has served as assistant professor there since 1982. Tom's talk today is architectural support for symbolic computers. It's our pleasure to introduce Tom Knight. Dr. Knight.

Thank you. Today I'd like to spend some time telling you about special-purpose computer architectures for artificial intelligence applications.

I'd like to begin the talk by spending a few minutes discussing the differences between symbolic computing and other styles of computing, notably the numeric styles of computing, which most people are more familiar with. I'd like to move on to the subject of the language and operating system features which make it possible to run artificial intelligence sorts of programs more effectively on these architectures. I'd like to spend the majority of my time today discussing hardware support within these architectures for the language and operating system features I described above. And finally, I'd like to conclude with a discussion of some of the future directions of computer architecture in this field.

Symbolic computing differs from conventional computing in a number of ways.
The main difference is a difference in emphasis on the style of computing that is done on these machines. In a conventional machine, by and large the architecture is optimized for the execution of large programs. Very little attention is paid to the process of program development and the process of interacting with users. Instead, large Fortran programs are typically used to solve large physical problems, and the dominant concern is the speed with which those programs run.

In the symbolic world, we instead have an emphasis on incremental software development and the development process, rather than on the execution speed of the resulting code. This is not to say that the execution speed of the code is irrelevant; often you want to be able to develop programs effectively and execute them quickly. But that's not the whole picture. Instead, we have to be able to construct those programs in an effective way.

Typically, the problems that are solved on symbolic architectures are unstructured or poorly understood problems. We wish to be able to construct software solutions which consist of relatively malleable software subroutines, and we want to be able to construct reusable and flexible software.

The software is often thought of in terms of a middle-out programming technology. Usually people don't know how they are going to solve these problems, how they are going to construct their program, before the program is complete. So the top-down approaches to constructing programs simply don't work. Instead, the programs are often constructed by building a few pieces of them, using dummy subroutines, for example, to allow us to test those features, and then going on and completing the program, or perhaps changing direction radically in the middle of writing the program. In this development process, debugging and the ability to modify and interact with the program really are the crucial elements.
The programmer in this process is the most important resource. Programmers, after all, these days are paid far more than the computers that they use to solve their problems. A typical workstation today is dropping in price very, very rapidly, whereas the programmer's salary is going up almost as rapidly. We need to be able to leverage the capabilities of that programmer to be able to solve problems more effectively. And this optimizes, in a sense, the process of program development, because the programmer now becomes the dominant concern, the dominant cost element in the system.

What sort of language and operating system features allow us to make this shift from an execution-driven environment to an environment where the development process is the dominant concern? One of the major features in the operating system and languages area is the ability to support generic functions; we'll say more about these in a minute. A second feature is the automatic reclamation of storage, the so-called garbage collection process. And the third, about which I will have much less to say today, is the process of dynamically linking procedures together at runtime.

I'd like to spend a few minutes now talking about generic functions, what they do for you, and why they are important in this process. In a symbolic computer, the operators in general are capable of handling many different types of data. The same program, in this case the simple addition operator plus, is capable of operating on many different types of data. I've given some examples here. For example, we can add together two integers like 6 and 9, producing a result of 15. That same program, when given a pair of arguments such as 6 and 9.1, where the second argument is now a floating point number, produces a floating point result.
Similarly, we can add two symbolic quantities; these might, for example, represent vectors, perhaps implemented in the form of quaternions. We could add together those two vectors and produce a vector result.

The ability of the same program, the same operator, to act on many different types of data gives us tremendous leverage in the software development process. In other words, when it comes time to write a subroutine, such as a square root subroutine, it is not necessary to decide ahead of time which types of arguments that subroutine receives. Once we have defined that subroutine in terms of the basic primitives of the language, it will work with any set of arguments. This ability to handle different types of arguments, and to do something sensible with them dynamically at runtime, is really one of the key components of symbolic architectures.

These generic operations check the types of the arguments coming into the operation, and check the result of the operation to make sure that it can be expressed in the correct form. Typically, the process consists of several steps. The types of the input data are checked prior to the operation; depending on how many input variables there are, you may require more or less checking. Data of incompatible types may be converted from one form to another: in the example we saw a minute ago, we had both an integer and a floating point number being added together, so in that add the integer is converted into floating point format. The operation will trap out and do that conversion, or perhaps signal an error and allow the programmer to recover in some more manual fashion, or perhaps call a user-defined subroutine for handling operations which are not otherwise handled by the hardware. This ability to trap out on the referenced data is another key characteristic of these symbolic architectures.
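As a minimal sketch of the dispatch a generic add goes through, here is an illustrative Python version; the checking and coercion rules shown are assumptions for illustration, done here in software rather than by tagged hardware:

```python
def generic_add(a, b):
    """Illustrative software version of a generic '+': check the argument
    types, coerce where needed, then dispatch to the right operation."""
    # Integers and floats: mixed int/float arguments are coerced to float,
    # as in the 6 + 9.1 example.
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a + b                      # Python performs the int -> float coercion
    # A stand-in for the symbolic/vector case: element-wise addition of tuples.
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        return tuple(x + y for x, y in zip(a, b))
    # Anything else traps out; a real system might call a user-defined handler here.
    raise TypeError(f"no method for + on {type(a).__name__}, {type(b).__name__}")

print(generic_add(6, 9))            # 15
print(generic_add(6, 9.1))          # 15.1
print(generic_add((1, 2), (3, 4)))  # (4, 6)
```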
Finally, after the operation is performed, and in a lot of respects that's the easiest part of the process, we check for exceptions: for example, floating point overflow or underflow, or integer overflow. And we produce a result which has a data type commensurate with the form of the result the operation is producing. As an example, if we add together two integers which are almost at the precision of a 32-bit adder, we will produce an overflow on a normal machine. On a symbolic processor, that operation produces a result which is stored in a different form; it produces a correct result, even though the representation of that result has to be done in a different way.

Next, I'd like to say a few words about the garbage collection process. Garbage collection is the term used in the symbolic computing community for the process of reclaiming storage which is no longer necessary to the computation. Traditionally, the approach of conventional languages to this process has been to put the burden on the programmer to explicitly return data storage which is no longer being used by the program. This is one of the major sources of bugs in these programs, and any experienced programmer can tell you that the process of debugging programs which have incorrect storage allocation, and especially deallocation, is a tremendously difficult problem. The book The Mythical Man-Month, I believe, points this area out as one of the major contributing factors to program bugs. In the symbolic languages, such as Lisp and PROLOG, this problem is avoided by eliminating the explicit return of data under programmer control and making the problem of reclaiming storage something which the system and the language are responsible for, rather than the programmer. This eliminates an entire class of bugs.
Early garbage collection technology was very expensive. Two techniques were developed in the '60s for solving the garbage collection problem.

One of them was reference counting. Every time you constructed a new pointer to a particular object in memory, you would increment a counter which kept track of how many locations were pointing to that data item. Correspondingly, every time a pointer to a particular location was destroyed, that counter was decremented. Reference counting reclaims objects automatically when that count reaches zero, when, in other words, there are no further locations referencing a particular object. The difficulty with reference counting as a strategy is that it doesn't work on circular structures. If we build a circular structure at any time while running a program or building a large, complex environment, then once that circular structure is constructed there is no way of reclaiming the storage associated with it. This is because, if you think of, for example, a simple two-location loop, where one location points to another and that one points back to the first, then the reference counts for those two locations are both 1, and therefore neither of them can be reclaimed.

A second approach was the so-called mark-sweep collectors. In this style of garbage collector, the computation stops when the free storage runs out, and we go through what's called a marking phase, where we mark each location in memory which is referenced, starting from certain root nodes which the ongoing computation knows are important to it. That marking process traces through all of the accessible storage in memory, and any location which has not been marked as part of that process is free to be reclaimed.
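Here is a minimal sketch of these two ideas on the two-location loop just described, with plain Python objects standing in for memory words; the class and field names are illustrative assumptions:

```python
class Cell:
    """A memory word that can point at one other cell."""
    def __init__(self):
        self.ref = None
        self.count = 0              # reference count

def set_ref(cell, target):
    """Point cell at target, maintaining the reference counts."""
    if cell.ref is not None:
        cell.ref.count -= 1         # a pointer is destroyed: decrement
    cell.ref = target
    if target is not None:
        target.count += 1           # a pointer is constructed: increment

# Build a two-location loop: a -> b -> a.
a, b = Cell(), Cell()
set_ref(a, b)
set_ref(b, a)
print(a.count, b.count)             # 1 1: neither count ever reaches zero,
                                    # so reference counting cannot reclaim them

def mark(root, marked):
    """Marking phase of a mark-sweep collector: trace from one root node."""
    while root is not None and root not in marked:
        marked.add(root)
        root = root.ref

root = Cell()                       # the computation's root does not reach a or b
marked = set()
mark(root, marked)
print(a in marked, b in marked)     # False False: both are free to be reclaimed
```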
One of the difficulties with mark-sweep collection is that it does not compact memory during the reclamation process. We are left with fragmented memory, memory which has lots of holes in it, and this makes it hard, after we've done several of these garbage collections, to allocate linear structures such as arrays in an environment like this. There's also very poor paging locality in a virtual memory environment using garbage collection techniques like this, again because, without compaction, pages do not get filled up with valuable information. Whatever locality has been lost as a result of the allocation process is never regained as part of this collection process. The end result is that mark-sweep collectors by and large are not used in modern symbolic languages.

Stop-and-copy collectors were the next major development in garbage collection. Minsky, Fenichel, and Yochelson are the people primarily responsible for this development. Again, for this style of garbage collector, the computation stops during the collection process. Unlike the mark-and-sweep collectors, however, this style of garbage collection compacts memory; we'll say a few more words about how this is done in a minute. The fact that we are compacting memory leads, of course, to an easy job of allocating linear structures, such as large arrays, and to much improved paging performance.

I'd like to tell you how this garbage collection process, the Minsky-Fenichel-Yochelson collector, works. To do this, I'd like to simulate the computer here for a minute. The process starts with the processor holding pointers into certain locations in memory. Each of the locations in memory in general will hold pointers to other locations in memory. And the processor is free now to allocate new memory within this old space region.
Eventually, the allocation of new memory by the processor will run out of space, and old space will be full. Now, unlike the mark-sweep algorithms which we discussed a minute ago, the Minsky-Fenichel-Yochelson collector, a copying garbage collector, stops the computation and copies the valuable data items from this old space region into a new space region. The old space items are moved one at a time from old space to new space.

We start with the pointers that the processor holds, pointers into this old space. In other words, we start by taking each of the pointers the processor holds and copying the data item corresponding to it into new space. So, for example, if we start with this pointer right here from the processor, we will copy this data item into new space. New space now will contain a pointer to the same location that this pointer originally pointed to, and we will update this processor pointer to point down to this new location. So we destroy that pointer; the processor now points to an object in new space. We continue that with the second pointer, and now allocate a new word of memory in new space for this location up here, and then destroy this pointer and update it to point to the second location.

Now, during this process we also leave behind a special kind of pointer. We use the data type field of each word to add to each of the old locations a special kind of pointer called a forwarding pointer, which tells where each of the locations that have been moved has been forwarded to. So in this case, this location will now be updated to point here, and this location will be marked with a special forwarding pointer pointing here.

So we have now moved from old space into new space the contents which are directly pointed to by the processor.
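Here is a minimal Python sketch of the whole copy, with dictionaries standing in for memory words and a 'forward' entry standing in for the forwarding pointer; the final loop is the scavenging pass the walkthrough turns to next. The cell layout and names are illustrative assumptions:

```python
def collect(roots):
    """Sketch of a stop-and-copy (Minsky-Fenichel-Yochelson) collection.
    Each cell is a dict holding one pointer; roots are the pointers the
    processor holds. Returns the updated roots and the new space."""
    new_space = []

    def copy(cell):
        if cell is None:
            return None
        if 'forward' in cell:            # already moved: follow the forwarding pointer
            return cell['forward']
        moved = {'ref': cell.get('ref')} # copy the contents into new space
        new_space.append(moved)
        cell['forward'] = moved          # leave a forwarding pointer behind
        return moved

    # Step 1: copy everything the processor points to directly.
    roots = [copy(r) for r in roots]

    # Step 2: scavenge new space, copying anything it still points to in old space.
    scan = 0
    while scan < len(new_space):
        new_space[scan]['ref'] = copy(new_space[scan]['ref'])
        scan += 1

    return roots, new_space              # old space can now be discarded (the "flip")

# Example: two reachable cells and one unreachable one.
c2 = {'ref': None}
c1 = {'ref': c2}
unreached = {'ref': c1}
roots, new_space = collect([c1])
print(len(new_space))                    # 2: only the reachable cells were copied
```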
However, that's not all of the valuable information which is held in old space. We still have valuable information held in old space, namely all of the things which these objects now point to.

Now we begin a process which is called scavenging. In the scavenging process, we keep a pointer into new space. This new space pointer is used to locate all of the pointers in new space which point to objects that are still in old space. So, for example, when we first reference this location, we see that it contains a pointer into old space. This means we are obligated to move the contents of that location into new space. The way we do that is the same as the way we moved the original processor pointers: we allocate a new location in new space, we change the pointer in old space to point to this new location, and we update the pointer which we encountered to point to this new object. So we destroy this pointer. Then we gradually move our scavenging pointer to the next location and start that process again.

When the scavenging pointer runs out of data items in new space which have yet to be examined, then the entire remaining contents of old space is garbage. We are then free to completely eliminate that data and perform what is sometimes called a flip, where the contents of new space becomes the old space, and the former old space is reclaimed and is available for use in the next phase of garbage collection operations.

This technique of garbage collection, to review, is however still a stop-and-copy garbage collector. What this means from a user's standpoint is that this collector still has a serious problem in it: computation stops for a long period of time while the garbage collection process is in action. For small programs, this is much less of a problem; if we have a program that occupies a megaword of memory, then the garbage collection process is quite rapid.
For very large environments, however, the pauses can become quite excessive, and it becomes very annoying, particularly if you're trying to build an interactive application, to have these pauses, which can occur at any time, interrupt the interactive nature of your computation. So the fact that these collectors stop the world and perform this garbage collection has been a serious problem with garbage collection technologies.

In about 1974, Henry Baker developed an algorithm, called the Baker collector, which solves this problem and performs an operation known as incremental garbage collection. The Baker garbage collector does something very similar to the Minsky-Fenichel-Yochelson garbage collector; however, it does it while computation continues in the main processor. I'd like to tell you a little bit about how that garbage collector works.

In a similar way to the Minsky-Fenichel-Yochelson collector, we start with a set of pointers to data items in main memory, in old space. The processor is computing along, referencing data and building new structures in old space. Eventually, the space available in old space is used up, and we are then required to garbage collect that space and to allocate new space.

Now, what does that process look like? The process of collecting involves the same steps that we saw a moment ago in the Minsky-Fenichel-Yochelson collector. We first move the data objects which are held in old space: we locate the pointers that the processor is holding, and we move those locations to new space. So, for example, in this particular case we have a pointer from this location to this location. The relocation of this pointer will involve moving this location into new space and leaving behind, again, a forwarding pointer in the old location.
So we will leave behind a forwarding pointer in this location, change this pointer to point here, and update this pointer to point into new space.

And again, the process of scavenging is also necessary; we have to relocate these other pointers as part of this process as well. But I won't go through that right now.

The difference between this collector and the Minsky-Fenichel-Yochelson collector is that the scavenging process continues in the background while we are performing computations in the processor. The way we get away with that, the trick that allows us to do that in this particular style of collector, is that the processor, every time it references a data item, determines whether or not the data item that it holds is a pointer into new space. Those are the only pointers that the processor is allowed to have: the processor is only allowed to hold pointers into new space. In general, however, new space may contain pointers back into old space. So the processor is required, every time it performs a memory access, to check the pointer which it is picking up; for example, if it references through this pointer and picks up the contents of this data word, it must check whether that data word contains a pointer into old space. We implement something called the barrier, which allows the processor to determine that it is picking up a pointer into old space. And instead of allowing computation to proceed, that act of loading a pointer into old space triggers a trap in the processor, called a transport trap. Once that transport trap occurs in the processor, we are obligated at that instant to move the data object in old space which is being pointed to into new space, to perform what's called a transport of that object from old space into new space.
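A minimal sketch of this read barrier, continuing the same illustrative cell layout: in the real machine the old-space check is made by tag and region hardware on every memory reference, whereas here it is an explicit function and a 'space' field:

```python
def transport(obj, new_space):
    """Move an old-space object into new space, leaving a forwarding pointer."""
    if 'forward' in obj:                       # already moved: follow the forwarding pointer
        return obj['forward']
    moved = {'ref': obj.get('ref'), 'space': 'new'}
    new_space.append(moved)
    obj['forward'] = moved
    return moved

def barrier_read(cell, new_space):
    """Read barrier: the processor may only ever hold new-space pointers.
    Loading a pointer into old space takes a 'transport trap', which moves
    the object before the loaded value ever reaches the processor."""
    value = cell['ref']                        # the ordinary memory access
    if value is not None and value.get('space') == 'old':
        value = transport(value, new_space)    # the transport trap
        cell['ref'] = value                    # fix up the word we loaded through
    return value

# Example: the processor reads through a root and finds an old-space pointer.
old_obj = {'ref': None, 'space': 'old'}
root = {'ref': old_obj, 'space': 'new'}
new_space = [root]
print(barrier_read(root, new_space)['space'])  # 'new': the object was transported
```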
Once we've done that moving of the object from old space into new space, then of course the processor is free to have a pointer to it, because it's now a new-space object.

This process of trapping out on references into old space allows the computation to proceed at the same time the garbage collection is going on. It was a major breakthrough in our ability to build interactive environments along with garbage collection. We still have to do the scavenging work in the background: there's a background process whose job it is to scan through each item in new space, moving any objects it finds referenced that are still in old space from old space to new space. And similarly to the Minsky-Fenichel-Yochelson collector, when this scavenging pointer reaches the end of allocated storage in new space, we are done with the collection process, and we're allowed to reclaim old space entirely and throw it away, since there's no good data remaining in it.

This style of collector was first implemented in the MIT Lisp machine in around 1974. But there are still significant paging penalties associated with this garbage collector, and as we try to build larger and larger programming environments, it becomes more important to make this garbage collection process more and more efficient.

The next real stage of garbage collection development occurred in about 1980, with the development of what's called the ephemeral garbage collector. In the ephemeral garbage collector, we rely upon the fact that the lifetime of newly allocated objects is substantially smaller than the lifetime of objects which have already been in existence for a period of time. In other words, newly allocated objects tend to be allocated, used for a brief period of time, and then thrown away.
We can take advantage of the fact that these objects are ephemeral by making the garbage collection process for these newly allocated objects occur more rapidly, and with a faster turnaround time, than it would for a long-lived object, a so-called dynamic object.

To get this idea across, we can again think of memory as consisting of a number of different regions. There might be a region of memory, so-called static memory, which we know we are never going to have to worry about garbage collecting. These are objects which are part of, perhaps, the operating system, which we know we are never going to have to reclaim. There might be a class of objects, a smaller class in general, which is dynamic memory. Dynamic memory is memory which might typically be reclaimed by a garbage collector such as the Baker collector that we talked about a moment ago. And finally, the brand-new objects that might be created as part of a short program segment which has just been executed might be stored in this ephemeral region.

What we would like to be able to do is to garbage collect this ephemeral region independently from the remainder of memory. And the way we do this is by keeping careful track of all of the pointers into the ephemeral region. The only data within the ephemeral region which is valuable, after all, is data which can be reached by a pointer. Those pointers come from two places. One of them is the processor registers and the stack, which typically hold pointers to ephemeral objects such as these. The other major source of pointers into this ephemeral region is side effects on existing data structures: we might, for example, have pointers from static memory into the ephemeral region, or we might have pointers from dynamic memory into the ephemeral region.
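As a minimal sketch of that bookkeeping, here is an illustrative write barrier that records every location in an older region that is made to point into the ephemeral region (such a record is often called a remembered set); the region names and data structure are assumptions for illustration, and as described next, the real machines get hardware assistance rather than doing a per-store software check:

```python
# Cells carry a 'region' field: 'static', 'dynamic', or 'ephemeral'.
ephemeral_entry_points = set()      # ids of older cells known to point into the ephemeral region

def write_ref(cell, target):
    """Write barrier: perform the store, and if it creates a pointer from an
    older region into the ephemeral region, remember where that pointer lives."""
    cell['ref'] = target
    if (target is not None
            and target['region'] == 'ephemeral'
            and cell['region'] != 'ephemeral'):
        ephemeral_entry_points.add(id(cell))

# Example: a side effect on a static structure now points at a new object.
new_obj = {'ref': None, 'region': 'ephemeral'}
static_cell = {'ref': None, 'region': 'static'}
write_ref(static_cell, new_obj)
print(len(ephemeral_entry_points))  # 1: to collect the ephemeral region, only this
                                    # location plus the registers and stack need scanning
```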
If we're going to garbage collect only this ephemeral region, it's important to be able to locate these pointers rapidly. Hardware that assists us in rapidly identifying the locations of these pointers is really critical to implementing these ephemeral styles of garbage collectors efficiently. We'll talk in a minute about what that hardware looks like.

Finally, having talked about generic operations and about operations involving storage allocation, I'd like to spend a few minutes talking about procedure calls, which are another major issue in terms of high-performance symbolic computing. In a conventional machine, we typically have a fixed number of arguments, and we know at compile time exactly what style of arguments we are using in each of the function calls, each of the procedure calls, that we implement. In the symbolic environment, the calling conventions are typically much more complex. We use keyword arguments, we have variable numbers of arguments, and in general we don't know what kind of arguments are going to be applied to a given function at compile time. Typically, this is because we are incrementally compiling programs: we don't have the whole program available to us at compile time, and so we compile the program in small pieces. Object-oriented programming makes this problem even more difficult.

Now, in a symbolic computer, memory contains not a raw collection of bits; we instead store objects in memory, objects which have some semantic meaning. And we tag those objects, so we can identify what kind of object they are, by adding a tag field to each location in memory. The tag identifier determines what kind of object that is. It might be a pointer to another object; that's certainly the most common kind of object in a language like Lisp. It might be an integer. It might be a floating point number.
It might be an array. Sometimes more complex objects are also useful. We saw a moment ago that we need special kinds of pointers for garbage collection, such as the forwarding pointers used in the Baker collector. Headers, which help us identify multi-word objects and arrays, are also very useful.

I'd like to turn now to what kind of hardware we use to support these features in the programming language. This discussion will center primarily on support for generic function calls and for the garbage collector, and on how we interface a machine like this, with these tagged locations, to the outside world. The major point I want to make is that there isn't a lot of hardware here. We're not talking about doubling the size of the processor. We're talking about adding perhaps 5% of the logic in the machine to the processor.

Tag support in these machines is handled by adding to the basic arithmetic unit a tag lookup table, together with a lookup table based on the high-order bits of the pointer address coming as an operand into the arithmetic unit. The tag lookup table tells us that the particular data item we are dealing with coming into the arithmetic unit is of a particular type: whether it is a fixed point number, a floating point number, a pointer, or an array. The garbage collector region number tells us whether the pointer coming into the arithmetic unit is a pointer into old space, into new space, or into one of the ephemeral regions of memory, such as we discussed a moment ago.

The amount of logic that's necessary here really is very minor in comparison to the amount of logic required to implement sophisticated arithmetic units. The tag insertion is quite trivial, and the total penalty for adding this logic is very small. One of the major places the penalty does arise, however, is in the main memory system, where we must store these tag bits.
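As a minimal sketch of what these two lookups compute, here is an illustrative tagged-word format in Python; the field widths, tag values, and region boundaries are invented for the example and are not the actual Symbolics layout:

```python
TAG_BITS, ADDR_BITS = 8, 32
TAG_FIXNUM, TAG_FLOAT, TAG_POINTER, TAG_ARRAY = range(4)

def make_word(tag, payload):
    """Pack a tag field above the address/immediate field of a memory word."""
    return (tag << ADDR_BITS) | (payload & ((1 << ADDR_BITS) - 1))

def tag_of(word):
    """The tag lookup: what type of object is this operand?"""
    return word >> ADDR_BITS

def region_of(word, region_table):
    """The garbage collector region lookup: index a small table by the
    high-order address bits to learn which space a pointer refers to."""
    address = word & ((1 << ADDR_BITS) - 1)
    return region_table[address >> 24]      # top 8 address bits select the region

# Illustrative region layout: one old, one new, and one ephemeral region,
# with everything else treated as static.
region_table = ['old', 'new', 'ephemeral'] + ['static'] * 253

w = make_word(TAG_POINTER, 0x01000040)
print(tag_of(w) == TAG_POINTER)      # True
print(region_of(w, region_table))    # 'new'
```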
Even in main memory, though, the tag bits in a typical machine tend to be in the 4 to 8-bit range, which I'd point out is smaller than the amount of memory allocated for storing error correction bits, for example. And people don't seem to have any problem with using error correction as a technique in their processors.

Well, how much does all of this help? Some people have said that tag logic really isn't necessary in these machines, and that we have the ability to do all of this using conventional machines, checking the tags manually. I would disagree with that. Since every memory location, every read and write, has to be checked manually in the conventional kinds of architectures, there's a tremendous penalty to be paid. Every operation must be checked; we must check to see whether the types of the operands are appropriate to the result. The bottom line is that there's a very high penalty for doing this on a conventional architecture.

To turn now not to the kinds of architectures which we are building right now, but to the kind we want to build: there is also a strong incentive for adding tag logic to our processors in terms of the future design of architectures for symbolic computation. The development of a language called MultiLisp at MIT by Bert Halstead, and of QLisp by John McCarthy and company at Stanford, involves a concept called futures. Futures allow us to express parallelism in a language such as Lisp by returning a placeholder in place of a result when a function is called. We return this placeholder immediately, and then spawn a separate task whose job it is to actually compute the value which is going to go into that placeholder location. If we encounter such a placeholder during the process of computing, we wait until that placeholder has been filled in by the spawned task before we can proceed.
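A minimal sketch of this behaviour, using Python's concurrent.futures as a stand-in; the analogy is an assumption, since in MultiLisp the future is a language construct and the wait happens implicitly whenever the value is touched, whereas here the wait is the explicit result() call:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def slow_square(x):
    time.sleep(0.1)                 # stands in for an expensive sub-computation
    return x * x

with ThreadPoolExecutor() as pool:
    # The list is built immediately; two of its elements are placeholders
    # whose values are being computed by separately spawned tasks.
    items = [3, pool.submit(slow_square, 4), pool.submit(slow_square, 5)]

    # Touching a placeholder waits until the spawned task has filled it in.
    values = [x.result() if hasattr(x, "result") else x for x in items]
    print(values)                   # [3, 16, 25]
```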
Here's an example. We have a program which is constructing a list of three elements, one of which is simply a number, and two of which are surrounded by this special construct called a future. The computations within the scope of each future are spawned as separate computational tasks. However, we return, and can actually build the list structure specified by this list, prior to those computations completing. This allows us to express a parallel computation in a much more natural way than we would otherwise be able to do.

And the point I want to make is that every time we reference a piece of list structure as part of this process, we are required to check whether the particular data type being referenced is one of these futures. If it is a future, we are obligated to wait for its computation to complete. The only way we can effectively and efficiently check whether a particular result is a future is by using tag control logic in the central processor. So to the extent that we believe in and follow up these ideas for parallel computation, we are also led directly down the path of tag logic as an important element of processor design.

Well, in summary, I'd like to say that small amounts of logic in the processor, notably small amounts of logic to assist us with garbage collection, and small amounts of logic to assist us in checking and inserting tags in the data paths of the processor, make a large difference in the performance of this class of machines. Not only for implementing generic operations and solving the problem of graceful overflow of fixed-length data fields, but also for solving problems in the garbage collector area, and for implementing some constructs which are going to be important to us in terms of the future evolution of Lisp as a language.

Thank you.
767 00:44:57,180 --> 00:45:02,970 768 00:45:02,970 --> 00:45:06,210 I'm Judith Lemon from University Video Communications, 769 00:45:06,210 --> 00:45:09,000 and our first question from faculty 770 00:45:09,000 --> 00:45:12,870 is from the University of California Berkeley. 771 00:45:12,870 --> 00:45:16,770 Do you think RISC ideas and workstations are a fad? 772 00:45:16,770 --> 00:45:17,880 Definitely not. 773 00:45:17,880 --> 00:45:22,060 I think RISC ideas are here to stay. 774 00:45:22,060 --> 00:45:25,290 I would say, however, that the basic RISC 775 00:45:25,290 --> 00:45:27,660 ideas which people have usually thought of in terms 776 00:45:27,660 --> 00:45:30,990 of conventional computing sorts of problems 777 00:45:30,990 --> 00:45:34,350 really have to be augmented by some type control logic 778 00:45:34,350 --> 00:45:36,930 and some special purpose hardware for garbage collection 779 00:45:36,930 --> 00:45:39,720 assist, before I think they can effectively 780 00:45:39,720 --> 00:45:45,370 address the applications which we are discussing today. 781 00:45:45,370 --> 00:45:47,170 From Purdue University. 782 00:45:47,170 --> 00:45:49,420 How do environments such as the Lucid environment 783 00:45:49,420 --> 00:45:52,770 compare with those available on symbolic architectures? 784 00:45:52,770 --> 00:45:56,810 Well, the performance that you obtain in environments 785 00:45:56,810 --> 00:45:59,590 such as the Lucid compiler environment 786 00:45:59,590 --> 00:46:03,010 is largely made available to the programmer 787 00:46:03,010 --> 00:46:05,560 by the use of special purpose declarations, which 788 00:46:05,560 --> 00:46:07,600 are inserted into the code. 789 00:46:07,600 --> 00:46:09,670 These declarations allow the compiler 790 00:46:09,670 --> 00:46:13,330 to produce instructions which are tuned 791 00:46:13,330 --> 00:46:16,840 to the types of objects which are being dealt 792 00:46:16,840 --> 00:46:20,290 with in the program, but they really restrict 793 00:46:20,290 --> 00:46:23,740 the flexibility of the code, and remove much of the advantage 794 00:46:23,740 --> 00:46:26,350 that the programmer sees in the development environment, 795 00:46:26,350 --> 00:46:27,980 as opposed to the execution environment. 796 00:46:27,980 --> 00:46:31,280 So you end up with efficient execution. 797 00:46:31,280 --> 00:46:33,370 However, the penalty for that is that the program 798 00:46:33,370 --> 00:46:36,070 becomes far less flexible. 799 00:46:36,070 --> 00:46:38,860 From the University of California Berkeley, again. 800 00:46:38,860 --> 00:46:40,840 How much faster does the cycle time 801 00:46:40,840 --> 00:46:42,310 of the straightforward machine have 802 00:46:42,310 --> 00:46:45,490 to be to perform like a Lisp machine? 803 00:46:45,490 --> 00:46:47,320 That's a complicated question. 804 00:46:47,320 --> 00:46:53,410 I believe, at the moment, for single computer 805 00:46:53,410 --> 00:46:56,200 implementations, that in many cases 806 00:46:56,200 --> 00:47:01,370 the compiler can eliminate the type checks at compile time. 807 00:47:01,370 --> 00:47:05,350 However, I think as we move towards parallel sorts 808 00:47:05,350 --> 00:47:10,390 of implementations, the use of techniques such as futures, 809 00:47:10,390 --> 00:47:13,600 and certainly the use of incremental garbage collectors, 810 00:47:13,600 --> 00:47:18,100 require that virtually every memory reference be tagged.
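As a minimal Common Lisp sketch of the declarations referred to in the Lucid answer above, and of the compile-time elimination of type checks just mentioned: with the declarations, a compiler for a conventional machine can emit raw fixnum arithmetic; without them, it must handle any numeric type and check tags at run time. The function name is illustrative.

    (defun sum-fixnums (a b)
      ;; The declarations promise the compiler that A and B are fixnums and
      ;; that it may trade safety for speed, so no run-time type checks are
      ;; emitted.  Note that this also gives up the graceful overflow into
      ;; bignums discussed earlier in the talk.
      (declare (type fixnum a b)
               (optimize (speed 3) (safety 0)))
      (the fixnum (+ a b)))

The price, as the answer notes, is flexibility: this function now works only for fixnums, and behaves unpredictably if handed anything else.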
811 00:47:18,100 --> 00:47:20,530 As a result of that, I would anticipate 812 00:47:20,530 --> 00:47:24,520 that the overhead associated with reading and writing 813 00:47:24,520 --> 00:47:28,060 locations from main memory could be quite substantial, 814 00:47:28,060 --> 00:47:34,290 leading perhaps to factors of 2 or 3 in terms of performance. 815 00:47:34,290 --> 00:47:38,730 Tom, can you comment on Symbolics' PROLOG compiler? 816 00:47:38,730 --> 00:47:42,840 The implementation of PROLOG on Symbolics machines 817 00:47:42,840 --> 00:47:45,450 really is very similar to the implementation 818 00:47:45,450 --> 00:47:50,580 of other kinds of symbolic languages such as Lisp. 819 00:47:50,580 --> 00:47:53,850 The effective implementation of PROLOG 820 00:47:53,850 --> 00:47:58,770 does involve the addition of a few different types of data, 821 00:47:58,770 --> 00:48:02,350 notably a particular data type called a logic variable, 822 00:48:02,350 --> 00:48:06,120 which we use for binding identifiers 823 00:48:06,120 --> 00:48:08,730 during the pattern matching process. 824 00:48:08,730 --> 00:48:12,780 That, together with fast and careful attention 825 00:48:12,780 --> 00:48:17,700 to the unifier within the language, gives very effective performance. 826 00:48:17,700 --> 00:48:21,690 One of the things that people perhaps are not aware of 827 00:48:21,690 --> 00:48:25,260 is that PROLOG requires many of the same features 828 00:48:25,260 --> 00:48:27,930 that Lisp does in terms of garbage collection 829 00:48:27,930 --> 00:48:32,760 and the other generic sorts of operation handling mechanisms 830 00:48:32,760 --> 00:48:35,360 that you find in Lisp. 831 00:48:35,360 --> 00:48:37,860 How do the architectures you've discussed address programmer 832 00:48:37,860 --> 00:48:38,805 productivity problems? 833 00:48:38,805 --> 00:48:41,550 834 00:48:41,550 --> 00:48:44,900 I think that's best answered by giving an example. 835 00:48:44,900 --> 00:48:48,740 Our latest architectural effort at Symbolics 836 00:48:48,740 --> 00:48:51,920 is a custom microprocessor, which 837 00:48:51,920 --> 00:48:55,070 was developed by a very small group of people. 838 00:48:55,070 --> 00:48:59,390 It was developed by approximately a dozen people, 839 00:48:59,390 --> 00:49:03,830 half of whom were devoted to building the CAD system which 840 00:49:03,830 --> 00:49:06,620 allowed us to design that processor, and half of whom 841 00:49:06,620 --> 00:49:10,130 were devoted to actually implementing the architecture. 842 00:49:10,130 --> 00:49:15,230 That process took about six man-years of effort, 843 00:49:15,230 --> 00:49:19,310 and produced a semiconductor chip of approximately 844 00:49:19,310 --> 00:49:24,260 the same complexity as a 386 or 68020 processor, which 845 00:49:24,260 --> 00:49:28,670 you might find in a large number of commercial workstations. 846 00:49:28,670 --> 00:49:32,120 The estimates by Intel and Motorola for those processors 847 00:49:32,120 --> 00:49:34,580 were in the range of several hundred man-years of effort, 848 00:49:34,580 --> 00:49:38,810 and in most cases did not involve simultaneously 849 00:49:38,810 --> 00:49:41,040 building a CAD environment. 850 00:49:41,040 --> 00:49:45,170 So in that particular case, we are seeing huge productivity 851 00:49:45,170 --> 00:49:49,730 advances over the industry norm in some applications 852 00:49:49,730 --> 00:49:51,800 such as CAD. 853 00:49:51,800 --> 00:49:54,230 Very interesting.
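A rough Lisp sketch, purely illustrative and not Symbolics' implementation, of the logic variable and unifier mentioned in the PROLOG answer above; on a tagged machine an unbound logic variable would be a distinct hardware-recognized data type rather than a structure, and this toy unifier omits the occurs check.

    (defstruct logic-var
      (binding nil)
      (bound-p nil))

    (defun resolve (term)
      ;; Follow the binding chain until an unbound variable or a value is reached.
      (if (and (logic-var-p term) (logic-var-bound-p term))
          (resolve (logic-var-binding term))
          term))

    (defun unify (a b)
      ;; Bind unbound variables, match conses element-wise, compare constants.
      (let ((a (resolve a)) (b (resolve b)))
        (cond ((eql a b) t)
              ((logic-var-p a) (setf (logic-var-binding a) b
                                     (logic-var-bound-p a) t))
              ((logic-var-p b) (unify b a))
              ((and (consp a) (consp b))
               (and (unify (car a) (car b))
                    (unify (cdr a) (cdr b))))
              (t nil))))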
854 00:49:54,230 --> 00:49:58,970 And a final question, and this is a little bit less technical. 855 00:49:58,970 --> 00:50:02,930 What do you see as the problems a student in the university 856 00:50:02,930 --> 00:50:07,400 faces today in terms of learning good system design? 857 00:50:07,400 --> 00:50:10,100 I think it's a really serious problem today. 858 00:50:10,100 --> 00:50:16,580 We are developing much more complex systems as we go on, 859 00:50:16,580 --> 00:50:19,250 and these systems become harder and harder 860 00:50:19,250 --> 00:50:22,850 to implement in their entirety within the university 861 00:50:22,850 --> 00:50:23,940 environment. 862 00:50:23,940 --> 00:50:26,900 This means that the students within the university 863 00:50:26,900 --> 00:50:29,600 environment usually see small portions of systems 864 00:50:29,600 --> 00:50:31,730 rather than systems as a whole. 865 00:50:31,730 --> 00:50:35,690 And this is, I think, exacerbated by the fact 866 00:50:35,690 --> 00:50:40,670 that the cost of implementing some of these large systems 867 00:50:40,670 --> 00:50:43,040 is also growing rather rapidly. 868 00:50:43,040 --> 00:50:46,100 It simply costs more to implement fast systems than it 869 00:50:46,100 --> 00:50:48,660 does to implement slow systems. 870 00:50:48,660 --> 00:50:54,980 So as we move away from 1970s technologies such as wire 871 00:50:54,980 --> 00:50:59,120 wrap and other low overhead technologies like that, 872 00:50:59,120 --> 00:51:01,520 and are required to move into an environment 873 00:51:01,520 --> 00:51:04,280 where printed circuit boards, and in many cases custom 874 00:51:04,280 --> 00:51:06,740 integrated circuits, are necessary, 875 00:51:06,740 --> 00:51:10,070 it becomes very difficult to teach students 876 00:51:10,070 --> 00:51:12,800 about system level sorts of problems in the university 877 00:51:12,800 --> 00:51:13,670 environment. 878 00:51:13,670 --> 00:51:15,860 I'm quite concerned. 879 00:51:15,860 --> 00:51:18,170 There is even a problem in doing that effectively 880 00:51:18,170 --> 00:51:21,920 in the industrial setting, because typically 881 00:51:21,920 --> 00:51:25,610 in the industrial setting, architecture and system level 882 00:51:25,610 --> 00:51:28,340 design are controlled by relatively small groups. 883 00:51:28,340 --> 00:51:32,150 And it's very difficult for a newcomer to an organization 884 00:51:32,150 --> 00:51:34,730 to break into those groups. 885 00:51:34,730 --> 00:51:41,610 Do you see any directions to go in terms of solutions? 886 00:51:41,610 --> 00:51:43,640 Well, I think if we are going to become 887 00:51:43,640 --> 00:51:47,930 something other than the civil engineering sorts of departments 888 00:51:47,930 --> 00:51:51,200 in the universities, that is, departments that study dams 889 00:51:51,200 --> 00:51:54,560 rather than build them, then I think that we really 890 00:51:54,560 --> 00:51:59,480 have to redouble our efforts in terms of limiting the complexity of what 891 00:51:59,480 --> 00:52:02,390 it is that we're trying to do within the university 892 00:52:02,390 --> 00:52:04,490 environment, 893 00:52:04,490 --> 00:52:07,040 while still being able to address the problem of building 894 00:52:07,040 --> 00:52:08,420 complete systems.
895 00:52:08,420 --> 00:52:12,080 Because I think only in building complete systems and complete system 896 00:52:12,080 --> 00:52:17,810 environments are we going to get the kinds 897 00:52:17,810 --> 00:52:20,990 of educational experience that I would like to see 898 00:52:20,990 --> 00:52:23,220 in a university environment. 899 00:52:23,220 --> 00:52:23,720 Thank you. 900 00:52:23,720 --> 00:52:24,870 Thank you very much, Tom. 901 00:52:24,870 --> 00:52:26,090 Thank you. 902 00:52:26,090 --> 00:52:29,580 And thank you for joining us today. 903 00:52:29,580 --> 00:52:32,780 We invite your comments and questions on this presentation. 904 00:52:32,780 --> 00:52:34,550 And if you're viewing this on tape, 905 00:52:34,550 --> 00:52:37,830 we've provided some postcards tucked under the video cassette 906 00:52:37,830 --> 00:52:38,330 label. 907 00:52:38,330 --> 00:52:40,920 Otherwise, our address is at the end of the tape. 908 00:52:40,920 --> 00:52:42,760 Thank you again. 909 00:52:42,760 --> 00:53:11,000