[MUSIC PLAYING]
[00:00:38] Hello, my name is Craig Upson from Silicon Graphics, and I'm here to talk today about visual programming and data flow environments. Visual programming is not a new technique; it's been around for quite a while. Nor is data flow a new technique. But the combination of the two is something that has, in the last four or five years, become quite prevalent in computer usage, defining new paradigms in computer programming environments and, in addition, new paradigms in computer usage environments. So the contents of this talk are a brief background to visual programming, a background on data flow with more specific details about certain implementations, and examples that illustrate the application of these techniques. This is not an exhaustive survey of the field, but more of an introduction; a survey would take, of course, several lectures. So what is visual programming? Visual programming really refers to one of two major techniques.
[00:01:45] One is the application of visual languages to traditional computer programming: as an alternative to a traditional computer language like C, C++, or Fortran, you use a visual language in the programming step. The other is the application of visual languages to traditional computer usage. How do end users use computers without programming? Through a different paradigm, which is visual programming. Well, why is visual programming of interest, and why is it becoming more popular as the years go by? The reasons are primarily that the computing environment in which we live is becoming much more complicated. Computers are getting much more complicated, and the environments in which we compute are much more complicated. We're now very much used to having several computers networked together, so running processes on a variety of different machines is very commonplace.
[00:03:00] This makes the environment much more complicated, especially in a heterogeneous environment, where one machine is substantially different from another. In addition, there is now a large application backlog, meaning that the number of computer users has greatly outstripped the ability of computer programmers and application developers to provide applications that meet their needs. These three reasons really give rise to the last point, which is the need for reusable components of applications: the ability to rapidly build an application by plugging together smaller pieces, the final result of which is a usable application. This is more of a plug-and-play technique: you take some function, which is a piece of a program, and plug it into another component, which is another piece of a program, and together this collection of pieces forms an application. By doing so, we essentially deal with the application backlog by empowering end users to build customized applications for their needs without having to program. So let's talk about the basic types of visual programming systems.
[00:04:31] They fall primarily into three categories. The first is diagrammatic systems. These evolved from the old flowcharts of the '50s, '60s, and even '70s. In a diagrammatic language, an application is built from blocks that you plug together, such as if-then-else statements. The second type is iconic systems. These aren't necessarily flowchart-based; they're more like building blocks that have iconic representations. The third is form-based systems, which grew out of the big emphasis on form-based interfaces in the 1970s and early '80s. Each of these three kinds of systems is used primarily in two different ways. One is as a substitute for a traditional language, and, as I said before, the second is to create new applications. So one is a lower-level language substitute, and the other is a much higher-level application substitute. Let's look now at each of those three systems: diagrammatic systems, iconic systems, and form-based systems.
[00:05:58] In this example, we see a diagrammatic system: the Nassi-Shneiderman representation of, in this case, n factorial. It is basically a computing strategy for n factorial, drawn as a diagram. At the top of the diagram, the top wedge is an if statement: is the number n (where n is replaced with a number, obviously) greater than 1? If not, you take the left branch, which is the false branch, the factorial is equal to 1, and you return that value. If it is greater than 1, you take the right branch of the diagram. You initialize the factorial variable, n fact, to 2, then loop i from 3 to n, multiplying n fact by i as you go, and then return n factorial. So rather than a purely text-based language, it's actually a combination of a text-based and a diagram-based language. This is just one glyph of this language in a diagrammatic system.
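For readers who think in text, the Nassi-Shneiderman diagram just described corresponds roughly to the Python sketch below. The function name `n_factorial` is our own choice; the structure (test whether n is greater than 1, start the accumulator at 2, loop i from 3 to n) follows the diagram as narrated.

```python
def n_factorial(n: int) -> int:
    """Compute n! using the same branch structure as the diagram."""
    # Top wedge: the if statement "is n greater than 1?"
    if not n > 1:
        # False (left) branch: the factorial is 1.
        return 1
    # True (right) branch: initialize the accumulator to 2, which is 2! ...
    n_fact = 2
    # ... then loop i from 3 to n inclusive, multiplying as we go.
    for i in range(3, n + 1):
        n_fact *= i
    return n_fact


print([n_factorial(n) for n in range(7)])  # [1, 1, 2, 6, 24, 120, 720]
```

Each box of the diagram maps onto one statement here, which is the sense in which this kind of visual language has the same grain size as a textual one.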
[00:07:23] The next type of system is iconic systems. This is an example from IRIS Explorer, which is the demonstration application we're going to be using throughout the talk. In this diagram, you can see a series of boxes connected together with wires. Each box represents some computational function. A box, or module, executes and sends its data downstream to the other modules it is connected to, in a left-to-right layout. If we look at this image, you'll see on the far right a box labeled render (which you probably can't read on your screen); this is the renderer in this system. It contains an image, which is the product of this data flow environment. So here the level of the language is much higher. It's no longer individual assignment statements or operators, but functions, or perhaps even whole applications, embedded in each one of these boxes. Each box now has ports, places to connect data into and out of, in addition to controls, which we'll talk about later.
[00:08:49] The third type is a form-based system. Probably the best example is the spreadsheet, which is probably the most widely used application on all computers. Again, it's a two-dimensional language: in this case, a two-dimensional array of cells, and one performs arithmetic and other calculations on cells. This is a simple example of a spreadsheet. As I said before, there are two basic uses of visual programming. One is as a programming language itself, and the other is as a vehicle for building applications. Let's look at the first one, as a programming language. Here the goal is really to simplify the act of programming. As computers get more and more complex, and as large multiprocessors become more and more prevalent, the task of programming becomes more and more difficult and, in fact, excludes more and more people from programming. So the goal here is to simplify the whole process of programming. Typically, that means these systems are used for small program development.
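To make the spreadsheet example above concrete, a form-based sheet can be modeled as a set of named cells, each holding either a constant or a formula over other cells. This Python sketch is our own illustration, not how any particular spreadsheet is implemented; the cell names and the use of lambdas as formulas are assumptions made to keep it short.

```python
from typing import Callable, Dict, Union

# A cell holds either a constant or a formula; a formula receives a
# getter function it can call to read other cells by name.
Getter = Callable[[str], float]
Cell = Union[float, Callable[[Getter], float]]


def evaluate(sheet: Dict[str, Cell], name: str) -> float:
    """Compute a cell's value, recursively evaluating any cells it reads.

    This toy version has no caching and no cycle detection.
    """
    cell = sheet[name]
    if callable(cell):
        return cell(lambda ref: evaluate(sheet, ref))
    return cell


sheet: Dict[str, Cell] = {
    "A1": 120.0,                              # unit price
    "A2": 3.0,                                # quantity
    "A3": lambda get: get("A1") * get("A2"),  # subtotal
    "A4": lambda get: get("A3") + 40.0,       # subtotal plus shipping
}

print(evaluate(sheet, "A4"))  # 400.0
sheet["A2"] = 4.0             # edit one input cell...
print(evaluate(sheet, "A4"))  # 520.0  (...and the dependent cells follow)
```

Because every evaluation reads through the formulas, editing one input cell is enough for dependent cells to reflect the change, which is much of what makes the two-dimensional form approachable for non-programmers.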
[00:10:04] Why they are limited to small programs should become clear as we go to the next slide. But they're also used as a tool for teaching programming to people with little or no programming background, and that was indeed their first use. These systems have been around for probably the last 15 years or so, and they represent the first application of visual programming. As I said, they're basically replacements for a text-based language. Grain size is something we're going to talk about quite a bit in the next few slides. Grain size is really: what is the atom of a visual language? What's the smallest thing that can be connected to something else to form a syntactically correct statement? The grain size of a programming language is the assignment statement or an operator, and that is exactly the grain size of visual programming used as a programming language, too. Frequently, these systems are language-centered, meaning a fair amount of effort has gone into building visual languages around specific computer languages such as Lisp, and that's where a lot of the work has gone on in the past.
[00:11:28] Some relevant examples here are Prograph, Tinkertoy, VennLISP (which is basically a Lisp-centered language), and Hi-VISUAL. This is by no means an exhaustive list, because there are a lot of visual programming languages of this type; there are some references included with this tape, so you can check those out too. Let's look at one example from a system called Tinkertoy. Suppose we want to do the following operation: add two numbers, A and B, multiply the sum by a third number, C, and then divide the whole thing by D. In Tinkertoy, you would build the structure as follows. A and B are integers that are both plugged into the plus operator. The result comes out of the right-hand side of the plus operator and is then multiplied by C, which is another integer. That result is then sent to the divide operator and divided by D. So, as you can see at the top of the slide, there is the traditional computer language representation of the statement: A plus B times C divided by D, that is, (A + B) * C / D.
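The Tinkertoy wiring just described (A and B into the plus operator, its output and C into the multiply, that result and D into the divide) can be mimicked in text as a small graph of operator nodes. This Python sketch is our own illustration of the idea, not Tinkertoy's implementation; the `Const` and `Op` node types are hypothetical names.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Const:
    """A leaf node: a named integer such as A, B, C, or D."""
    name: str
    value: int


@dataclass
class Op:
    """An operator node with two input wires and one output."""
    symbol: str
    fn: Callable[[float, float], float]
    left: "Node"
    right: "Node"


Node = Union[Const, Op]


def evaluate(node: Node) -> float:
    """Pull values through the wires, from the leaves to the final output."""
    if isinstance(node, Const):
        return node.value
    return node.fn(evaluate(node.left), evaluate(node.right))


def to_text(node: Node) -> str:
    """Render the graph back into the traditional textual form."""
    if isinstance(node, Const):
        return node.name
    return f"({to_text(node.left)} {node.symbol} {to_text(node.right)})"


a, b, c, d = Const("A", 2), Const("B", 4), Const("C", 5), Const("D", 3)
plus = Op("+", operator.add, a, b)             # A and B plug into "+"
times = Op("*", operator.mul, plus, c)         # its output is multiplied by C
divide = Op("/", operator.truediv, times, d)   # and the result divided by D

print(to_text(divide))   # (((A + B) * C) / D)
print(evaluate(divide))  # 10.0
```

Even in text, the graph form needs several node objects for a one-line expression, which echoes the point about screen space that follows.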
[00:12:53] On the bottom of the slide, you see the visual representation. Clearly, compactness is not one of the goals of this system. Visual languages in general consume an incredible amount of screen space, and even in this simple example you can see the ramifications of that. So, again, this is primarily used as a teaching tool, not necessarily as a full-blown programming language. One of the problems that occurs in systems such as this is that it's very difficult to devise a visual representation for complex operations. How do you really represent a matrix multiply or a matrix inversion with an icon? In this case, the plus, times, and divide operators are very well known to everyone, but it gets much harder as the operators get more complicated. So let's look at the other side of this, which is the use of visual programming in application creation systems. Here the goals are quite different: how do you simplify the application building environment, and how do you make building applications easy, or at least easier?
[00:14:14] Again, the reason these tools are more and more in use is that the environments in which we compute are much more complicated. Multiprocessors and massively parallel processors add tremendous complications to the programming paradigm. Graphics workstations can add complications to the programming paradigm if you wish to use them adequately. And finally, how do you build an application that will run in a distributed environment, on several potentially dissimilar machines, machines that have different floating-point representations, and so on? So again, the goal here is to empower non-programmers to create applications for themselves. The distinction between this use of visual programming and the previous one is grain size. Here the atom of composition, when you build an application with a system such as [INAUDIBLE], is the function or subroutine. Each function or subroutine is then connected to others with a graph editor to build a directed graph, which is your application.
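With the function as the atom of composition, an application is just a set of functions plus the wires between them. Here is a minimal Python sketch, assuming a graph described as a dictionary that maps each module to its upstream modules (a hypothetical format, not that of any system named in this talk). It runs each function once all of its upstream results exist, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List

# Each module is an ordinary function; the wiring lists which modules feed it.
modules: Dict[str, Callable[..., Any]] = {
    "read": lambda: [3.0, 1.0, 4.0, 1.0, 5.0],    # stand-in for a data reader
    "scale": lambda xs: [2 * x for x in xs],       # a processing step
    "summarize": lambda xs: (min(xs), max(xs)),    # another processing step
    "report": lambda xs, stats: f"{len(xs)} values, range {stats}",
}
wiring: Dict[str, List[str]] = {
    "read": [],
    "scale": ["read"],
    "summarize": ["scale"],
    "report": ["scale", "summarize"],  # upstream results passed in this order
}


def run(modules: Dict[str, Callable[..., Any]],
        wiring: Dict[str, List[str]]) -> Dict[str, Any]:
    """Execute every module after all of its upstream modules have run."""
    results: Dict[str, Any] = {}
    # TopologicalSorter takes {node: predecessors} and yields a valid order;
    # it raises graphlib.CycleError if the wiring is not acyclic.
    for name in TopologicalSorter(wiring).static_order():
        inputs = [results[src] for src in wiring[name]]
        results[name] = modules[name](*inputs)
    return results


print(run(modules, wiring)["report"])  # 5 values, range (2.0, 10.0)
```

Swapping one function for another, or adding a new entry to both dictionaries, changes the application without touching the other modules; that reuse is the point of composing at the function level.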
[00:15:37] Some good examples of this come primarily from the Unix domain: apE, which came out of Ohio State University and is now marketed by a company called TaraVisual; AVS, the Application Visualization System, marketed by AVS, Inc.; IRIS Explorer, the system we're going to use here for demonstration; and Khoros, from the University of New Mexico. There has been an awful lot of work of late in the Unix world, although there are some notable examples in the PC and Macintosh world, primarily LabVIEW from National Instruments. Let's look at an example from Khoros. In this diagram, we see a close-up of the Khoros graph editor, which is called Cantata. As you'll notice, there are a number of boxes connected together. The boxes have glyphs on them, and they have ports, which are probably impossible to see in the view on your screen. These are all connected together with wires to form a data flow diagram.
[00:16:56] In this example, we're looking at acquiring an in utero sonogram and processing it to isolate the left and right ventricles of a premature baby, which are the two green items in the center of your screen, or the two small red items in the bottom right. In other words, this is a graph-based system. There is a graph editor, a graph creator, in this case called Cantata, that builds this directed graph. You then instruct the system to fire; the modules execute, each process deriving data and sending it on to the last glyph in the chain, to produce the data you've requested. There is another term, different from but conceptually quite close to visual programming, and that's program visualization. The goal of program visualization is to understand the execution of an existing program. A program written in a traditional language, such as C, C++, or Fortran, is then instrumented.
[00:18:20] The structure of the instrumented program is then represented visually, to aid comprehension of the calling order and so on, or of where a program spends an awful lot of its time. So it's really meant to take existing programs and use visualization to understand their execution. The reason it's closely related is that most visual programming systems also incorporate program visualization, and you'll see that in the examples we'll look at in just a little bit. So that's the background on visual programming. Now I want to talk a little more about data flow and how it relates to visual programming. Data flow, again, is something that's been around for quite a while; it's nothing new. In fact, there have been computers built around data flow architectures. But in software systems, data flow is essentially a graph that consists of modules. Modules are, in general, functions with inputs and outputs.
[00:19:40] The inputs are gathered and processed by the function to produce outputs. All these modules, or functions, are connected together into a directed graph, typically an acyclic graph, to build an application. As I said before, modules typically have inputs and outputs. They also have interactive controls, at least in the systems we'll talk about today. These controls modify how the module's algorithm operates on its inputs to produce its outputs. Data is transported between modules via connections that are built in a graph editor. In the previous example from Khoros, you saw lines connecting modules; those lines are the data flow itself. So it is, in general, a monopartite graph in which the modules are represented as icons and the wires are the data transport mechanism. Now, there's a myriad of complications and different ways of doing this, but it all boils down to two questions: when do modules execute, and how is data transported between them? Typically, modules execute when their inputs are fully satisfied.
[00:21:05] The question is, who determines when inputs are fully satisfied? We'll talk quite a bit about that to come. From this point on in the talk, I'd like to narrow down to a specific class of application creation systems; this is not really a general presentation of the whole topic, but one focused on systems such as these. The major components of a data flow-based application creation system are the following. There is a graph editor, as we've talked about before, and I'll show you how that works in a little bit. There is what I call the life support system, or the execution environment, which governs how modules execute. Typically, modules, even if they are separate processes, are not really self-sustaining: they have to be connected to other modules to form a full-fledged application. So there is this kind of life support system, which we call the execution environment. It also helps determine when modules fire and when they don't.
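The firing rule discussed above (a module executes once its inputs are fully satisfied) can be sketched as a small execution environment that stores the latest value on each input port, fires a module when every port has data, and forwards the result along the wires. Everything here is a hypothetical Python illustration, not the design of IRIS Explorer, AVS, or Khoros: the class names, the port names, and the `controls` dictionary standing in for a module's interactive widgets. The graph is assumed to be acyclic.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Module:
    """A box in the graph: named input ports, a function, and its controls."""
    name: str
    inputs: List[str]
    fn: Callable[..., Any]  # called as fn(**port_values, **controls); names must differ
    controls: Dict[str, Any] = field(default_factory=dict)
    received: Dict[str, Any] = field(default_factory=dict)  # latest value per port

    def satisfied(self) -> bool:
        return all(port in self.received for port in self.inputs)


class ExecutionEnvironment:
    """Carries data along wires and fires each module once its inputs are satisfied."""

    def __init__(self) -> None:
        self.modules: Dict[str, Module] = {}
        self.wires: List[Tuple[str, str, str]] = []  # (source, target, target_port)
        self.outputs: Dict[str, Any] = {}
        self.fired: List[str] = []  # firing order, kept for inspection

    def add(self, module: Module) -> None:
        self.modules[module.name] = module

    def connect(self, source: str, target: str, port: str) -> None:
        self.wires.append((source, target, port))

    def send(self, target: str, port: str, value: Any) -> None:
        """Deliver data to one port, then fire the module if it is now satisfied."""
        module = self.modules[target]
        module.received[port] = value
        if module.satisfied():
            self._fire(module)

    def _fire(self, module: Module) -> None:
        self.fired.append(module.name)
        result = module.fn(**module.received, **module.controls)
        self.outputs[module.name] = result
        for source, target, port in self.wires:
            if source == module.name:
                self.send(target, port, result)


env = ExecutionEnvironment()
env.add(Module("threshold", ["data"],
               lambda data, level: [x for x in data if x >= level],
               controls={"level": 3}))
env.add(Module("count", ["values"], lambda values: len(values)))
env.add(Module("summary", ["total", "kept"],
               lambda total, kept: f"kept {kept} of {total}"))
env.connect("threshold", "count", "values")
env.connect("count", "summary", "kept")

env.send("threshold", "data", [1, 5, 2, 8, 3])
print(env.fired)  # ['threshold', 'count']  (summary still waits for 'total')
env.send("summary", "total", 5)
print(env.outputs["summary"])  # kept 3 of 5
```

Because each module keeps the latest value on every port, a later `send` to any one port re-fires it with fresh data there and cached data elsewhere. That is only one possible policy. The question the talk raises, who decides when inputs are satisfied, is exactly where real systems differ.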
[00:22:15] There's also a component library. In the visual language there are these atoms, or modules. There may be 50 of them, there may be hundreds, there may be several thousand, and the component library is what you select modules from. It's essentially a database: you might query the component library to say, give me all the modules that accept this type of data on their inputs, or give me all the modules that were made by this institution. The last major component is extension tools: how are new modules built, and how are they added to the system? Do you need to be the developer of the system, or can end users add new modules? The issues that come into play here are these: do you need to program in a traditional programming language to add new modules to the system? Or are you even prevented from using a traditional programming language?
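Returning to the component library: if it is treated as a small database of module descriptions, the two example queries above (modules that accept a given data type on an input, and modules from a given institution) become simple filters. The record fields, the helper names, and the sample entries below are all invented for illustration; they do not describe any real system's library.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ModuleInfo:
    """One hypothetical library entry describing a module's interface and origin."""
    name: str
    input_types: Tuple[str, ...]
    output_types: Tuple[str, ...]
    author: str


LIBRARY: List[ModuleInfo] = [
    ModuleInfo("ReadImage", (), ("image",), "Example Lab"),
    ModuleInfo("Blur", ("image",), ("image",), "Example Lab"),
    ModuleInfo("Isosurface", ("volume",), ("geometry",), "Example University"),
    ModuleInfo("Render", ("geometry", "image"), ("image",), "Example University"),
]


def accepting(library: List[ModuleInfo], data_type: str) -> List[str]:
    """Names of modules with at least one input port of the given type."""
    return [m.name for m in library if data_type in m.input_types]


def made_by(library: List[ModuleInfo], author: str) -> List[str]:
    """Names of modules contributed by the given institution."""
    return [m.name for m in library if m.author == author]


print(accepting(LIBRARY, "image"))             # ['Blur', 'Render']
print(made_by(LIBRARY, "Example University"))  # ['Isosurface', 'Render']
```

A graph editor could run the first kind of query against a module's output type to suggest which modules can legally be wired to it.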
420 00:23:19,995 --> 00:23:21,870 Can you get access to a programming language, 421 00:23:21,870 --> 00:23:25,590 or are you kept in the same visual programming paradigm? 422 00:23:25,590 --> 00:23:28,800 If you do use a traditional programming 423 00:23:28,800 --> 00:23:30,780 language, then there is, indeed, a paradigm 424 00:23:30,780 --> 00:23:33,870 shift going from the visual programming environment 425 00:23:33,870 --> 00:23:38,070 to this traditional programming language. 426 00:23:38,070 --> 00:23:40,690 And then the other aspect of extension tools is, 427 00:23:40,690 --> 00:23:44,010 how does one import and export data? 428 00:23:44,010 --> 00:23:47,190 These systems tend to be quite isolated, 429 00:23:47,190 --> 00:23:49,920 and getting data into the system or getting 430 00:23:49,920 --> 00:23:53,070 data out of the system can at times 431 00:23:53,070 --> 00:23:56,620 be difficult for non-programmers to do. 432 00:23:56,620 --> 00:23:59,010 So let's talk about the graph editor now. 433 00:23:59,010 --> 00:24:03,300 Again, this is what is used to create 434 00:24:03,300 --> 00:24:07,440 these data flow graphs. 435 00:24:07,440 --> 00:24:09,180 The goal of a graph editor is to be 436 00:24:09,180 --> 00:24:12,660 able to provide you with components from a component 437 00:24:12,660 --> 00:24:16,500 library that you can grab with a mouse, 438 00:24:16,500 --> 00:24:20,670 drop down on the graph editor palette, 439 00:24:20,670 --> 00:24:24,010 and connect up to form this graph. 440 00:24:24,010 --> 00:24:27,030 So typically, modules have a variety 441 00:24:27,030 --> 00:24:30,570 of different representations as you go from system to system.
442 00:24:30,570 --> 00:24:31,950 In its simplest representation, 443 00:24:31,950 --> 00:24:35,460 a module could be just a name, the name of the function 444 00:24:35,460 --> 00:24:39,540 that you want to use, with ports that allow you to connect it 445 00:24:39,540 --> 00:24:41,520 to other modules in the system. 446 00:24:41,520 --> 00:24:45,720 Or it can have an icon that represents 447 00:24:45,720 --> 00:24:48,240 more of a step in a visual language, 448 00:24:48,240 --> 00:24:50,820 some iconic representation of the operation 449 00:24:50,820 --> 00:24:52,470 that that module performs. 450 00:24:52,470 --> 00:24:56,460 Or it may even have controls on that icon 451 00:24:56,460 --> 00:25:00,240 itself, on that box itself, that allow you to manipulate 452 00:25:00,240 --> 00:25:03,990 the algorithm of the module. 453 00:25:03,990 --> 00:25:08,310 So modules are wired together to build these directed graphs. 454 00:25:08,310 --> 00:25:15,120 And in general, the graph editor is a very interactive support 455 00:25:15,120 --> 00:25:18,430 utility, in that you can modify the graph at any time. 456 00:25:18,430 --> 00:25:20,910 It's always live; it's always running. 457 00:25:20,910 --> 00:25:24,060 So you can add a new module to the system, take a module away, 458 00:25:24,060 --> 00:25:26,400 and the graph will automatically recompute, 459 00:25:26,400 --> 00:25:28,170 or the environment will automatically 460 00:25:28,170 --> 00:25:30,640 recompute the application. 461 00:25:30,640 --> 00:25:32,850 So let's go to an example here, which 462 00:25:32,850 --> 00:25:35,550 is from IRIS Explorer from Silicon Graphics.
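The "always live" behavior described above can be sketched as a small directed graph that re-fires everything downstream of an edit. This is a minimal Python illustration with hypothetical names (`LiveGraph`, `add_module`, `connect`, `set_param`). It is not how IRIS Explorer is implemented. It uses a topological ordering of the affected subgraph, so each affected module fires exactly once, after its inputs have been recomputed.

```python
from collections import defaultdict, deque


class LiveGraph:
    """Hypothetical live graph: every edit re-fires the affected downstream modules."""

    def __init__(self):
        self.funcs = {}                     # module name -> function(params, *inputs)
        self.params = {}                    # module name -> widget values
        self.inputs = defaultdict(list)     # module name -> upstream modules, in port order
        self.outputs = defaultdict(list)    # module name -> downstream modules
        self.results = {}                   # module name -> latest output
        self.fire_log = []                  # order in which modules fired

    def add_module(self, name, func, **params):
        self.funcs[name] = func
        self.params[name] = params

    def connect(self, src, dst):
        self.outputs[src].append(dst)
        self.inputs[dst].append(src)
        self._run_from(src)                 # a new wire triggers execution

    def set_param(self, name, key, value):
        self.params[name][key] = value      # e.g. turning a dial on the module
        self._run_from(name)

    def _downstream(self, start):
        """`start` plus every module reachable from it."""
        seen, queue = {start}, deque([start])
        while queue:
            for nxt in self.outputs[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def _run_from(self, start):
        affected = self._downstream(start)
        # Kahn's algorithm restricted to the affected subgraph: a module fires
        # only after all of its affected upstream modules have fired.
        pending = {m: sum(u in affected for u in self.inputs[m]) for m in affected}
        ready = deque(m for m in affected if pending[m] == 0)
        while ready:
            m = ready.popleft()
            args = [self.results[u] for u in self.inputs[m]]
            self.results[m] = self.funcs[m](self.params[m], *args)
            self.fire_log.append(m)
            for d in self.outputs[m]:
                if d in affected:
                    pending[d] -= 1
                    if pending[d] == 0:
                        ready.append(d)


g = LiveGraph()
g.add_module("read", lambda p: [1, 2, 3])
g.add_module("scale", lambda p, xs: [x * p["k"] for x in xs], k=1)
g.add_module("total", lambda p, xs: sum(xs))
g.connect("read", "scale")
g.connect("scale", "total")
g.fire_log.clear()
g.set_param("scale", "k", 10)   # like turning a dial on the graph
print(g.fire_log)               # ['scale', 'total']
print(g.results["total"])       # 60
```

Turning the "dial" on `scale` re-fires only `scale` and `total`. The upstream `read` module is left alone.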
463 00:25:35,550 --> 00:25:37,110 So what you see on the screen now, 464 00:25:37,110 --> 00:25:38,568 on the upper portion of the screen, 465 00:25:38,568 --> 00:25:43,350 is a graph editor on the right-hand side, 466 00:25:43,350 --> 00:25:45,150 and on the left-hand side, a component 467 00:25:45,150 --> 00:25:50,580 library where all the components that we wire together reside. 468 00:25:50,580 --> 00:25:55,320 So you grab a module from the component library, 469 00:25:55,320 --> 00:25:56,880 and you see this-- 470 00:25:56,880 --> 00:25:58,350 they just have names on them. 471 00:25:58,350 --> 00:26:03,720 Drop it down on the palette of the map editor. 472 00:26:03,720 --> 00:26:07,770 You can grab another one and drop it down also. 473 00:26:07,770 --> 00:26:09,930 In this system, modules-- 474 00:26:09,930 --> 00:26:13,650 and I'll just move it around here so we can zoom 475 00:26:13,650 --> 00:26:16,320 in on the upper right-hand quadrant-- 476 00:26:16,320 --> 00:26:21,040 you can see that a module has a name, which is, again, 477 00:26:21,040 --> 00:26:23,430 probably not readable here. 478 00:26:23,430 --> 00:26:25,290 But it also has widgets attached to it. 479 00:26:25,290 --> 00:26:27,090 In this case, there is a text type-in, 480 00:26:27,090 --> 00:26:35,460 and I'll type in the name for a file, 481 00:26:35,460 --> 00:26:37,490 assuming that I get the right file name. 482 00:26:37,490 --> 00:26:42,190 483 00:26:42,190 --> 00:26:44,410 And now, the way modules are connected 484 00:26:44,410 --> 00:26:48,160 to each other is by clicking on a port button 485 00:26:48,160 --> 00:26:51,160 to select the port that you want. 486 00:26:51,160 --> 00:26:54,820 This particular module reads an image from disk. 487 00:26:54,820 --> 00:27:00,130 We select that, come over here and select an input port 488 00:27:00,130 --> 00:27:03,250 on the module that's going to display 489 00:27:03,250 --> 00:27:05,300 that image to the screen.
490 00:27:05,300 --> 00:27:08,980 So here we have the image, and it 491 00:27:08,980 --> 00:27:14,920 happens to be this image of Africa shot from a satellite. 492 00:27:14,920 --> 00:27:18,400 Zoom in on it a bit, and you can see 493 00:27:18,400 --> 00:27:21,010 the image of what is the simplest data flow 494 00:27:21,010 --> 00:27:25,150 diagram possible, one module reading data from disk, 495 00:27:25,150 --> 00:27:29,350 executing, firing, sending that data to another module which 496 00:27:29,350 --> 00:27:31,280 displays it. 497 00:27:31,280 --> 00:27:35,200 So in this example-- 498 00:27:35,200 --> 00:27:39,940 let's go back in and re-execute the module 499 00:27:39,940 --> 00:27:45,290 that reads data from disk by selecting a Fire Now menu item. 500 00:27:45,290 --> 00:27:48,220 So you notice that the module lights up yellow when it fires. 501 00:27:48,220 --> 00:27:51,370 The next module downstream fires after the first module 502 00:27:51,370 --> 00:27:55,550 is completed, and the whole graph executes. 503 00:27:55,550 --> 00:28:03,310 So that's an easy example of, or a very trivial example 504 00:28:03,310 --> 00:28:05,320 of what we're talking about. 505 00:28:05,320 --> 00:28:11,440 Now let's add another collection of modules that have been 506 00:28:11,440 --> 00:28:17,800 preconfigured to perform some-- 507 00:28:17,800 --> 00:28:20,890 not image processing, but rendering capabilities, 508 00:28:20,890 --> 00:28:24,070 and so we can connect from that existing module that reads data 509 00:28:24,070 --> 00:28:28,720 off of disk into other modules. 510 00:28:28,720 --> 00:28:32,260 And we're going to make one more connection. 
511 00:28:32,260 --> 00:28:35,260 And now that entire chain of modules 512 00:28:35,260 --> 00:28:38,765 executes, fires, and now displays 513 00:28:38,765 --> 00:28:40,390 what appears to be an image, but is in fact 514 00:28:40,390 --> 00:28:43,840 a three-dimensional entity 515 00:28:43,840 --> 00:28:47,030 that we can rotate and zoom in on. 516 00:28:47,030 --> 00:28:49,960 So what we're seeing here is the ability 517 00:28:49,960 --> 00:28:56,390 of the system to allow you to modify the graph at runtime. 518 00:28:56,390 --> 00:28:58,300 And essentially, everything is live. 519 00:28:58,300 --> 00:29:02,420 In fact, it's difficult to make things not live. 520 00:29:02,420 --> 00:29:09,400 So in this example, we have the graph executing. 521 00:29:09,400 --> 00:29:12,730 One branch of the graph is just displaying the image read from disk. 522 00:29:12,730 --> 00:29:17,200 The second branch of the graph is taking that image, 523 00:29:17,200 --> 00:29:20,320 displacing it in a normal direction based 524 00:29:20,320 --> 00:29:25,460 on the luminance of the image, and building this height field. 525 00:29:25,460 --> 00:29:28,790 So if we look edge-on at this image, 526 00:29:28,790 --> 00:29:31,000 you can see that there is actually displacement. 527 00:29:31,000 --> 00:29:34,540 And in fact, we can modify the amount of displacement 528 00:29:34,540 --> 00:29:37,750 by changing a dial on the graph. 529 00:29:37,750 --> 00:29:41,440 New modules execute downstream from that dial change. 530 00:29:41,440 --> 00:29:44,860 And now you can see a little bit more vertical displacement 531 00:29:44,860 --> 00:29:45,610 of that map. 532 00:29:45,610 --> 00:29:48,870 533 00:29:48,870 --> 00:29:55,300 Let's talk a little bit about the implications of the graph 534 00:29:55,300 --> 00:29:56,340 editor itself. 535 00:29:56,340 --> 00:29:57,360 When does this work? 536 00:29:57,360 --> 00:29:59,130 When doesn't it work?
537 00:29:59,130 --> 00:30:05,710 What are the limitations of, in this case, visual programming? 538 00:30:05,710 --> 00:30:10,410 So the graph editor is a very useful entity 539 00:30:10,410 --> 00:30:13,350 when the number of items that you're going to connect 540 00:30:13,350 --> 00:30:16,410 is relatively small, meaning 541 00:30:16,410 --> 00:30:18,840 somewhere around 10 or 15 modules. 542 00:30:18,840 --> 00:30:22,830 That's probably the upper limit of the visual complexity 543 00:30:22,830 --> 00:30:26,790 that one really wants to deal with in a visual programming 544 00:30:26,790 --> 00:30:27,510 language. 545 00:30:27,510 --> 00:30:29,460 From the earlier example that you 546 00:30:29,460 --> 00:30:33,780 saw of the language-based example from Tinkertoy, 547 00:30:33,780 --> 00:30:36,700 there were four atoms that were connected together. 548 00:30:36,700 --> 00:30:39,450 And you could imagine that if you had 50 of those, 549 00:30:39,450 --> 00:30:42,300 it would be very difficult to understand what's going on. 550 00:30:42,300 --> 00:30:44,460 And that is indeed the problem here also. 551 00:30:44,460 --> 00:30:48,090 If you have 50 modules connected together in a graph, 552 00:30:48,090 --> 00:30:53,400 it resembles spaghetti and is pretty much uninterpretable. 553 00:30:53,400 --> 00:30:55,860 So somewhere around 15-- 554 00:30:55,860 --> 00:30:59,340 10 to 20 modules is basically the upper limit 555 00:30:59,340 --> 00:31:00,810 of visual complexity. 556 00:31:00,810 --> 00:31:02,730 If you have atoms or modules that 557 00:31:02,730 --> 00:31:07,680 are too primitive, then building a usable graph 558 00:31:07,680 --> 00:31:09,600 requires too many of them. 559 00:31:09,600 --> 00:31:12,900 If you have atoms or modules that are too specialized, 560 00:31:12,900 --> 00:31:16,680 then it's very difficult to interconnect them.
561 00:31:16,680 --> 00:31:18,960 So this kind of seven plus or minus 562 00:31:18,960 --> 00:31:24,930 two rule really applies here, in that around seven to 10 modules 563 00:31:24,930 --> 00:31:28,500 feels very comfortable. 564 00:31:28,500 --> 00:31:31,050 So essentially, what you're trying to do 565 00:31:31,050 --> 00:31:33,480 is always balance this visual complexity 566 00:31:33,480 --> 00:31:37,410 with the functionality that you need for an application. 567 00:31:37,410 --> 00:31:42,180 And this leads to the next point, basically, hierarchy. 568 00:31:42,180 --> 00:31:44,290 Is the graph editor-- 569 00:31:44,290 --> 00:31:48,420 does it support the ability to take a collection of modules, 570 00:31:48,420 --> 00:31:50,730 group them together, abstract them, 571 00:31:50,730 --> 00:31:54,430 and build a single visual representation? 572 00:31:54,430 --> 00:32:00,040 So let's go back to that example that we were using before. 573 00:32:00,040 --> 00:32:03,390 So first, read in the module that reads the image from disk. 574 00:32:03,390 --> 00:32:09,350 575 00:32:09,350 --> 00:32:13,520 Now read in the next collection of modules 576 00:32:13,520 --> 00:32:14,810 that we had previously saved. 577 00:32:14,810 --> 00:32:17,930 578 00:32:17,930 --> 00:32:19,150 Connect everything up again. 579 00:32:19,150 --> 00:32:26,890 580 00:32:26,890 --> 00:32:30,130 Again, when the module executes, it 581 00:32:30,130 --> 00:32:35,370 turns yellow, indicating execution. 582 00:32:35,370 --> 00:32:39,340 So here we have a data flow diagram 583 00:32:39,340 --> 00:32:41,180 that is relatively simple. 584 00:32:41,180 --> 00:32:42,820 There are only five modules in it. 585 00:32:42,820 --> 00:32:47,860 The question is now, can we take individual modules and group 586 00:32:47,860 --> 00:32:48,650 them together?
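The grouping asked about here can be viewed as a graph rewrite. The selected modules become one node, wires inside the group disappear, and wires that cross the group boundary are attached to the new node. The sketch below is minimal, and its module names only loosely mirror the five-module demo; they are hypothetical.

```python
def collapse(edges, members, group_name):
    """Collapse the modules in `members` into one node called `group_name`.

    Wires with both ends inside the group are hidden; wires crossing the
    group boundary are re-attached to the group node.
    """
    members = set(members)
    collapsed = []
    for src, dst in edges:
        if src in members and dst in members:
            continue                      # internal wire: hidden inside the group
        s = group_name if src in members else src
        d = group_name if dst in members else dst
        if (s, d) not in collapsed:       # merge duplicate boundary wires
            collapsed.append((s, d))
    return collapsed


# Hypothetical version of the demo graph: one display branch and one
# height-field branch feeding a renderer (five modules in total).
edges = [
    ("read_image", "display_image"),
    ("read_image", "height_field"),
    ("height_field", "make_geometry"),
    ("make_geometry", "render"),
]
group = {"read_image", "display_image", "height_field", "make_geometry"}
print(collapse(edges, group, "terrain"))   # [('terrain', 'render')]
```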
587 00:32:48,650 --> 00:32:52,320 So let's take all the modules except for the last renderer 588 00:32:52,320 --> 00:32:54,640 and group them together, building 589 00:32:54,640 --> 00:32:57,170 a much simpler diagram. 590 00:32:57,170 --> 00:33:03,280 And so now we still have the data flow paradigm. 591 00:33:03,280 --> 00:33:08,770 And this new group, which is right here, 592 00:33:08,770 --> 00:33:12,310 shows you that you can collapse all these things, all 593 00:33:12,310 --> 00:33:15,160 these modules, into groups and build 594 00:33:15,160 --> 00:33:18,520 a hierarchical representation of your data flow diagram. 595 00:33:18,520 --> 00:33:21,430 This case is a rather trivial example, 596 00:33:21,430 --> 00:33:24,280 but we'll talk about some others later 597 00:33:24,280 --> 00:33:26,065 on that are much more complex. 598 00:33:26,065 --> 00:33:28,630 599 00:33:28,630 --> 00:33:31,900 So what happens when you connect two modules together? 600 00:33:31,900 --> 00:33:39,130 There are two different possibilities in graph editors. 601 00:33:39,130 --> 00:33:43,000 One is, does the graph editor help you 602 00:33:43,000 --> 00:33:45,000 in deciding what is a valid connection, 603 00:33:45,000 --> 00:33:49,130 or does it permit any connection whatsoever between two modules? 604 00:33:49,130 --> 00:33:52,180 So this is basically the strong typing issue. 605 00:33:52,180 --> 00:33:55,390 Some graph editors are strongly typed, 606 00:33:55,390 --> 00:33:59,170 meaning that they will permit only legal connections between two 607 00:33:59,170 --> 00:34:00,770 modules. 608 00:34:00,770 --> 00:34:05,950 So not only does the graph editor 609 00:34:05,950 --> 00:34:08,929 prevent you from making illegal connections, but in most cases, 610 00:34:08,929 --> 00:34:12,040 it will direct you visually as to which connections 611 00:34:12,040 --> 00:34:15,580 are permissible. 612 00:34:15,580 --> 00:34:17,739 This is all done at connect time.
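A connect-time check in a strongly typed graph editor might look like the minimal sketch below. The port table, the type names, and the `IllegalConnection` exception are assumptions made for illustration. `legal_targets` models the editor highlighting which input ports a new wire may legally reach.

```python
class IllegalConnection(Exception):
    """Raised when an output port's type does not match an input port's type."""


# Hypothetical port declarations: (module, port) -> data type name.
OUTPUT_PORTS = {
    ("read_image", "out"): "image",
    ("height_field", "out"): "geometry",
}
INPUT_PORTS = {
    ("display_image", "in"): "image",
    ("height_field", "in"): "image",
    ("render", "in"): "geometry",
}


def legal_targets(src):
    """Input ports a strongly typed editor could highlight for output port `src`."""
    wanted = OUTPUT_PORTS[src]
    return sorted(port for port, t in INPUT_PORTS.items() if t == wanted)


def connect(src, dst, wires):
    """Connect-time check: refuse the wire unless the types match exactly."""
    if OUTPUT_PORTS[src] != INPUT_PORTS[dst]:
        raise IllegalConnection(
            f"{src} produces {OUTPUT_PORTS[src]!r}, but {dst} expects {INPUT_PORTS[dst]!r}")
    wires.append((src, dst))


wires = []
print(legal_targets(("read_image", "out")))
# [('display_image', 'in'), ('height_field', 'in')]
connect(("read_image", "out"), ("height_field", "in"), wires)
```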
613 00:34:17,739 --> 00:34:20,920 Most systems also have verification at runtime 614 00:34:20,920 --> 00:34:25,870 when the data flow environment is actually 615 00:34:25,870 --> 00:34:31,270 executing, in which case, there is runtime checking of data. 616 00:34:31,270 --> 00:34:33,730 And in this case, the data has to be self-describing. 617 00:34:33,730 --> 00:34:37,300 So when a module receives it, it can tell whether this 618 00:34:37,300 --> 00:34:40,330 is data of the wrong type. 619 00:34:40,330 --> 00:34:43,010 The other possibility is no type checking whatsoever. 620 00:34:43,010 --> 00:34:45,670 A system of this type is, of course, much easier 621 00:34:45,670 --> 00:34:47,380 to implement. 622 00:34:47,380 --> 00:34:50,710 It's certainly much more flexible 623 00:34:50,710 --> 00:34:56,230 for advanced users, in that they can do all kinds of things. 624 00:34:56,230 --> 00:34:59,870 You can always find a hack for what you're looking to do, 625 00:34:59,870 --> 00:35:01,180 but it is always-- 626 00:35:01,180 --> 00:35:02,720 it's error-prone for all users. 627 00:35:02,720 --> 00:35:05,270 628 00:35:05,270 --> 00:35:11,990 So let's talk about data types now for a moment. 629 00:35:11,990 --> 00:35:15,460 Data types really determine when connections are legal. 630 00:35:15,460 --> 00:35:18,550 In data flow environments or in visual programming-based data 631 00:35:18,550 --> 00:35:22,880 flow environments, there are a couple of different choices. 632 00:35:22,880 --> 00:35:25,480 One is you can have a restricted environment, which 633 00:35:25,480 --> 00:35:27,970 means you probably have a smaller number of data 634 00:35:27,970 --> 00:35:31,420 types, which the graph editor uses for type checking 635 00:35:31,420 --> 00:35:33,430 at connect time. 636 00:35:33,430 --> 00:35:36,250 In this case, you can shield end users 637 00:35:36,250 --> 00:35:40,780 from the details of the data types themselves.
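In the run-time alternative mentioned above, the data carries its own type. In this minimal sketch (all names hypothetical), each `Packet` is self-describing. A receiving module checks the tag before running its algorithm rather than trusting the wiring.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Packet:
    """Self-describing data: the payload travels with a tag naming its type."""
    type_name: str
    payload: Any


class CheckedModule:
    """Hypothetical module that verifies incoming data at run time."""

    def __init__(self, name, accepts):
        self.name = name
        self.accepts = set(accepts)
        self.processed = []
        self.rejected = []

    def receive(self, packet):
        # The tag, not the wiring, decides whether the data is usable.
        if packet.type_name not in self.accepts:
            self.rejected.append(packet.type_name)
            return False
        self.processed.append(packet.payload)   # stand-in for the module's algorithm
        return True


display = CheckedModule("display_image", accepts={"image"})
display.receive(Packet("image", [[0, 255], [255, 0]]))   # True
display.receive(Packet("geometry", []))                  # False: wrong type at run time
```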
638 00:35:40,780 --> 00:35:42,850 Or you can have a system which is completely open 639 00:35:42,850 --> 00:35:46,150 and has no data types whatsoever, which 640 00:35:46,150 --> 00:35:51,040 means that any two modules can be connected together, 641 00:35:51,040 --> 00:35:55,160 but they won't necessarily be able to share data. 642 00:35:55,160 --> 00:35:58,750 So most systems rely on restricted data types 643 00:35:58,750 --> 00:36:01,630 for ease of use. 644 00:36:01,630 --> 00:36:05,530 And again, just like the grain size of the modules, 645 00:36:05,530 --> 00:36:08,345 there's the issue of how many data types there are. 646 00:36:08,345 --> 00:36:10,720 If you have a system that has a very small number of data 647 00:36:10,720 --> 00:36:13,780 types, then you have maximum interconnectivity. 648 00:36:13,780 --> 00:36:16,990 If you can imagine a system that had only one data type, 649 00:36:16,990 --> 00:36:19,360 then every module would be able to talk to, 650 00:36:19,360 --> 00:36:21,682 and communicate with, every other module. 651 00:36:21,682 --> 00:36:23,890 If you have a system that has hundreds of data types, 652 00:36:23,890 --> 00:36:28,900 then the chance of being able to connect module A to module B 653 00:36:28,900 --> 00:36:32,320 is very small, and so you lose 654 00:36:32,320 --> 00:36:34,900 this maximum interconnectivity. 655 00:36:34,900 --> 00:36:36,280 But it's a very rich system. 656 00:36:36,280 --> 00:36:39,790 So there's always this trade-off between a small number 657 00:36:39,790 --> 00:36:43,170 of data types and a large number of data types. 658 00:36:43,170 --> 00:36:46,870 The more successful data flow application building systems 659 00:36:46,870 --> 00:36:49,630 around typically have a small number of data types, 660 00:36:49,630 --> 00:36:52,570 perhaps, at most, 10. 661 00:36:52,570 --> 00:36:55,750 These types are also typically very abstract.
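One way to express abstract types is to let an input port leave some attributes unconstrained. The sketch below uses hypothetical `ArrayType` and `ArrayPort` classes. With this encoding, a port that accepts any array, regardless of element type or number of dimensions, matches a floating point two-dimensional array. This is the kind of case the array discussion that follows describes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArrayType:
    """Description of an array flowing on a wire (hypothetical encoding)."""
    elem: str   # element type, e.g. "float32" or "uint8"
    ndim: int   # number of dimensions


@dataclass(frozen=True)
class ArrayPort:
    """Input-port constraint; None means 'any' for that attribute."""
    elem: Optional[str] = None
    ndim: Optional[int] = None

    def accepts(self, t: ArrayType) -> bool:
        return ((self.elem is None or self.elem == t.elem)
                and (self.ndim is None or self.ndim == t.ndim))


produced = ArrayType("float32", 2)                  # a floating point 2-D array
print(ArrayPort().accepts(produced))                # True: any array at all
print(ArrayPort(elem="float32").accepts(produced))  # True: any float32 array
print(ArrayPort(ndim=3).accepts(produced))          # False: needs a 3-D array
```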
662 00:36:55,750 --> 00:36:58,720 One common data type is an array, 663 00:36:58,720 --> 00:37:02,710 a multidimensional array, that can be 664 00:37:02,710 --> 00:37:04,540 used in its most abstract form. 665 00:37:04,540 --> 00:37:08,260 So a module might produce a floating point 666 00:37:08,260 --> 00:37:10,150 two-dimensional array on its output port, 667 00:37:10,150 --> 00:37:14,802 and the module downstream might accept any type of array, 668 00:37:14,802 --> 00:37:16,510 regardless of whether it's floating point 669 00:37:16,510 --> 00:37:19,420 or two-dimensional or three-dimensional. 670 00:37:19,420 --> 00:37:22,760 So these are more abstract data types. 671 00:37:22,760 --> 00:37:28,950 So now that we've covered the data type issue, 672 00:37:28,950 --> 00:37:31,060 we need to talk about what really 673 00:37:31,060 --> 00:37:33,970 is the structure of a module. 674 00:37:33,970 --> 00:37:37,960 Is a module a separate Unix process? 675 00:37:37,960 --> 00:37:42,850 Or are all the modules bound into a single substrate, 676 00:37:42,850 --> 00:37:45,880 in which a controller determines 677 00:37:45,880 --> 00:37:47,950 which modules execute? 678 00:37:47,950 --> 00:37:49,665 So there are two 679 00:37:49,665 --> 00:37:51,290 possibilities here. 680 00:37:51,290 --> 00:37:54,190 One is a monolithic approach, in which 681 00:37:54,190 --> 00:37:59,440 all modules are bound into the same process. 682 00:37:59,440 --> 00:38:02,470 And it has its advantages and disadvantages. 683 00:38:02,470 --> 00:38:05,120 In general, it can be much more efficient, 684 00:38:05,120 --> 00:38:08,140 especially on machines where processes are expensive 685 00:38:08,140 --> 00:38:11,680 or context switches are expensive, in that there 686 00:38:11,680 --> 00:38:14,830 is no context switch; the system is basically 687 00:38:14,830 --> 00:38:17,770 a function table of modules.
688 00:38:17,770 --> 00:38:20,613 Modules become entries in a function table. 689 00:38:20,613 --> 00:38:22,030 In this case, it becomes much more 690 00:38:22,030 --> 00:38:24,670 difficult to add new modules to the system, in that you have 691 00:38:24,670 --> 00:38:29,080 to relink the entire system or rely on dynamic linking to add 692 00:38:29,080 --> 00:38:30,850 a new module to the system. 693 00:38:30,850 --> 00:38:35,650 And in general, parallelization is somewhat problematic also, 694 00:38:35,650 --> 00:38:39,880 in that it's difficult to decide when modules 695 00:38:39,880 --> 00:38:43,240 should be firing in parallel. 696 00:38:43,240 --> 00:38:46,850 The other possibility is to have a separate process per module. 697 00:38:46,850 --> 00:38:49,300 And in this case, the trade-offs are different. 698 00:38:49,300 --> 00:38:51,976 Now you have to deal with context switching, 699 00:38:51,976 --> 00:38:55,330 switching from one module to the next. 700 00:38:55,330 --> 00:39:01,540 And that can be a killer on some machines like PCs, which really 701 00:39:01,540 --> 00:39:05,470 don't support multiple processes. 702 00:39:05,470 --> 00:39:09,670 The advantages are that parallelism is much easier. 703 00:39:09,670 --> 00:39:13,690 In an operating system such as Unix, 704 00:39:13,690 --> 00:39:17,350 the operating system will basically schedule modules for you. 705 00:39:17,350 --> 00:39:19,900 And you no longer have the dynamic linking problem. 706 00:39:19,900 --> 00:39:23,860 That is, when you as a user write a new module 707 00:39:23,860 --> 00:39:27,490 and add it to the system, you simply 708 00:39:27,490 --> 00:39:30,160 compile and link that module while the rest of the system 709 00:39:30,160 --> 00:39:30,760 stays in place.
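For the monolithic approach described above, "modules become entries in a function table" can be sketched in a few lines of Python. Run-time registration is trivial in Python. In a compiled system, the equivalent step is relinking or loading a shared object (for example with `dlopen` on Unix), which is the cost the talk mentions. `MODULE_TABLE`, `register`, `run_chain`, and both modules are hypothetical.

```python
from typing import Callable, Dict, List

# Monolithic approach: every module is an entry in one table inside a single
# process, and "firing" a module is an ordinary function call.
MODULE_TABLE: Dict[str, Callable] = {}


def register(name: str):
    """Add a function to the module table under `name`."""
    def wrap(fn: Callable) -> Callable:
        MODULE_TABLE[name] = fn
        return fn
    return wrap


@register("threshold")
def threshold(data: List[float], level: float = 0.5) -> List[int]:
    return [1 if x >= level else 0 for x in data]


@register("count")
def count(data: List[int]) -> int:
    return sum(data)


def run_chain(names: List[str], data):
    """A trivial controller: fire modules in order, with no process switches."""
    for name in names:
        data = MODULE_TABLE[name](data)
    return data


print(run_chain(["threshold", "count"], [0.1, 0.7, 0.9]))   # 2
```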
710 00:39:30,760 --> 00:39:35,020 711 00:39:35,020 --> 00:39:41,500 So now, of these two 712 00:39:41,500 --> 00:39:45,970 alternatives, monolithic versus separate processes, 713 00:39:45,970 --> 00:39:47,260 we're going to go down the second path: 714 00:39:47,260 --> 00:39:50,140 the bulk of the rest of the talk is targeted 715 00:39:50,140 --> 00:39:52,700 toward separate processes. 716 00:39:52,700 --> 00:39:55,450 So then we need to talk about the data and control 717 00:39:55,450 --> 00:39:58,910 mechanisms for sending information between modules. 718 00:39:58,910 --> 00:40:00,490 So there are two types of information 719 00:40:00,490 --> 00:40:02,230 that are sent between modules. 720 00:40:02,230 --> 00:40:04,540 One is control information, which says 721 00:40:04,540 --> 00:40:06,460 you should execute right now. 722 00:40:06,460 --> 00:40:09,850 You have your slot at the CPU, so now is the right time 723 00:40:09,850 --> 00:40:10,780 to execute. 724 00:40:10,780 --> 00:40:12,580 And the other one is, here's data 725 00:40:12,580 --> 00:40:15,170 that I want you to execute on. 726 00:40:15,170 --> 00:40:20,500 So in general, control is very low-volume data, 727 00:40:20,500 --> 00:40:21,760 low-volume information. 728 00:40:21,760 --> 00:40:25,420 It is simply "execute," perhaps with some parameters. 729 00:40:25,420 --> 00:40:28,220 Data could potentially be very high volume. 730 00:40:28,220 --> 00:40:30,190 In the example that we saw before 731 00:40:30,190 --> 00:40:36,550 of the image of Africa, that's an image that is probably 732 00:40:36,550 --> 00:40:39,490 300 pixels square. 733 00:40:39,490 --> 00:40:40,980 When we turn that into polygons, it 734 00:40:40,980 --> 00:40:45,610 ends up being, in that case, several hundred 735 00:40:45,610 --> 00:40:46,360 thousand polygons.
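Because control is low volume and data can be high volume, one option is to keep large buffers in one place and pass only small handles. The minimal sketch below assumes that producer and consumer share memory; a single Python process stands in for a shared memory segment. `ControlMsg` and `DataStore` are hypothetical names. Between machines, the data would still have to be transported.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class ControlMsg:
    """Low-volume message: 'fire now', a few parameters, and a handle to the data."""
    target: str
    data_handle: int
    params: dict = field(default_factory=dict)


class DataStore:
    """Stand-in for shared memory: large buffers are stored once and named by handle."""

    def __init__(self):
        self._buffers = {}
        self._ids = itertools.count()

    def put(self, buf) -> int:
        handle = next(self._ids)
        self._buffers[handle] = buf
        return handle

    def get(self, handle):
        return self._buffers[handle]   # returns the same object: no copy is made


store = DataStore()
image = bytearray(300 * 300)                      # a 300 x 300 8-bit image: 90,000 bytes
msg = ControlMsg("height_field", store.put(image), {"scale": 2.0})
print(store.get(msg.data_handle) is image)        # True: the consumer sees the producer's buffer
```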
736 00:40:46,360 --> 00:40:49,780 737 00:40:49,780 --> 00:40:54,160 And now, the motion of that data becomes a very relevant issue 738 00:40:54,160 --> 00:40:55,480 in data flow in general. 739 00:40:55,480 --> 00:40:58,420 The goal is minimum data motion, meaning 740 00:40:58,420 --> 00:41:02,320 that you don't want to make extra copies of data 741 00:41:02,320 --> 00:41:03,730 if you can avoid it. 742 00:41:03,730 --> 00:41:05,980 And you don't want to transport data between processes 743 00:41:05,980 --> 00:41:08,930 unless you have to. 744 00:41:08,930 --> 00:41:12,370 So the choice now is: do you piggyback the data 745 00:41:12,370 --> 00:41:15,520 on the control, or do you use separate paths 746 00:41:15,520 --> 00:41:16,960 for data and control messages? 747 00:41:16,960 --> 00:41:19,060 If you piggyback data on control, 748 00:41:19,060 --> 00:41:21,430 then this large data transmission really 749 00:41:21,430 --> 00:41:24,730 becomes the bottleneck, if there is a central controller. 750 00:41:24,730 --> 00:41:28,210 In that case, the data and the control 751 00:41:28,210 --> 00:41:30,880 would need to be transferred to a central controller 752 00:41:30,880 --> 00:41:34,480 and then on to the next process downstream. 753 00:41:34,480 --> 00:41:38,620 Or do you use separate paths for data and control information, 754 00:41:38,620 --> 00:41:42,280 in which case, you have more messaging, 755 00:41:42,280 --> 00:41:48,280 but you can now allow for direct module-to-module communication 756 00:41:48,280 --> 00:41:53,960 for data passing, and in fact, for control passing too. 757 00:41:53,960 --> 00:41:59,020 So if we look at the next slide, here is a diagram of one data 758 00:41:59,020 --> 00:42:02,530 producer, module A, in the center of the screen, 759 00:42:02,530 --> 00:42:07,030 sending data downstream to modules B and C.
Now, 760 00:42:07,030 --> 00:42:10,630 these may be separate ports on the output of module A, 761 00:42:10,630 --> 00:42:13,620 or they may indeed be the same port. 762 00:42:13,620 --> 00:42:18,590 So in essence, we see, in this case, 763 00:42:18,590 --> 00:42:21,410 separate, independent paths for control, 764 00:42:21,410 --> 00:42:24,290 which goes between A and B, and for data, which 765 00:42:24,290 --> 00:42:27,380 is along the yellow lines. 766 00:42:27,380 --> 00:42:30,260 It could be that there is a central controller 767 00:42:30,260 --> 00:42:31,670 in a system such as this. 768 00:42:31,670 --> 00:42:34,850 And many data flow environments do have central controllers. 769 00:42:34,850 --> 00:42:37,070 In that case, 770 00:42:37,070 --> 00:42:39,230 module A would fire. 771 00:42:39,230 --> 00:42:41,270 It would send control information back 772 00:42:41,270 --> 00:42:42,545 to the controller. 773 00:42:42,545 --> 00:42:44,420 The controller would then send that back down 774 00:42:44,420 --> 00:42:46,378 to module B and say, OK, now it is time for you 775 00:42:46,378 --> 00:42:51,200 to execute, and then do the same thing with module C. 776 00:42:51,200 --> 00:42:55,370 So this brings up the next central question-- 777 00:42:55,370 --> 00:42:58,100 when do modules fire? 778 00:42:58,100 --> 00:42:59,750 There are two possibilities here. 779 00:42:59,750 --> 00:43:02,450 One is a central controller, and the other 780 00:43:02,450 --> 00:43:06,800 is a control-free or asynchronous model. 781 00:43:06,800 --> 00:43:09,320 With a central controller, the controller 782 00:43:09,320 --> 00:43:11,690 monitors all data transmissions and all control 783 00:43:11,690 --> 00:43:14,880 transmissions from modules. 784 00:43:14,880 --> 00:43:21,410 So it is a party to all communication. 785 00:43:21,410 --> 00:43:25,910 In this case, it's very easy to prevent extra firings 786 00:43:25,910 --> 00:43:27,350 from modules.
787 00:43:27,350 --> 00:43:31,420 But in general, it becomes the bottleneck in large graphs. 788 00:43:31,420 --> 00:43:34,700 So if you have a graph that consists of 10 modules, 789 00:43:34,700 --> 00:43:36,620 and you have one central controller-- 790 00:43:36,620 --> 00:43:38,780 if we're talking about separate processes now-- 791 00:43:38,780 --> 00:43:41,270 every time a module fires, it sends information back 792 00:43:41,270 --> 00:43:47,000 to that central controller, which means a context switch 793 00:43:47,000 --> 00:43:52,130 to the controller, and the controller then 794 00:43:52,130 --> 00:43:54,050 broadcasts a message down to all the modules 795 00:43:54,050 --> 00:43:56,030 downstream from that first module, 796 00:43:56,030 --> 00:43:57,520 telling them that they can execute. 797 00:43:57,520 --> 00:44:01,280 So you can get an awful lot of communication. 798 00:44:01,280 --> 00:44:03,440 Let's talk a little bit about extra firings. 799 00:44:03,440 --> 00:44:06,200 In this very simple diagram of module A 800 00:44:06,200 --> 00:44:10,550 communicating its data to modules B and C, which both then 801 00:44:10,550 --> 00:44:13,100 communicate their data to module D, 802 00:44:13,100 --> 00:44:15,380 let's assume that module B is a slow module, 803 00:44:15,380 --> 00:44:17,870 and module C is a fast module. 804 00:44:17,870 --> 00:44:21,740 When A fires, B and C are both told to fire. 805 00:44:21,740 --> 00:44:25,100 B is slow, and so it cranks and cranks and cranks. 806 00:44:25,100 --> 00:44:28,590 C, if it's computing in parallel, is fast. 807 00:44:28,590 --> 00:44:29,760 It finishes computing. 808 00:44:29,760 --> 00:44:32,240 It sends its data to D. D potentially fires, 809 00:44:32,240 --> 00:44:34,760 because it has received new data, 810 00:44:34,760 --> 00:44:38,060 and its inputs may be satisfied enough to fire.
811 00:44:38,060 --> 00:44:41,870 When B finally finishes firing, it sends its information down 812 00:44:41,870 --> 00:44:45,230 to D, and then D executes again. 813 00:44:45,230 --> 00:44:47,420 But clearly, in this example, D should execute 814 00:44:47,420 --> 00:44:49,610 only once each time A fires. 815 00:44:49,610 --> 00:44:54,200 So in a multi-process application, 816 00:44:54,200 --> 00:44:55,910 it can be very difficult, 817 00:44:55,910 --> 00:44:58,130 especially in a control-free application 818 00:44:58,130 --> 00:45:00,680 environment, to detect and eliminate 819 00:45:00,680 --> 00:45:03,230 these extra firings. 820 00:45:03,230 --> 00:45:07,790 So let's look at what a central controller does. 821 00:45:07,790 --> 00:45:09,890 When A executes, it sends its information, 822 00:45:09,890 --> 00:45:12,680 its control information, down to a central controller. 823 00:45:12,680 --> 00:45:14,210 That controller then 824 00:45:14,210 --> 00:45:16,460 sends a message to B and says, 825 00:45:16,460 --> 00:45:19,100 OK, now it's your slot at the CPU. 826 00:45:19,100 --> 00:45:20,090 You can execute. 827 00:45:20,090 --> 00:45:23,908 It also sends a message to C. When B and C finish firing, 828 00:45:23,908 --> 00:45:25,700 they send a message back to the controller, 829 00:45:25,700 --> 00:45:27,410 and then the controller sends a message 830 00:45:27,410 --> 00:45:29,450 to D saying that it can fire. 831 00:45:29,450 --> 00:45:32,700 So you can see that, even in this simple diagram, 832 00:45:32,700 --> 00:45:37,350 the central controller could easily be the bottleneck. 833 00:45:37,350 --> 00:45:39,080 So what are the alternatives here? 834 00:45:39,080 --> 00:45:41,340 The alternative is distributed control. 835 00:45:41,340 --> 00:45:44,750 That means giving each module enough information 836 00:45:44,750 --> 00:45:47,435 to decide for itself when it should fire.
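The extra-firing problem and the central controller's fix can both be simulated on the same four-module diamond. This is a minimal sketch with hypothetical run times. `control_free` fires a module on every arrival. `central_controller` fires a module only after all of its upstream modules have reported completion.

```python
import heapq

# The diamond from the talk: A feeds B (slow) and C (fast); both feed D.
UPSTREAM = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
DOWNSTREAM = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
RUN_TIME = {"A": 1, "B": 5, "C": 1, "D": 1}   # hypothetical costs


def control_free(source="A"):
    """Each module fires whenever any new input arrives.

    Assumes D still holds data from an earlier round on both ports, so a
    single new arrival is enough to make it fire.
    """
    fired, events = [], [(0, source)]          # (start time, module)
    while events:
        t, m = heapq.heappop(events)
        fired.append(m)
        for d in DOWNSTREAM[m]:
            heapq.heappush(events, (t + RUN_TIME[m], d))
    return fired


def central_controller(source="A"):
    """The controller fires a module only after all its upstream modules finished."""
    done, fired, ready = set(), [], [source]
    while ready:
        m = ready.pop(0)
        fired.append(m)
        done.add(m)                            # m reports completion to the controller
        for d in DOWNSTREAM[m]:
            if d not in ready and all(u in done for u in UPSTREAM[d]):
                ready.append(d)
    return fired


print(control_free())         # ['A', 'B', 'C', 'D', 'D']: D fires twice
print(central_controller())   # ['A', 'B', 'C', 'D']: D fires once
```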
837 00:45:47,435 --> 00:45:50,070 838 00:45:50,070 --> 00:45:53,420 This means that the execution algorithm or the firing 839 00:45:53,420 --> 00:45:55,940 algorithm is much more complex. 840 00:45:55,940 --> 00:45:59,930 Each module needs to know when its inputs are fully satisfied, 841 00:45:59,930 --> 00:46:04,310 when it can expect new data on other ports that 842 00:46:04,310 --> 00:46:08,680 may be related to ports that have already received data, 843 00:46:08,680 --> 00:46:12,560 and only then fire. 844 00:46:12,560 --> 00:46:16,710 This minimizes the central controller bottleneck. 845 00:46:16,710 --> 00:46:20,660 So if we look at this simple example, again, of A producing 846 00:46:20,660 --> 00:46:23,030 data and sending it to B and C, and both 847 00:46:23,030 --> 00:46:26,900 of them producing data and sending it to D, when A fires, 848 00:46:26,900 --> 00:46:31,070 it sends data and control to both B and C. 849 00:46:31,070 --> 00:46:34,220 Both B and C then need to know-- 850 00:46:34,220 --> 00:46:37,010 need to be smart enough to know when they have enough data 851 00:46:37,010 --> 00:46:41,210 and that no other data will be coming in on any other ports. 852 00:46:41,210 --> 00:46:43,550 They then fire. 853 00:46:43,550 --> 00:46:45,080 They send data down to D. 854 00:46:45,080 --> 00:46:49,070 Let's again assume that B is the slow one, and the data from C 855 00:46:49,070 --> 00:46:51,330 arrives at D first. 856 00:46:51,330 --> 00:46:55,490 This means that D has two input ports, 857 00:46:55,490 --> 00:46:58,380 one from C and one from B. It needs to decide, 858 00:46:58,380 --> 00:47:01,220 am I ready to execute now or not? 859 00:47:01,220 --> 00:47:06,470 So in some systems, data is tagged with 860 00:47:06,470 --> 00:47:10,610 not only control information but also heritage information.
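The heritage-tag rule, which the talk goes on to describe, can be sketched for module D alone. In this minimal sketch (names hypothetical), each port records which upstream modules can reach it, standing in for D's knowledge of the upstream topology. Tags are `(module, firing number)` pairs. The sketch assumes that firings arrive in order and ignores out-of-order delivery.

```python
class Port:
    """An input port plus the static upstream topology its data can come from."""

    def __init__(self, upstream):
        self.upstream = set(upstream)   # modules whose firings can reach this port
        self.value = None
        self.heritage = None            # {(module, firing_number), ...} behind `value`


class TaggedModule:
    """Hypothetical module that decides for itself when to fire (no controller)."""

    def __init__(self, name, ports):
        self.name = name
        self.ports = ports              # port name -> Port
        self.fire_count = 0

    def receive(self, port_name, value, heritage):
        port = self.ports[port_name]
        port.value, port.heritage = value, set(heritage)
        for other_name, other in self.ports.items():
            if other_name == port_name:
                continue
            # Firings in the new data's heritage that can also reach `other`.
            shared = {(m, n) for (m, n) in heritage if m in other.upstream}
            # If `other` holds data from an older firing, fresh data is on its
            # way: invalidate the stale value and wait for it.
            if shared and (other.heritage is None or not shared <= other.heritage):
                other.value, other.heritage = None, None
        if all(p.value is not None for p in self.ports.values()):
            self.fire_count += 1


d = TaggedModule("D", {"b": Port({"A", "B"}), "c": Port({"A", "C"})})
# Round 0: both ports get data descended from A's firing number 0.
d.receive("b", "b0", {("A", 0), ("B", 0)})
d.receive("c", "c0", {("A", 0), ("C", 0)})
# Round 1: the fast branch (C) arrives first; D must not fire yet.
d.receive("c", "c1", {("A", 1), ("C", 1)})
print(d.fire_count)   # 1
d.receive("b", "b1", {("A", 1), ("B", 1)})
print(d.fire_count)   # 2: exactly one firing per firing of A
```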
861 00:47:10,610 --> 00:47:14,010 Now the data coming to D on the C port 862 00:47:14,010 --> 00:47:17,550 would have tags saying that this data was produced by firings 863 00:47:17,550 --> 00:47:21,990 of A and C. D would then look at its other ports to see: 864 00:47:21,990 --> 00:47:24,720 are there any other ports that depend 865 00:47:24,720 --> 00:47:28,740 on data from C or A? 866 00:47:28,740 --> 00:47:29,400 And indeed there is. 867 00:47:29,400 --> 00:47:31,290 It finds that it has one other port, which 868 00:47:31,290 --> 00:47:35,370 has a dependency on B and A. So it invalidates the data 869 00:47:35,370 --> 00:47:39,890 on that port and waits until new data comes in from B, because D 870 00:47:39,890 --> 00:47:42,360 has enough information on the upstream topology 871 00:47:42,360 --> 00:47:48,240 to know that new data will be coming in on the B port. 872 00:47:48,240 --> 00:47:52,200 And again, this ends up being a large complication 873 00:47:52,200 --> 00:47:55,890 for distributed control algorithms. 874 00:47:55,890 --> 00:47:58,890 But it more than pays for itself 875 00:47:58,890 --> 00:48:04,210 by reducing the central controller bottleneck. 876 00:48:04,210 --> 00:48:06,180 So let's quickly talk about some details, 877 00:48:06,180 --> 00:48:10,440 the issues that one needs 878 00:48:10,440 --> 00:48:13,500 to be very careful about in building data flow 879 00:48:13,500 --> 00:48:17,340 environments such as this and, primarily, the execution 880 00:48:17,340 --> 00:48:18,660 environment. 881 00:48:18,660 --> 00:48:22,950 The first and paramount issue is memory usage. 882 00:48:22,950 --> 00:48:27,210 Data must be reference counted, meaning that when-- 883 00:48:27,210 --> 00:48:30,840 in that prior example, when A fires and sends the same data 884 00:48:30,840 --> 00:48:34,530 to B and C, A needs to know, or the system needs to know, 885 00:48:34,530 --> 00:48:36,480 when it can throw away that data.
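The heritage-tag rule that D applies in the distributed-control example can be sketched as follows. This is a hypothetical model, not a description of any particular system. It assumes each token carries a map from ancestor module to the firing number that produced it. It also assumes each module knows which upstream modules feed each of its ports. `Token`, `Module`, and `receive` are illustrative names only.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    value: object
    heritage: dict  # ancestor module -> which firing of it produced this data


@dataclass
class Module:
    name: str
    port_ancestors: dict  # input port -> set of upstream modules feeding it
    ports: dict = field(default_factory=dict)       # input port -> Token or None
    fired_with: list = field(default_factory=list)  # inputs used at each firing

    def receive(self, port, token):
        self.ports[port] = token
        # If the new token descends from a newer firing of an ancestor that
        # also feeds another port, the data held on that port is stale and
        # fresh data is known to be on its way: invalidate it and wait.
        for other, held in list(self.ports.items()):
            if other == port or held is None:
                continue
            for ancestor, n in token.heritage.items():
                shares_ancestor = ancestor in self.port_ancestors[other]
                if shares_ancestor and held.heritage.get(ancestor, n) < n:
                    self.ports[other] = None
                    break
        # Fire only when every input port holds valid data.
        if all(self.ports.get(p) is not None for p in self.port_ancestors):
            self.fired_with.append({p: t.value for p, t in self.ports.items()})


d = Module("D", {"b": {"A", "B"}, "c": {"A", "C"}})
# First firing of A: B and C both deliver, and D fires once.
d.receive("b", Token("b0", {"A": 0, "B": 0}))
d.receive("c", Token("c0", {"A": 0, "C": 0}))
# A fires again; C is fast, so its data reaches D before B's.
d.receive("c", Token("c1", {"A": 1, "C": 1}))  # invalidates the stale "b0"
d.receive("b", Token("b1", {"A": 1, "B": 1}))
print(d.fired_with)  # [{'b': 'b0', 'c': 'c0'}, {'b': 'b1', 'c': 'c1'}]
```

Without the invalidation step, D would fire a third time on the mismatched pair `b0`/`c1`. That is exactly the extra firing the distributed algorithm has to eliminate.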
886 00:48:36,480 --> 00:48:39,150 If A was to fire again, it needs to know 887 00:48:39,150 --> 00:48:41,610 that B and C have already consumed that data 888 00:48:41,610 --> 00:48:45,690 and that A can replace that data with a new copy. 889 00:48:45,690 --> 00:48:49,800 Memory usage in data flow systems is, by all means, 890 00:48:49,800 --> 00:48:52,410 the most difficult thing to work with. 891 00:48:52,410 --> 00:48:54,930 892 00:48:54,930 --> 00:48:57,990 As you saw in the prior example, there was both fan-out 893 00:48:57,990 --> 00:49:01,440 from-- the output of A was fanning out to two consumers, 894 00:49:01,440 --> 00:49:07,300 and D had a fan-in on one port from two producers. 895 00:49:07,300 --> 00:49:09,690 So you have to correctly deal with synchronization 896 00:49:09,690 --> 00:49:11,400 on fanning in and out. 897 00:49:11,400 --> 00:49:13,140 You have to eliminate the extra firings 898 00:49:13,140 --> 00:49:15,460 that we just talked about. 899 00:49:15,460 --> 00:49:18,900 And you have to minimize the context switching, 900 00:49:18,900 --> 00:49:23,520 if there is a central controller or if the entire data flow 901 00:49:23,520 --> 00:49:26,160 environment is all separate processes. 902 00:49:26,160 --> 00:49:30,750 And again, to reemphasize, the big problem 903 00:49:30,750 --> 00:49:32,580 in data flow environments is how do you 904 00:49:32,580 --> 00:49:35,790 minimize data motion, how do you make sure 905 00:49:35,790 --> 00:49:43,110 that no superfluous copies have been made, et cetera. 906 00:49:43,110 --> 00:49:46,710 In our prior example, it's potentially-- 907 00:49:46,710 --> 00:49:49,710 it's possible that A could retain a copy of the data 908 00:49:49,710 --> 00:49:54,120 that it sends to B, and B could retain a copy of that. 909 00:49:54,120 --> 00:49:55,880 C could also retain a copy of that. 
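A minimal sketch of the reference counting described above follows. It assumes a single process in which B and C read the same Python object rather than private copies. `SharedOutput`, `publish`, and `release` are hypothetical names. A may overwrite its output only after the count of outstanding consumers reaches zero. The same mechanism keeps the data to one shared copy instead of three.

```python
class SharedOutput:
    """One copy of a producer's output, shared (not copied) by all consumers."""

    def __init__(self):
        self.data = None
        self.refs = 0  # consumers that have not yet released the current data

    def publish(self, data, consumers):
        # The producer may replace its output only after every consumer of
        # the previous version has released it.
        if self.refs > 0:
            raise RuntimeError(f"{self.refs} consumer(s) still using the old data")
        self.data = data
        self.refs = consumers
        return self.data

    def release(self):
        # Called by each consumer (B and C in the example) when it is done.
        if self.refs == 0:
            raise RuntimeError("release() without a matching publish()")
        self.refs -= 1
        return self.refs == 0  # True: the buffer may now be reused or freed


out_a = SharedOutput()
seen_by_b = out_a.publish([1.0, 2.0, 3.0], consumers=2)  # A fires
seen_by_c = out_a.data                                    # same object, no copy
out_a.release()  # B is done
out_a.release()  # C is done; A may now fire again
```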
910 00:49:55,880 --> 00:49:57,630 So you could potentially have three copies 911 00:49:57,630 --> 00:50:00,590 of data floating around in that environment, when, in fact, 912 00:50:00,590 --> 00:50:03,870 really only one is necessary, a single version shared by all three 913 00:50:03,870 --> 00:50:04,630 of those modules. 914 00:50:04,630 --> 00:50:08,550 So you really have to minimize data copies and data motion. 915 00:50:08,550 --> 00:50:11,760 Finally, there are other issues, such as cycles-- 916 00:50:11,760 --> 00:50:16,140 how is it possible to connect the output of one module 917 00:50:16,140 --> 00:50:20,700 back upstream to one of its ancestors' inputs, 918 00:50:20,700 --> 00:50:23,580 and how do you deal with termination sequences 919 00:50:23,580 --> 00:50:26,430 and initiation sequences for cycles? 920 00:50:26,430 --> 00:50:29,170 And that also gets back to bidirectional data flow, 921 00:50:29,170 --> 00:50:31,770 which is the last point. 922 00:50:31,770 --> 00:50:35,190 Do two modules communicate with each other 923 00:50:35,190 --> 00:50:37,320 in a bidirectional sense, or is data always 924 00:50:37,320 --> 00:50:38,670 flowing in one direction? 925 00:50:38,670 --> 00:50:41,730 Almost all the examples built so far 926 00:50:41,730 --> 00:50:45,360 have only unidirectional data flow. 927 00:50:45,360 --> 00:50:48,780 So now, let's look at an example, 928 00:50:48,780 --> 00:50:54,270 one final example of the system that we're demonstrating today. 929 00:50:54,270 --> 00:51:01,110 Here is an initial data flow diagram, reading data 930 00:51:01,110 --> 00:51:02,385 from a tornado simulation. 931 00:51:02,385 --> 00:51:05,310 If we look at the lower right, 932 00:51:05,310 --> 00:51:12,540 we'll see this data set, which is a contour surface, 933 00:51:12,540 --> 00:51:15,520 an isosurface of a tornado.
934 00:51:15,520 --> 00:51:18,630 This kind of yellowish structure here 935 00:51:18,630 --> 00:51:24,570 is the locus of all points that have the same value 936 00:51:24,570 --> 00:51:25,800 within this data set. 937 00:51:25,800 --> 00:51:30,970 And on top of it is this blue, kind of fuzzy representation, 938 00:51:30,970 --> 00:51:38,050 which shows buoyancy throughout the simulation. 939 00:51:38,050 --> 00:51:43,710 So what does it take to build this diagram, this application, 940 00:51:43,710 --> 00:51:48,160 and how do we customize it? 941 00:51:48,160 --> 00:51:51,330 So in this example, if we look at the upper right, 942 00:51:51,330 --> 00:51:54,310 there are three modules on the left-hand side, 943 00:51:54,310 --> 00:51:56,190 which read data from disk. 944 00:51:56,190 --> 00:52:00,870 One, the module at the bottom, 945 00:52:00,870 --> 00:52:06,720 generates a color map to map the scalar field of buoyancy 946 00:52:06,720 --> 00:52:08,370 into color space. 947 00:52:08,370 --> 00:52:11,460 That then goes through a module which 948 00:52:11,460 --> 00:52:14,270 creates a volumetric data set from it 949 00:52:14,270 --> 00:52:15,770 and sends that off to the renderer. 950 00:52:15,770 --> 00:52:19,760 On the top half is another module that 951 00:52:19,760 --> 00:52:24,080 reads another set of data from disk, creates the box 952 00:52:24,080 --> 00:52:27,800 wireframe that you saw a second ago, and then also 953 00:52:27,800 --> 00:52:34,370 that isosurface, that tan iso-valued surface beneath it, 954 00:52:34,370 --> 00:52:36,260 and sends those out to the renderer as well.
955 00:52:36,260 --> 00:52:41,870 So in this example, we 956 00:52:41,870 --> 00:52:46,640 want to use the system to show how you can collapse 957 00:52:46,640 --> 00:52:49,910 an entire application, this data flow diagram, 958 00:52:49,910 --> 00:52:53,780 into a smaller collection of modules, which can then 959 00:52:53,780 --> 00:52:57,170 be viewed in a much simpler context. 960 00:52:57,170 --> 00:53:00,440 So let's take three modules-- the isosurface, 961 00:53:00,440 --> 00:53:05,240 the wireframe, and the renderer-- and group them together, 962 00:53:05,240 --> 00:53:08,210 like this. 963 00:53:08,210 --> 00:53:11,780 Now we have a single group which contains the output 964 00:53:11,780 --> 00:53:14,030 from those three. 965 00:53:14,030 --> 00:53:19,700 We'll edit that to build a user interface for the system 966 00:53:19,700 --> 00:53:24,080 by exposing the parameters or widgets that we want 967 00:53:24,080 --> 00:53:29,570 as controls on the system. 968 00:53:29,570 --> 00:53:32,630 We'll lay out those controls in a new control 969 00:53:32,630 --> 00:53:39,440 panel, essentially a new face for the module, like this. 970 00:53:39,440 --> 00:53:44,690 Here is the rendering window, followed by one control 971 00:53:44,690 --> 00:53:49,370 here, which controls the isosurface. 972 00:53:49,370 --> 00:53:53,460 We then apply those changes. 973 00:53:53,460 --> 00:53:55,880 And now we have a new module which 974 00:53:55,880 --> 00:53:59,150 looks like the previous one, has the same interaction 975 00:53:59,150 --> 00:54:05,370 paradigms as the previous example, 976 00:54:05,370 --> 00:54:11,220 and now allows us to interact with the simulation 977 00:54:11,220 --> 00:54:14,200 output in the same manner. 978 00:54:14,200 --> 00:54:16,080 So now let's talk about how this 979 00:54:16,080 --> 00:54:19,230 applies to distributed heterogeneous environments.
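The grouping step can be sketched as a wrapper that hides its member modules and exposes only the parameters chosen for the new control panel. This is a hypothetical illustration, not Explorer's grouping mechanism. It assumes the grouped modules form a simple chain, whereas in the demo the isosurface and wireframe both feed the renderer. The toy "isosurface" just keeps values at or above the iso level. `Module`, `Group`, and `compute` are made-up names.

```python
class Module:
    def __init__(self, name, params, compute):
        self.name = name
        self.params = dict(params)  # every parameter the module has
        self.compute = compute      # compute(params, data) -> output


class Group:
    """Collapses several modules into one, exposing only chosen parameters
    as the group's new control panel (its "new face")."""

    def __init__(self, name, modules, exposed):
        self.name = name
        self.modules = {m.name: m for m in modules}
        self.order = [m.name for m in modules]  # simplification: a linear chain
        self.exposed = exposed  # control label -> (module name, parameter name)

    def set(self, control, value):
        module_name, param = self.exposed[control]  # KeyError if not exposed
        self.modules[module_name].params[param] = value

    def run(self, data):
        for name in self.order:
            m = self.modules[name]
            data = m.compute(m.params, data)
        return data


iso = Module("isosurface", {"level": 0.5, "smooth": True},
             lambda p, xs: [x for x in xs if x >= p["level"]])
render = Module("renderer", {"background": "black"},
                lambda p, pts: {"points": len(pts), "background": p["background"]})
viewer = Group("tornado viewer", [iso, render],
               {"Iso value": ("isosurface", "level")})
viewer.set("Iso value", 5)
print(viewer.run([1, 5, 7, 3]))  # {'points': 2, 'background': 'black'}
```

Parameters that were not exposed stay hidden. Here `smooth` is one, so `viewer.set("smooth", False)` raises `KeyError`.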
980 00:54:19,230 --> 00:54:21,270 What happens when we want to have 981 00:54:21,270 --> 00:54:25,170 modules executing on a whole collection of machines? 982 00:54:25,170 --> 00:54:27,270 This means we now have to deal with different data 983 00:54:27,270 --> 00:54:30,780 representations, different floating-point formats, different word 984 00:54:30,780 --> 00:54:33,810 lengths, and different alignment, for that matter. 985 00:54:33,810 --> 00:54:36,840 We also have different shared resources now. 986 00:54:36,840 --> 00:54:42,270 We can no longer simply use shared memory communication 987 00:54:42,270 --> 00:54:45,300 for data on different machines, because there is no memory 988 00:54:45,300 --> 00:54:47,460 shared between them. 989 00:54:47,460 --> 00:54:51,750 And now we also have to worry about having brokers 990 00:54:51,750 --> 00:54:55,530 on remote machines that deal with transporting 991 00:54:55,530 --> 00:54:59,903 data, transporting control, and point-to-point operations. 992 00:54:59,903 --> 00:55:01,820 And we still have to deal with the issue 993 00:55:01,820 --> 00:55:04,980 that we have individual user interfaces, 994 00:55:04,980 --> 00:55:06,660 graphical user interfaces per module. 995 00:55:06,660 --> 00:55:10,140 How are those transported back to the local machine? 996 00:55:10,140 --> 00:55:12,270 If we look at this example, here is now 997 00:55:12,270 --> 00:55:15,060 a slightly more complex example than the other diagrams 998 00:55:15,060 --> 00:55:16,080 that we've seen. 999 00:55:16,080 --> 00:55:20,730 Modules A and C are computed on your local workstation, 1000 00:55:20,730 --> 00:55:24,870 modules B and D are computed on some other machine, and E and F 1001 00:55:24,870 --> 00:55:28,570 on a third, remote machine.
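One common way to cope with differing byte order, word lengths, and alignment is to convert data to a single canonical wire format at machine boundaries, in the spirit of Sun's XDR (RFC 4506). The sketch below assumes that approach; the talk does not prescribe it. It uses Python's standard `struct` module. The `!` prefix there means big-endian byte order, standard sizes, and no padding. The `d` code always packs IEEE 754 doubles. `encode_floats` and `decode_floats` are illustrative names.

```python
import struct


def encode_floats(values):
    """Pack floats into a canonical, machine-independent layout:
    a 4-byte big-endian count followed by big-endian IEEE 754 doubles."""
    # '!' selects network (big-endian) byte order, standard sizes, no padding,
    # so word length, byte order and alignment no longer depend on the host.
    return struct.pack(f"!I{len(values)}d", len(values), *values)


def decode_floats(buf):
    (count,) = struct.unpack_from("!I", buf, 0)
    return list(struct.unpack_from(f"!{count}d", buf, 4))


wire = encode_floats([1.5, -2.25])
print(len(wire))            # 20: 4 bytes for the count + 2 * 8 for the doubles
print(decode_floats(wire))  # [1.5, -2.25]
```

A host whose native floating-point format is not IEEE 754 would still need to convert values on its side of this boundary.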
1002 00:55:28,570 --> 00:55:31,980 Now in Explorer, modules all on the same machine 1003 00:55:31,980 --> 00:55:34,770 can communicate through shared memory, 1004 00:55:34,770 --> 00:55:38,380 therefore, minimizing data motion on a particular machine. 1005 00:55:38,380 --> 00:55:41,310 But, for example, the data that A communicates with B 1006 00:55:41,310 --> 00:55:43,890 must be transported across the machine, and a data 1007 00:55:43,890 --> 00:55:45,180 copy results. 1008 00:55:45,180 --> 00:55:46,860 Also some machines don't have shared 1009 00:55:46,860 --> 00:55:49,793 memory, machines like Cray. 1010 00:55:49,793 --> 00:55:51,460 Cray computers don't have shared memory. 1011 00:55:51,460 --> 00:55:56,400 So data copies are imperative between two modules 1012 00:55:56,400 --> 00:55:58,570 on the same machine. 1013 00:55:58,570 --> 00:56:01,020 So in reality, systems like this typically 1014 00:56:01,020 --> 00:56:04,260 have local agents and perhaps a central broker 1015 00:56:04,260 --> 00:56:08,430 that establishes communication between modules 1016 00:56:08,430 --> 00:56:11,010 and perhaps monitors data transmission. 1017 00:56:11,010 --> 00:56:15,125 So the blue boxes here, there is a separate process 1018 00:56:15,125 --> 00:56:16,500 which is a graph editor, which is 1019 00:56:16,500 --> 00:56:19,410 very important in maintaining interactivity 1020 00:56:19,410 --> 00:56:22,950 with the entire system, and then a central broker 1021 00:56:22,950 --> 00:56:26,760 on the local machine, which establishes local agents 1022 00:56:26,760 --> 00:56:27,660 on remote machines. 1023 00:56:27,660 --> 00:56:30,750 And those local agents are used for connecting two modules 1024 00:56:30,750 --> 00:56:37,830 together, killing a module, disconnecting two modules. 1025 00:56:37,830 --> 00:56:41,720 So let's conclude now by some discussion about, 1026 00:56:41,720 --> 00:56:45,860 does visual programming really work? 
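The broker's choice of how to connect two modules can be reduced to a small rule, sketched here under assumptions. The host names are invented. `server` stands in for a machine without shared memory, like the Cray case above. Every connection except A to B is made up for illustration. A real broker and its local agents would also set up the sockets or shared segments; this sketch only picks the transport.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    has_shared_memory: bool


def choose_transport(src: Host, dst: Host) -> str:
    """Hypothetical broker rule for connecting an output port to an input port."""
    if src == dst and src.has_shared_memory:
        return "shared memory"  # same machine with shared memory: no data copy
    if src == dst:
        return "local copy"     # same machine, but no shared memory available
    return "network copy"       # different machines: the data must be shipped


workstation = Host("workstation", True)
server = Host("server", False)  # stands in for a machine without shared memory
remote = Host("remote", True)
placement = {"A": workstation, "C": workstation, "B": server, "D": server,
             "E": remote, "F": remote}

for src, dst in [("A", "C"), ("A", "B"), ("B", "D"), ("D", "E")]:
    print(src, "->", dst, ":", choose_transport(placement[src], placement[dst]))
```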
1027 00:56:45,860 --> 00:56:49,490 And it works in certain situations. 1028 00:56:49,490 --> 00:56:53,090 It works when the graphs are relatively simple 1029 00:56:53,090 --> 00:56:57,080 or if hierarchy is fully supported. 1030 00:56:57,080 --> 00:57:00,000 It also works when the data typing is really effective, 1031 00:57:00,000 --> 00:57:01,400 meaning that typically this works 1032 00:57:01,400 --> 00:57:05,120 best for domain-specific, discipline-specific 1033 00:57:05,120 --> 00:57:06,200 applications. 1034 00:57:06,200 --> 00:57:08,570 This is the case in the computational sciences, 1035 00:57:08,570 --> 00:57:10,640 scientific visualization, and image processing. 1036 00:57:10,640 --> 00:57:13,760 Some database systems have been built on visual programming. 1037 00:57:13,760 --> 00:57:15,483 When doesn't it work very well? 1038 00:57:15,483 --> 00:57:17,150 Well, it doesn't work very well for very 1039 00:57:17,150 --> 00:57:18,320 high-performance systems. 1040 00:57:18,320 --> 00:57:21,320 You'll never really be able to build a system that 1041 00:57:21,320 --> 00:57:23,930 performs as well as a specialized application 1042 00:57:23,930 --> 00:57:29,090 by using data flow and visual programming. 1043 00:57:29,090 --> 00:57:31,580 It also does not work well for fixed-functionality 1044 00:57:31,580 --> 00:57:32,210 applications. 1045 00:57:32,210 --> 00:57:35,330 Again, visual programming is best 1046 00:57:35,330 --> 00:57:38,840 used for applications where the configuration 1047 00:57:38,840 --> 00:57:41,270 of the application is not known by the original developer, 1048 00:57:41,270 --> 00:57:45,343 but is determined by the end user. 1049 00:57:45,343 --> 00:57:46,760 And finally, as a conclusion, what 1050 00:57:46,760 --> 00:57:49,520 are the limitations of visual programming? 1051 00:57:49,520 --> 00:57:53,060 Well, typically, it's difficult to scale a visual programming 1052 00:57:53,060 --> 00:57:55,550 graph to large applications.
1053 00:57:55,550 --> 00:57:59,650 And here, hierarchy is essential for any of this to work. 1054 00:57:59,650 --> 00:58:03,590 1055 00:58:03,590 --> 00:58:06,680 Once you have a large number of modules or atoms 1056 00:58:06,680 --> 00:58:10,340 that you're composing together, say in the thousands, 1057 00:58:10,340 --> 00:58:15,140 the component interface really becomes a limitation. 1058 00:58:15,140 --> 00:58:17,840 How do you find the module that you're really looking for? 1059 00:58:17,840 --> 00:58:21,680 You need a good component library paradigm. 1060 00:58:21,680 --> 00:58:23,990 Also, as I said before, 1061 00:58:23,990 --> 00:58:25,610 efficient memory utilization is 1062 00:58:25,610 --> 00:58:27,080 very important for data flow. 1063 00:58:27,080 --> 00:58:30,320 Is there one copy of data per module per port? 1064 00:58:30,320 --> 00:58:32,930 Is data shared between modules? 1065 00:58:32,930 --> 00:58:38,010 These become crucial issues in data flow environments. 1066 00:58:38,010 --> 00:58:39,980 And finally, there's 1067 00:58:39,980 --> 00:58:43,790 always a gap between prototyping and production. 1068 00:58:43,790 --> 00:58:47,150 While data flow and visual programming 1069 00:58:47,150 --> 00:58:50,570 are typically very good for this exploration process, 1070 00:58:50,570 --> 00:58:53,960 the exploratory question of how do I combine these things 1071 00:58:53,960 --> 00:58:57,370 in order to meet this one particular need, 1072 00:58:57,370 --> 00:58:59,330 it is sometimes difficult to transition 1073 00:58:59,330 --> 00:59:04,640 from this exploration process and its very flexible needs 1074 00:59:04,640 --> 00:59:07,130 to the very specific needs of production: 1075 00:59:07,130 --> 00:59:09,890 now that I've got the image that I want, how do I get a thousand 1076 00:59:09,890 --> 00:59:11,670 more of them, and fast?
1077 00:59:11,670 --> 00:59:13,850 So those are the primary issues that one 1078 00:59:13,850 --> 00:59:16,640 needs to deal with in terms of limitations 1079 00:59:16,640 --> 00:59:21,590 for visual programming and data flow environments. 1080 00:59:21,590 --> 00:59:23,360 Thank you. 1081 00:59:23,360 --> 00:59:26,710 [MUSIC PLAYING] 1082 00:59:26,710 --> 01:00:22,000