In this project-centered course you will build a modern software hierarchy, designed to enable the translation and execution of object-based, high-level languages on a bare-bone computer hardware platform. In particular, you will implement a virtual machine and a compiler for a simple, Java-like programming language, and you will develop a basic operating system that closes gaps between the high-level language and the underlying hardware platform. In the process, you will gain a deep, hands-on understanding of numerous topics in applied computer science, e.g. stack processing, parsing, code generation, and classical algorithms and data structures for memory management, vector graphics, input-output handling, and various other topics that lie at the very core of every modern computer system.
This is a self-contained course: all the knowledge necessary to succeed in the course and build the various systems will be given as part of the learning experience. The only prerequisite is knowledge of programming at the level acquired in introduction to computer science courses. All the software tools and materials that are necessary to complete the course will be supplied freely after you enrol in the course.
This course is accompanied by the textbook "The Elements of Computing Systems" (Nisan and Schocken, MIT Press). While not required for taking the course, the book provides a convenient coverage of all the course topics. The book is available in either hardcopy or ebook form, and MIT Press is offering a 30% discount off the cover price by using the discount code MNTT30 at https://mitpress.mit.edu/books/elements-computing-systems.
The course consists of six modules, each comprising a series of video lectures, and a project. You will need about 2-3 hours to watch each module's lectures, and about 15 hours to complete each one of the six projects. The course can be completed in six weeks, but you are welcome to take it at your own pace. You can watch a TED talk about this course by Googling "nand2tetris TED talk".
*About Project-Centered Courses: Project-centered courses are designed to help you complete a personally meaningful real-world project, with your instructor and a community of learners with similar goals providing guidance and suggestions along the way. By actively applying new concepts as you learn, you’ll master the course content more efficiently; you’ll also get a head start on using the skills you gain to make positive changes in your life and career. When you complete the course, you’ll have a finished project that you’ll be proud to use and share.

Taught by

Shimon Schocken

Professor

Transcript

So welcome to Unit 5.8, in which we are going to talk about arrays. To remind you, we are developing the code generation capabilities of our compiler. We already handled variables, expressions, flow of control, and objects, and in this unit we are going to discuss how the compiler generates code for handling arrays. Now, as usual, we're going to split this conversation into sub-conversations: we'll discuss how to construct arrays, and then how to manipulate them.
So let us begin with array construction. Here's a map of our host RAM, repeating something that you've seen by now several times in the course. And let us assume that the high-level programmer writes var Array arr. So basically, the programmer is declaring an array called arr. Well, in response to this, the compiler is going to do something very humble. It will only allocate a local variable to represent this array, and in Jack, it will also initialize this variable to 0, and that's it. Now, it is possible that later on in the program, you will also want to construct this array. You do this by calling the new subroutine of the Array class. And if you do this, then by some sort of magic that we're going to discuss when we write the operating system, the compiler, together with the operating system, will allocate sufficient space in the heap to represent this array. And the base address of this newly allocated block, whatever this address may be, is going to be stored in local 0. Now, in this example, I have contrived the address 8054. It's just an arbitrary example.
So let us review how the compiler, or the code generation part of the compiler, is going to handle these two statements. Beginning with var Array arr: what will the compiler do with this statement? Well, it will do very little. First of all, this statement generates no code whatsoever. The only thing that is going to be affected is the symbol table.
So the compiler is going to add one line to the symbol table. This line will say that we have a variable named arr, that this variable is of type Array, and that it is mapped on local 0. That's it. That's how we're going to handle this array declaration statement. What about the array construction? Well, if you look at this statement here, you will realize that all we have here is a standard call to a Jack subroutine. And this is something that we already know how to handle. We talked about handling methods and functions and so on. We know how to generate code to do this. We have to push n and then call new. So there's nothing new here. As far as the compiler is concerned, when it compiles this caller's code, it will simply generate code for handling a regular, plain vanilla call to some subroutine, and that's it. So that's what we have to say about array construction. And as you see, there's very little work to do here.
All right, moving along, the whole gist of this unit is actually in array manipulation, and that's what we're going to do next. Before we talk about array manipulation, I wish to remind you about two virtual segments that we have in our VM architecture, called this and that. If you recall, these two segments are kind of portable, in the sense that we can park them anywhere we want in the RAM, and then we can use these segments to manipulate the RAM in some desired addresses. But the mechanisms for achieving this are kind of hairy, so we'd like to refresh your memory about how we actually do it. Before I do this, I'd like to remind you that we also have two pointers, written in capital letters: THIS and THAT. You see them in locations 3 and 4 in the RAM. And these two pointers hold the base addresses of this and that.
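The declaration and construction steps described above can be sketched in Python. This is a minimal illustration, not the course's reference compiler: the `Symbol` and `SymbolTable` classes and the `compile_array_new` helper are hypothetical names made up for this sketch, and the array size is assumed to be a constant.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    name: str
    type: str
    kind: str   # e.g. "local", "argument", "field", "static"
    index: int


class SymbolTable:
    """Hypothetical subroutine-level symbol table (names are assumptions)."""

    def __init__(self):
        self.symbols = {}
        self.counts = {}

    def define(self, name, type_, kind):
        # Each kind keeps its own running index: local 0, local 1, ...
        index = self.counts.get(kind, 0)
        self.counts[kind] = index + 1
        self.symbols[name] = Symbol(name, type_, kind, index)
        return self.symbols[name]


def compile_array_new(table, var_name, size):
    """Emit VM code for `let var_name = Array.new(size);` (constant size assumed)."""
    sym = table.symbols[var_name]
    return [
        f"push constant {size}",          # the argument n
        "call Array.new 1",               # an ordinary subroutine call
        f"pop {sym.kind} {sym.index}",    # store the returned base address
    ]


table = SymbolTable()
table.define("arr", "Array", "local")     # `var Array arr;` emits no VM code
code = compile_array_new(table, "arr", 5)
print("\n".join(code))
```

The declaration only touches the symbol table, while the construction is compiled like any other call whose return value is assigned to a variable.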
And if you want to park or anchor this in a particular location in the RAM, for example, in the arbitrarily chosen address 8058, then as a programmer, what you do is you take this number, 8058, and you put it into the THIS pointer. Once you do this, the VM implementation is going to align the this virtual segment with address 8058. What about the that segment? It's exactly the same story, only you have to use the THAT pointer instead of THIS. Now, in general, one can use this and that for any purpose that one chooses. But if you write a compiler, then we recommend that you use this to represent the values of the fields of the current object, and that to represent the values of the current array. And in order to do this, once again, we use the THIS and THAT pointers to align this and that wherever we want in the RAM.
The way to create this alignment is to use a certain software mechanism that we made up when we designed the VM language. In particular, we use yet another virtual segment, called pointer. Now, pointer is one of the eight virtual segments that we have at our disposal. This is a very special segment, because it has only two entries, 0 and 1. The 0 entry represents THIS, in capital T-H-I-S, and the 1 entry represents THAT, in capital T-H-A-T. So if you want to map this on a particular address in the RAM, you put this address on the stack, and then you pop it onto pointer 0. The VM implementation is going to take the this segment and align it with the desired address, and the same is true for that, using pointer 1. So, this is how we can use this and that in general, and with that, literally speaking, let us take this capability and talk about how we can use it to actually access the RAM. So here's a specific example, which may well make the previous discussion unnecessary in case it only managed to confuse you. As usual, seeing an example is quite illuminating.
So let us suppose that we want to put the number 17 in this arbitrary address in the RAM, 8056. How can we do it? Well, we can push the address onto the stack and pop it onto pointer 1. Once we do this, the VM implementation will effect the pop, and it will take this value and put it into THAT. Now take a look at the RAM, and you will notice that the THAT pointer now contains 8056. There's an immediate side effect: the implementation will align that 0 with the desired address, so we'll get this setting right here. And once we do this, we can start manipulating the that segment. In particular, we can push 17 onto the stack, which is the number we wanted to put into the desired address, and pop it into that 0. Once we do this, the VM implementation will effect the pop. It will put the value into that 0. And there is an extremely important side effect, physically speaking: it will set the actual address in the RAM to the desired value, 17. With that in mind, we can now move on and talk about how we can use this logic and generalize it in order to achieve array access. And that's what we'll do now.
All right, so we look once again at this array mapping on the RAM. Let us assume that this is the current situation. And let's say that the high-level Jack programmer decided to put the number 17 in the third entry of arr, which is arr[2]. How should the compiler generate code to make this happen? Well, here's what we can do, based on what we did just a few seconds ago. We push arr, which happens to contain the base address of this array, onto the stack. Then we push the index that we want to access onto the stack, and we add them up. And once we add them up, we pop the result onto pointer 1. Once we do this, the number 8056, which happens to be the base address plus 2, will be stored in the THAT pointer.
The that virtual segment will be aligned with the address that we want to manipulate, and at this stage, we can actually effect the assignment. We can push 17 and pop it onto that 0. As a result, the VM implementation will put the number 17 in that 0, as well as in the RAM, in the respective address.
Now, there are two interesting observations that I would like to make about this example. First of all, notice that we use only that 0. We don't use any other entries from the that segment. This is unlike the this segment that we used when we manipulated objects, where 0 stood for the first field value, 1 stood for the second field value, 2 stood for the third, and so on. Well, you know what I'm talking about, right? We used the entries of this to represent the values of the fields of the current object. And yet with that, when we want to manipulate a particular index in any given array, we always do something like this. We always manipulate that 0 only, and, if I recall correctly, we never use other entries in the that segment. If this is not completely clear to you, it will become clear when you see some more examples. So that's the first, technical observation.
The second observation is somewhat philosophical and much more important. Notice that the VM code that you have in front of you knows absolutely nothing about the host RAM. It doesn't know where the array is located in the RAM. It doesn't know which addresses we are manipulating. It operates in a completely symbolic, logical world: the world of the VM. In that respect, it's very safe code. The code cannot reach out of the VM world and mess with the environments that are outside the virtual machine, because all the physical considerations are handled by the VM implementation, by the VM translator. And the VM code itself is completely oblivious of the host platform.
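The arr[2] = 17 example can be traced with a small Python sketch of what a VM implementation might do on the host RAM. The RAM locations used here follow the lecture (THAT in location 4, the array at 8054); the placement of the local segment at address 300 and the `run` helper are assumptions made for this illustration, and only the handful of commands needed for the example are handled.

```python
SP, LCL, THAT = 0, 1, 4      # RAM[0] = stack pointer, RAM[1] = LCL, RAM[4] = THAT

ram = [0] * 16384
ram[SP] = 256                # the stack starts at address 256 on the Hack platform
ram[LCL] = 300               # assumed base of the local segment for this sketch
ram[ram[LCL] + 0] = 8054     # local 0 (arr) holds the array's base address


def push(value):
    ram[ram[SP]] = value
    ram[SP] += 1


def pop():
    ram[SP] -= 1
    return ram[ram[SP]]


def run(commands):
    """Execute the few VM commands this example needs (a sketch, not a full VM)."""
    for line in commands:
        op, *args = line.split()
        if op == "add":
            y, x = pop(), pop()
            push(x + y)
            continue
        segment, index = args[0], int(args[1])
        if op == "push" and segment == "constant":
            push(index)
        elif op == "push" and segment == "local":
            push(ram[ram[LCL] + index])
        elif op == "pop" and segment == "pointer" and index == 1:
            ram[THAT] = pop()                 # re-anchors the that segment
        elif op == "pop" and segment == "that":
            ram[ram[THAT] + index] = pop()    # writes through the THAT pointer
        else:
            raise ValueError(f"command not covered by this sketch: {line}")


run([
    "push local 0",      # base address of arr
    "push constant 2",   # index
    "add",               # address of arr[2]
    "pop pointer 1",     # THAT = address of arr[2]
    "push constant 17",
    "pop that 0",        # RAM[THAT + 0] = 17
])
print(ram[THAT], ram[8056])
```

Note that the six VM commands never mention 8054 or 8056; those addresses appear only inside the simulated implementation.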
This is super important, not only because we want the code to be safe, but also because, in a world consisting of numerous different computers, laptops, tablets, cell phones, digital watches, and whatnot, we don't know where this code is going to run. It can run on many different platforms. And we want the same code to run on all these different platforms, and therefore we don't have the luxury of making any assumptions about the underlying hardware platform. So I hope that this distinction is not lost on you, because it's very important. That's one of the most fundamental virtues of working with virtual machines: the fact that you can be oblivious of the surrounding hardware platform.
All right, so let's take this and generalize it into array access in general. Let us assume that we have to handle the general statement arr[expression1] = expression2. The index of the array can be something like 17 times x plus a call to some subroutine. Expression 1 can be as elaborate as you please, and likewise expression 2 can be as elaborate as you please. How do we handle it? Well, following the example that we just saw, we push arr, then we generate code for evaluating expression 1 and pushing the result onto the stack, we add them up, and we pop the result onto pointer 1, just like we did before. And then we evaluate and push the result of expression 2 and pop it onto that 0. This should do the job, and at first glance it looks perfectly okay. Unfortunately, there's a problem, and this code will not work. Let me show you why.
To illustrate the general problem, suppose we want to compile a[i] = b[j]. Well, following what we did before, we push a, push i, add, and pop the result into pointer 1. This will align that 0 with a[i]. And then, we turn to the right-hand side.
We push b, push j, add, and again we pop into pointer 1. And that's where we're going to get a problem, because this assignment into pointer 1 is going to override the address that we stored previously in pointer 1. So this will simply not work, and we need a better solution. We have to be more clever about it, and that's what we'll do next.
So here is a solution that will work for generating code for a[i] = b[j]. We start just as before, by pushing a, pushing i, and adding them up. And once we do this, we turn to the right-hand side and do the same: push b, push j, add. Notice that we haven't yet used the pointer virtual segment. Once we do these six operations, we get the following state: the stack contains two addresses, the RAM address of a[i] and the RAM address of b[j]. Then we pop the topmost value, which is the address of b[j], onto pointer 1, so we get the state that you see here. And now that that 0 is aligned with b[j], we push the value that we find in b[j] and pop it into temp 0. This is something that we haven't done before in the course. I think we haven't used the temp segment yet, so now you see an example of why we need it. We use temp as a temporary variable that now contains the value of b[j], and this will be the picture of the VM segments after this stage.
Well, now that we did all that, we're actually home free, because we can now take the address of a[i] and pop it onto pointer 1 safely, since we used temp to save the value of b[j]. Once we do this, the stack will be empty and pointer 1 is going to contain the address of a[i]. That 0 is going to be aligned perfectly well with a[i], so I can proceed: push temp 0, which is the value of b[j], onto the stack, and then pop it onto that 0, which is a[i]. And this will actually accomplish what I need.
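The difference between the two code sequences for a[i] = b[j] can be checked with a toy Python model. This is only a sketch under stated assumptions: the `run` function is a hypothetical miniature VM that keeps the stack, the temp segment, and the THAT base as plain Python values, and the variables a, i, b, j are assumed to be local 0 through local 3, with made-up base addresses 8000 and 9000.

```python
def run(commands, ram, local):
    """Run VM commands on a toy machine; ram maps address -> value (default 0)."""
    stack, temp, that_base = [], [0] * 8, 0
    for line in commands:
        op, *args = line.split()
        if op == "add":
            y, x = stack.pop(), stack.pop()
            stack.append(x + y)
            continue
        segment, index = args[0], int(args[1])
        if op == "push":
            if segment == "local":
                stack.append(local[index])
            elif segment == "temp":
                stack.append(temp[index])
            elif segment == "that":
                stack.append(ram.get(that_base + index, 0))
        elif op == "pop":
            value = stack.pop()
            if segment == "pointer" and index == 1:
                that_base = value            # re-anchor the that segment
            elif segment == "temp":
                temp[index] = value
            elif segment == "that":
                ram[that_base + index] = value
    return ram


# Assumed layout: a = local 0, i = local 1, b = local 2, j = local 3.
local = [8000, 2, 9000, 3]

naive = [
    "push local 0", "push local 1", "add", "pop pointer 1",   # THAT -> a[i]
    "push local 2", "push local 3", "add", "pop pointer 1",   # THAT -> b[j], a[i] lost
    "push that 0", "pop that 0",                              # copies b[j] onto itself
]

correct = [
    "push local 0", "push local 1", "add",                    # address of a[i]
    "push local 2", "push local 3", "add",                    # address of b[j]
    "pop pointer 1", "push that 0", "pop temp 0",             # temp 0 = b[j]
    "pop pointer 1", "push temp 0", "pop that 0",             # a[i] = temp 0
]

ram_naive = run(naive, {9003: 42}, local)
ram_correct = run(correct, {9003: 42}, local)
print(ram_naive.get(8002, 0), ram_correct.get(8002, 0))
```

In the naive sequence the second `pop pointer 1` discards the address of a[i], so the final write lands on b[j]; in the corrected sequence the address of a[i] waits on the stack while temp 0 carries the value.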
So this solution, unlike the previous one, will always work. It is bulletproof, and that's how I'm going to generate code to handle array access. To generalize this example, if we have to generate code for this generic high-level statement, we push arr, then we generate code to evaluate and push expression 1, and we add them up. Then we generate code to evaluate and push the result of expression 2 onto the stack. We pop temp 0. This is the big innovation that we just introduced. And once we do this, we pop pointer 1, push temp 0, pop that 0, and this gives us the necessary code to handle the high-level statement that you see at the top. Now, if needed, expression 2 can be as complex, exotic, and interesting as you please. It can even use pointer 1 itself; it doesn't matter, because we brought the temp segment to the rescue, and this pattern will deliver exactly what we want. What about complex array references like this one? Well, it turns out that the solution that we have here is sufficiently general, and it will actually deliver the goods. It doesn't matter how complex your array reference is; this solution is going to handle it.
And so, this pretty much wraps up what I wanted to say about accessing arrays. We know how to declare arrays and we know how to access them. I mean, we know how to generate code that supports array declaration and manipulation. So that's the end of this unit on handling arrays, and in the next unit we're going to change the subject and talk about something called the standard mapping over the virtual machine.
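The general pattern for arr[expression1] = expression2 can be written down as a small Python code-generation sketch. The function names and the idea of passing already-compiled expression code as lists of VM commands are assumptions made for this illustration; a real compiler would obtain those lists from its own expression compiler. The `compile_array_read` helper shows the matching pattern for an array entry that appears inside an expression.

```python
def compile_array_read(segment, index, index_code):
    """VM code that pushes the value of arr[expression], where arr is `segment index`."""
    return (
        [f"push {segment} {index}"]   # base address of the array
        + index_code                  # code that evaluates and pushes the index
        + ["add", "pop pointer 1", "push that 0"]
    )


def compile_array_assignment(segment, index, index_code, value_code):
    """VM code for `let arr[expression1] = expression2;`."""
    return (
        [f"push {segment} {index}"]   # base address of the array
        + index_code                  # evaluates and pushes expression1
        + ["add"]                     # target address stays on the stack
        + value_code                  # evaluates and pushes expression2
        + [
            "pop temp 0",             # temp 0 = value of expression2
            "pop pointer 1",          # THAT = target address
            "push temp 0",
            "pop that 0",             # arr[expression1] = value
        ]
    )


# Example: `let a[i + 1] = b[j];` with a, i, b, j assumed to be local 0..3.
index_code = ["push local 1", "push constant 1", "add"]
value_code = compile_array_read("local", 2, ["push local 3"])
vm_code = compile_array_assignment("local", 0, index_code, value_code)
print("\n".join(vm_code))
```

Because the right-hand side finishes before `pop temp 0` runs, it may use pointer 1 freely, which is why nested array references on either side compose without special cases.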