Objects are the main characters

Tag Archives: Bootstrap

In the last time I was working on the system bootstrap again, trying to enhance the process, lower the imposed limitations… And I have a pretty new version. I’ll present here a summary of what I’ve learnt in the last time:

A review on the process steps, with exemplar code snippets

Some of the new key ideas and new infrastructure

And improvements over the last version :)

What are we going to bootstrap

For the sake of simplicity, we will not bootstrap a full Pharo (because that would include preparing for example Morphic to be bootstrapped, or to be loaded/unloaded). Instead, we will bootstrap an adapted version of MicroSqueak from John Maloney, which we re-baptized as Pharo Candle. Pharo Candle has only 50 classes, as you can see in the Github repo, and limited features. All classes from Pharo Candle has a PC prefix, which is not important for the bootstrap, since it will not let be name collisions.

To understand the rest of the post, the following is important: we will take a textual and static definition of the system (methods, classes…), and load it into our image. For that we will parse the files and include them as definition objects. These definition objects will ease the access to the data needed instead of accessing lots of lookup and symbol tables and stuff in a procedural way…

Some infrastructure basics

The last bootstrap implementation was very coupled to Pharo and its internal implementation. To create objects, to initialize classes, we relied on the existing system, on sending messages, on the VirtualMachine interpreter and infrastructure. So the first step for this new version was to decouple that. Decoupling the existing system from the new system that is about to be created. To do that we encapsulate the state of the new system inside an object, which we call an object space.

An object space reifies the new system. It is an object that understand messages such as “create an object”, “translate this string to a string of the new system”, “register this object as a class in yourself” or even “execute this piece of code”. This object space lets us structure the creation of the new system in a more comfortable way.

Our next concern, is not installing objects from the existing system into the new system. We want the new system to be transitively closed. So, the best is not to have direct references from the existing system to the new one. The best is to control those references carefully. Then, every time an object space gives you a reference to one of his objects, he really gives you a proxy. That proxy allows you to manipulate, at some level, the object inside the object space. A proxy can give and revoke permissions on the inner object, and make transformations or validations when necessary to keep the model consistent. Since these proxys may perform meta-operations on the new system objects, we end up calling them mirrors at some point.

Finally there are VirtualMachine limitations when executing code on the new space. We introduced a new piece in the puzzle to overcome them: our own language-level interpreter. In this particular case, we are using an AST interpreter, but we could, if available, use any other kind of interpreter. The important thing about using our own interpreter is controlling the semantics of the new system, and leverage the VM’s limitations.

Where do we start?

When the bootstrap starts, there is nothing. There are no objects, no structures, nothing. So we have to build everything from the start. And the question is… where do we start?

Our system is composed by objects, and thus, we need to create objects. And every new object may have pointers, which are initially pointing to nil. But what if there is no nil in our system? So, let’s build a first nil, so all objects created later can point to this nil object.

Creating the first object

As we decided before, we create our first nil object. However, there is a question that arises when we try to create nil. How can we create an object without a class? The answer is, so far, that until we have classes we will create objects without classes. We will not care about their classes and we will solve that later, once classes are created. Fortunately, this problem is present with very few objects.

Since we have no class to create nil, we have to specify the format of this object. That is, the amount of memory to be allocated for it, the amount of slots, and if they are pointers or bytes or what. The format is known by the definition of undefined object.

Fortunately, nil has no pointers to other objects, except for his class, simplifying the process. We will set nil’s class once we create it later.

We create the classes

Creating a class is a complex operation. A class has a metaclass. And it has a name which is a symbol (unique in the system). And it has a superclass, which may have not been created yet. And it has a dictionary of class variables. And… a lot of stuff.

Our objective is to keep this bootstrap the simplest. And for that, we will delay all the complex operations to the moment when they are not so complex. In this step we create empty classes. We only initialize their format with a SmallInteger, and we let the rest of their pointers pointing to nil.

The first step for creating a class, is to create its metaclass. And to create a metaclass, we need the class Metaclass. This Metaclass, in a ST-80 like model, follows the Metaclass<->Metaclass class loop. That is, Metaclass is an instance of Metaclass class, and Metaclass class is an instance of Metaclass, as shown in blue in the following figure.

We create the first Metaclass and Metaclass class as objects without class, and then we make each one an instance of the other.

After this initial initialization is performed for every class, we finish by initializing the class variables. Class variables are represented by a Dictionary object. A dictionary object is an object with a complex structure, and the way to manipulate it depends on the nature of the system we are bootstrapping. The solution, so far, is to delegate the initialization of the dictionary to the dictionary itself.

For that we use a combination of the code of the dictionary and an AST interpreter. An AST interpreter needs to be initialized before its usage so later, all class variables can be initialized. As you can see in the code below, the AST interpreter usage is hidden inside the mirror implementation :).

Install methods

Now we have all classes of the new system created and initialized. We can start installing all their methods. Before, we should declare all global variables of the system, so the compiler knows how to bind them correctly. After that, we take the source code of all methods from the system description and compile them. The compilation gives us as result the bytecode of the method + the literals. This method is then translated to a method in the new world, and installed into the new system.

Then, for each class, we have the following code to create and install the methods:

"build the methods as instances of this system"
newMethods := aMethodBuilder
methodsForBehavior: mirror
fromDefinition: aBehaviorDefinition.
"create a method dictionary of the new system"
newMethodDict := objectSpace createMethodDictionary: newMethods size.
newMethods do: [ :m |
"install a method from this system to the other"
"the translation to a method to the other side is made inside"
newMethodDict installMethod: m
].
"we set the method dictionary to our class"
mirror methodDictionary: newMethodDict.

Initialize the system state

Finally, we initialize the system state, with the aid of the AST interpreter. This last step consists mainly in:

As some of you already know, the GSOC project edition 2012 is coming to an end. And along with it, the bootstrap project reaches a checkpoint. This post covers is the news since the last chapter, and discusses about the future steps. In a next post I’ll document the details of the project deliverables.

Where we where?

The first product of this project was a renewal of an image serializer: the SystemTracer. It takes an object graph and serializes it into the image format a vm works with. The System tracer was refactored reifying the memory object formats and updated to write Cog images.

The second step was to work on bootstrapping. And it was successful. Hazelnut, the bootstrap process tool, is able to build a smalltalk image from a description. To ensure the quality and health of these newly created images I set up jenkins jobs loading different packages on top of them, and running tests over them.

What happened since last time: Declarative kernel descriptions

A kernel definition has two main parts:

the code and definitions of the entities of that kernel

a definition on how to build the basic model of that kernel and how to initialize it finally

The first one can take the form of source files like in https://github.com/guillep/PharoKernel, which is the source code Hazelnut actually uses to bootstrap Pharo images. The second part of the kernel definition contains some imperative parts, and by now they are declared in simple pharo classes you download from Monticello.

So now, the bootstrap loads the kernel definition from those source files, generates the bootstrapped environment, and serializes it into a new image file.

The future of bootstrapping pharo

Since our goal is to bootstrap pharo to support it’s modularity and evolution, there are some keypoints to attack in the near future:

getting the pharo sourcecode in sync with this bootstrap representation

choosing the really important parts for a kernel. What should be and what should not in those source files? Where do we package what’s not going kernel?

building pharo from bootstrapped images.

Even, when looking at the upcoming pharo changes like first class slots and class layouts, or the new Tanker package manager, the bootstrap will need for sure some updates.

Conclusions

I hope this project makes pharo grow and get better! We can now generate images with the source code defined statically in source files, so for the GSOC program the scope has been fulfilled.

Last time we were generating a new image with useful information. This post tells the story after that first bootstrap: How we ensure that little monster is healthy, and how we ensure that our process is flexible and robust enough, and how does it help pharo in the modularization cruzade.

So, I want to work on building stuff on top of my little bootstrapped image. And I thought Fuel was a nice gun to attack that problem.

Detecting illnesses

In my last post about the bootstrap I’ve already shown you how to get a list of broken/uninitialized stuff. That gives us an idea of how healthy our image is. But there are other tools that we already use for that: tests.

So, Can we run SUnit on the bootstrap? Yes.

When I thought about Fuel and the bootstrap, I thought Fuel should be included by default, otherwise it should be difficult make it grow. Of course I could’ve chosen the compiler for the same purpose. But I’d like to enable modularization with binary packages.

So, what about exporting sunit as a binary package, and import it in the bootstrap? And what if we also export the tests over sunit, and include them also? Then we should be able to run the tests of sunit. Nice. Then this same idea can be applied to tests the kernel, or the compiler…

And I did it :)

Completeness (or “does it have all its essential parts?”)

Another thing we can think on is testing the completeness. When should you consider that the bootstrap is complete? A fun definition could be “when it is able to define itself”. Ok, let’s export the bootstrapping code with fuel, and import it in the bootstrapped image, and let’s try to bootstrap from the bootstrap. And do it again, just for fun.

Once the bootstrap bootstrap was working, I put the tests to work on the last generation of bootstraps.

The results

The test results are exported in JUnit format, so we can tell what’s broken and look at the stack trace. All this jobs are working on the bleeding edge of the project using latest Pharo image and latest CogVM.

So now that you now a bit what the bootstrap is about and what are some of the problems to face. I’ll show you some solutions and progress for real.

How does the bootstrap implementation work

To bootstrap a Smalltalk environment, you need to create a new environment, with it’s classes and objects, and initialize some state in them. I could have done that in C, using mallocs and initializing everything by hand using plain memory :). But doing it in smalltalk is easier: you have late binding, polymorphism, closures… Even, once you have done your first steps in the bootstrap, you can send messages to your objects. THAT is nice.

So, that is the way we cho0se to go: A new environment is created (a guest) into the current environment (the host). The guest will have it’s own classes and objects.

What about the special objects of the vm? We share them with the host environment, because if not, out new image will not be able to run… Afterwards, when this new image is written into an image file, we will swap references to point to our own new special objects array.

Here is a picture of how it looks like:

The Current Version

I’ve been through several versions of the image with different capabilities, sizes, correctness. You have to know, everything you forgot to initialize, or initialize in a wrong order, or if you took an extra object you do not need, you will have a not running image, or one that carries all the objects in the host also…

Another thing is that current version takes a sample of the objects in the host to build the guest. I’m already working on starting from scratch + source code, but that is the future and I like enjoying the present :).

So, how do you load the current code?

Also, as the code is very sensitive on what you do, it is also sensitive on what image you’re running it on. If you play with it in a wrong/different image, you will have different/unexpected results. So, I suggest you to use the same image as me for testing it: Latest Pharo 2.0. In particular, I’ve tested it on versions 20133 and 20134.

Do not scare when loading the configuration on Pharo 2.0 for the first time It will raise an error. It is an issue related with unzipping old mcz in Pharo (http://code.google.com/p/pharo/issues/detail?id=6054) which will fortunately fixed soon. Just close the debugger, try again, and get the project working.

And, How do I try these weird stuff?

With Nicolas Petton, we have written some examples in the class named HazelBuilderExamples.

To run them, you can try the following scripts:

"Writes an image which when opened prints a spaceTally on fileok.txt"
HazelBuilderExample new buildImageWithSpaceTally
"Writes an image which when opened prints a report on all the packaging/initialization deficiencies of the image on fileok.txt"
HazelBuilderExample new buildImageWithBrokenReferencesReport
"Writes an image which when opened prints a report on all the packaging/initialization deficiencies of the image on fileok.txt. The report is written in the xml format Jenkins Junit plugin likes."
HazelBuilderExample new buildImageWithBrokenReferencesReportForJenkins

After evaluating this code, you’ll have some bootstrapped image and changes files. Open that file with your CogVM and wait until it closes. Then have a look at the fileok.txt file :).

Of couse you can look at the code in the examples, and try to build your own. Email me if you have ideas to improve this :).

There are also probably some problems with file overwritting that came with some latest Pharo changes with the file management. Please, if you notice this, just remove the bootstrapped* files from the folder where your image is and try again.

Hey! How is this thingy useful?

Well, if you’ve had a look at the examples, the three examples I’ve shown you are very useful:

The first one tells you how the space is distributed in the new image. You can use this knowledge to attack space problems if you want an even smaller image.

The second one is used to detect bugs in the pharo Packaging or in the initialization process: You have not initialized some classes, or the have been initialized but the initialization code does not initialize all the variables.

Everything you want in life has a price connected to it. There’s a price to pay if you want to make things better, a price to pay just for leaving things as they are, a price for everything.

Harry Browne

And I found myself trying to bootstrap for real. But of course it was not going to be easy. I had to pay the Iron price.

The VM we use plays a very important role in the day to day development. It is the one in charge of defining the method lookup, garbage collection, some platform dependent code, some optimizations. And as it does some nice things for us, it also puts restrictions on what we do. Have you ever heard about the special objects array? The compact classes array? Primitives? We are going to talk a bit about them and other secrets, and how they bother in the bootstrap process :).

The Special Objects Array

The special objects array is an array shared between the VM and the image. This array points to some objects that are important or interesting to the VM, you can have a look at it inspecting the following expression in Pharo:

Smalltalk specialObjectsArray

If you have an overview, you will see some things like the Processor instance, the Array, Smalltalk, SmallInteger, Float, Compiled method, Semaphore classes, some Semaphore instances…

What does the vm do with them? It for example introduces hard validations against concrete classes -yeap, like checking if an instance’s class is the same as the object in it’s slot 20, which BTW is Character…

Ok…

Doing those validations through messages could be too expensive in terms of speed. If you want to be fast, you have to pay some price. If you want to have a tiny mermory footprint, you have to pay some price. There are side effects for decisions in general…

So, some may wonder, Why is this array so important for the bootstrap? Imagine I want to have a new Array class, Class class, Character class, and a new CompiledMethod class. What should happen if the VM does not recognize them as I would like? CogVM only recognizes one special objects array.

The solution? Hack and cheat. You choose, you can cheat on the VM side, or in the image side. Each has a price to pay. But today is not the day for telling you how I cheated :).

Now, look at the field 29 of the special objects array. It is another array, …

The Compact Classes Array

In two words, compact objects do not have an extra header for the class pointer: they have some bits in it’s base header which is an index into this nice compact classes array, where it’s class is. This mechanism is normally used for classes with tons of instances, saving 1 header for each object. Complexity against space.

Here again, we have the same problem. Even worse, having this guy here means that if I have my nice Array’ class, which is also compact, and it’s compact class index points to the original Array class, the method lookup will end up in the original class instead of mines :(.

The solution? Hack, and cheat.

The Primitives

Now think what happens when my bootstrap classes use primitives methods. It’s nice because the vm returns me the objects it wants :). It’s actually a sinptom of the last two points. But it is good to know that primitives can give you headaches…

Other problems? Of course…

Literals: I can’t change so easily SmallInteger’s class because it is an inmediate object for example. The same happens with the other numbers, or strings, or blocks. They all give you headaches. Even if I could make it work with the VM, I should change the compiler to use a different set of classes…

Vm magic assumptions: Like class instances’ first three instance variables are superclass, method dictionary and format. In that order. Try changing the order :). Or doing something like:

myA := A new.
A become: 'hello'.
myA crashTheVM.

And there are some other like this. So far I know LinkedList, ProcessScheduler, and Class. Find your own Waldo!

You already know what the solution is, do you?

HACK AND CHEAT

Yeah, this is what the bootstrap is really about. And learning hardcore stuff too :). Of course I can’t tell you every detail because this post will be larger than anyone would care to read.

So, I’ll keep you updated, I have now to continue paying the iron price.

A few months ago I got involved in some crazy project: bootstrapping Pharo. I took some existing code, played with it, hacked it, modified it, understood it. Now I think I have some idea of what is a bootstrap and what are it’s advantages. I’ll try to give a brief introduction to the project: what is it about, advantages, an overview of the current military secret results, and an insight of what is to come.

What is a Bootstrap

The encyclopedic definition: A Bootstrap of system is a process that can generate the smallest subset of that system that may be used to reproduce the complete one.

I mean, you have an explicit process that can generate a the minimal version of your system.

Ok, easier: You kick yourself to get impulse and start from a better place :).

Then, bootstrapping software systems or languages normally means that you will somehow enhance the original process that created the system.

Some examples to clarify:

When your computer is turned on, some one has to bootstrap the little program able to load other programs :).

Generating a development environment with very basic tools will improve your work a lot (when you use your development environment).

C compiler is written in C. That means that It somehow compiles itself. Of course if you do not like assembly much, reading the C implementation is a great improvement.

Pharo implements traits and uses them in the core of the system to empower the design.

A big part of Pharo’s VM is written in smalltalk!

The need of bootstrapping Pharo

Did you know the image you download from the pharo website is the same one that comes from years ago? I mean, not the exactly same one, but a binary copy :). The fact is that years ago (yeah, ancient history) god created the first image and it started evolving, one little change after the other, to the Pharo we know today.

Now, as in evolution, we found in Pharo missing links. We do not know how some object became the way it is. Or how it was initialized. The code is simple nowhere, it’s a missing link. Also, as years passed, our ancestor became chaotic. It grew in many different uncontrolled and unordered ways. Since the Pharo Project started, one if it’s goals was cleaning this mess, but re-modularising and cleaning the system is a hard, long, and bothersome process.

The outputs and advantages of Bootstrapping Pharo will be:

getting tools to detect problems: bad dependencies, unexistent initializations, code that really do not work but was never executed before.

This initialization process will be explicit and open.

We will be able to start the next Pharo from scratch, and since we will be able to change this explicit process, our next generation Pharo will be cleaner and fancier. It will be able to acquire easily new features: namespaces/modules, security, remote tools, mirrors.

But also, since it will allow people to create a custom system, researchers will have an invaluable tool to fulfil their own purposes. They will be able to experiment

Current status of the project

The project has already had a first output, which is the image writer. The image writer is a little tool that traces a graph from an SmalltalkImage object, and deploys that graph into a .image file. I’ll talk about this sub project in a future post.

The rest of the code is a military secret yet. Ok, I can give it to you, but you have to be responsible if it blows up on your face :).

The results of the project so far are:

It can create an Smalltalk image living inside another image.

This inner/guest image can be written in a new .image file.

With this approach a small kernel of 1.1MB has been reached.

SpaceTally runs and prints reports to understand how the space is distributed among objects.

A tool to detect every uninitialized class variable/class instance variable and references to unexistant globals was developed.

Soon we will have all these public on Pharo Jenkins server.

Next Steps

Jenkins jobs :)

Taking jenkins feedback to speed up the cleaning process

Remodularizaton to get even a smaller kernel

Maybe some little experiment to bootstrap MicroSqueak and learn from it

Bootstrap from source code.

I’ll try to keep you updated often. BTW, any ideas, critics or suggestions are welcome. I’m not a gurú, I’m just learning, as everyone :).