Objects are the main characters

Tag Archives: Smalltalk

This little post is to add a summary of what I’ll be showing in the upcoming ESUG conference. I’ll present a paper about my work in the workshop and present the recovery tools of Oz in the innovation awards :).

For the paper

I’ll present the ideas and implementation of Oz object spaces. Especially the metacircular problems it solves, and a bit on the vision we (I and my supervisors have on this line of work). If you want to read the paper, you can go to:

Reﬂective architectures are a powerful solution for code browsing, debugging or in-language process handling. However, these reﬂective architectures show some limitations in edge cases of self-modiﬁcation and self-monitoring. Modifying the modiﬁer process or monitoring the monitor process in a reﬂective system alters the system itself, leading to the impossibility to perform some of those tasks properly. In this paper we analyze the problems of reﬂective architectures in the context of image based object-oriented languages and solve them by providing a ﬁrst-class representation of an image: a virtualized image.
We present Oz, our virtual image solution. In Oz, a virtual image is represented by an object space. Through an object space, an image can manipulate the internal structure and control the execution of other images. An Oz object space allows one to introspect and modify execution information such as processes, contexts, existing classes and objects. We show how Oz solves the edge cases of reﬂective architectures by adding a third participant, and thus, removing the selfmodiﬁcation and self-observation constraints.

For the innovation awards

I’ve prepared a demo and with that I recorded a little video showing it. You can see it in:

As you may already know, ZeroConf scripts are bash scripts that ease the installation of a Pharo environment. A funny thing about these ZeroConf scripts is that they are seen as bash scripts by a bash terminal, and as simple and minimal html pages by a web browser. These scripts are extensively used to simplify the configuration of Pharo CI jobs. They allow you to easily download many versions of the Pharo image and VM.

As I’m working for my Phd, and have a custom version of my virtual machine and image, and also want to make use of the advantages our CI server provides, I wanted to build my own ZeroConf scripts specialized for my needs. I also heard recently on the pharo mailing list that there was some work on customizing ZeroConf scripts[1] for Moose[2]. So I wanted to do it as well for my project :).

Downloading the ZeroConf package

The ZeroConf scripts are generated automatically by the ZeroConf Pharo package. You can find this package in [2]. To download the current version, you just have to execute the following piece of code in a workspace:

Getting what’s inside ZeroConf

The ZeroConf package is pretty small and simple. There is an abstract class AbstractZeroConfBashScript implementing most of the script generation and bash writing utils. Its subclasses will implement the concrete script generation. Current implementation includes three main classes below the AbstractZeroConfBashScript hierarchy, implementing a composite pattern:

Finally, I created a convenience method for creating a script corresponding to the 1.0 version of my custom image.

OzZeroConfImageScript class>>oz10
^self new
release: '1.0';
yourself

Now you can try generating your script in a workspace,

OzZeroConfImageScript oz10 generate

and see the generated results in your working directory!

Customizing a vm script

A custom vm script is defined by a subclass of ZeroConfVMScript. ZeroConfVMScript defines, as its image friend, some vm common behavior, such as the release number and virtual machine type(i.e., if it is a jitted vm or not), which we will use in our script.

Integrating everything and automating generation

As I already showed you, every script understands the message #generate to generate itself. However, we may want to generate many scripts, and combine them. The ZeroConf infrastructure already provides for that the ZeroConfCommandLineHandler. The ZeroConfCommandLineHandler is a command line handler that knows which are the scripts we want to generate, combines them appropriately and generates them. So we will subclass from ZeroConfCommandLineHandler and specialize it to fulfill our needs.

As some of you already know, the GSOC project edition 2012 is coming to an end. And along with it, the bootstrap project reaches a checkpoint. This post covers is the news since the last chapter, and discusses about the future steps. In a next post I’ll document the details of the project deliverables.

Where we where?

The first product of this project was a renewal of an image serializer: the SystemTracer. It takes an object graph and serializes it into the image format a vm works with. The System tracer was refactored reifying the memory object formats and updated to write Cog images.

The second step was to work on bootstrapping. And it was successful. Hazelnut, the bootstrap process tool, is able to build a smalltalk image from a description. To ensure the quality and health of these newly created images I set up jenkins jobs loading different packages on top of them, and running tests over them.

What happened since last time: Declarative kernel descriptions

A kernel definition has two main parts:

the code and definitions of the entities of that kernel

a definition on how to build the basic model of that kernel and how to initialize it finally

The first one can take the form of source files like in https://github.com/guillep/PharoKernel, which is the source code Hazelnut actually uses to bootstrap Pharo images. The second part of the kernel definition contains some imperative parts, and by now they are declared in simple pharo classes you download from Monticello.

So now, the bootstrap loads the kernel definition from those source files, generates the bootstrapped environment, and serializes it into a new image file.

The future of bootstrapping pharo

Since our goal is to bootstrap pharo to support it’s modularity and evolution, there are some keypoints to attack in the near future:

getting the pharo sourcecode in sync with this bootstrap representation

choosing the really important parts for a kernel. What should be and what should not in those source files? Where do we package what’s not going kernel?

building pharo from bootstrapped images.

Even, when looking at the upcoming pharo changes like first class slots and class layouts, or the new Tanker package manager, the bootstrap will need for sure some updates.

Conclusions

I hope this project makes pharo grow and get better! We can now generate images with the source code defined statically in source files, so for the GSOC program the scope has been fulfilled.

Wait, WTF? How is that ProtoObject superclass is nil? Wait again, and the one of it’s superclass is Class? Metaclass class is an instance of Metaclass? Hey, that’s kind of the chiken and the egg problem, which one was first?

You know that when you create a class, you specify a superclass for it. This superclass will specify some other properties and the VM will use it to perform the method lookup.

Also, probably you already know that when a class is created in Smalltalk, a metaclass is created for it implicitly. That metaclass describes the class side behavior: class side methods, class instance variables…

Funny thing about this implicit metamodel, is that a second class hierarchy is built in parallel to the original class hierarchy.

Now, if you think about this, you can understand why the method lookup works also in the class side methods, and they are not static like in Java or C# :).

Which of course have it’s exceptions. The method lookup ends when it reaches a class whose superclass is nil. And the class side objects also behave like a Class, because they finally inherit from Class. HA! But then the metaclass hierarchy re-enters the non-metaclass hierarchy. Thinking of this in an operational way is kind of meta confusing, isn’t it?

But this is not the motivation of this post. The motivation is this: Are we really coupled to that meta model? How can I create my own?

Once you have a metaclass, instantiate it to create your class, and instanciate it to create your little object! That’s crafting Smalltalk using Smalltalk. Well, that is bootstrapping the meta model :).
The only ugly thing is that in order to create a new meta model with different instance variables, you have to create a transient class in the middle, because the VM does not like to have objects with a format X, whose class defines a format Y… So the hack just solves the format problem :).

Now you can think about simpler stuff like a class instance of itself, subclass of nil. Or more complex one :).

So now that you now a bit what the bootstrap is about and what are some of the problems to face. I’ll show you some solutions and progress for real.

How does the bootstrap implementation work

To bootstrap a Smalltalk environment, you need to create a new environment, with it’s classes and objects, and initialize some state in them. I could have done that in C, using mallocs and initializing everything by hand using plain memory :). But doing it in smalltalk is easier: you have late binding, polymorphism, closures… Even, once you have done your first steps in the bootstrap, you can send messages to your objects. THAT is nice.

So, that is the way we cho0se to go: A new environment is created (a guest) into the current environment (the host). The guest will have it’s own classes and objects.

What about the special objects of the vm? We share them with the host environment, because if not, out new image will not be able to run… Afterwards, when this new image is written into an image file, we will swap references to point to our own new special objects array.

Here is a picture of how it looks like:

The Current Version

I’ve been through several versions of the image with different capabilities, sizes, correctness. You have to know, everything you forgot to initialize, or initialize in a wrong order, or if you took an extra object you do not need, you will have a not running image, or one that carries all the objects in the host also…

Another thing is that current version takes a sample of the objects in the host to build the guest. I’m already working on starting from scratch + source code, but that is the future and I like enjoying the present :).

So, how do you load the current code?

Also, as the code is very sensitive on what you do, it is also sensitive on what image you’re running it on. If you play with it in a wrong/different image, you will have different/unexpected results. So, I suggest you to use the same image as me for testing it: Latest Pharo 2.0. In particular, I’ve tested it on versions 20133 and 20134.

Do not scare when loading the configuration on Pharo 2.0 for the first time It will raise an error. It is an issue related with unzipping old mcz in Pharo (http://code.google.com/p/pharo/issues/detail?id=6054) which will fortunately fixed soon. Just close the debugger, try again, and get the project working.

And, How do I try these weird stuff?

With Nicolas Petton, we have written some examples in the class named HazelBuilderExamples.

To run them, you can try the following scripts:

"Writes an image which when opened prints a spaceTally on fileok.txt"
HazelBuilderExample new buildImageWithSpaceTally
"Writes an image which when opened prints a report on all the packaging/initialization deficiencies of the image on fileok.txt"
HazelBuilderExample new buildImageWithBrokenReferencesReport
"Writes an image which when opened prints a report on all the packaging/initialization deficiencies of the image on fileok.txt. The report is written in the xml format Jenkins Junit plugin likes."
HazelBuilderExample new buildImageWithBrokenReferencesReportForJenkins

After evaluating this code, you’ll have some bootstrapped image and changes files. Open that file with your CogVM and wait until it closes. Then have a look at the fileok.txt file :).

Of couse you can look at the code in the examples, and try to build your own. Email me if you have ideas to improve this :).

There are also probably some problems with file overwritting that came with some latest Pharo changes with the file management. Please, if you notice this, just remove the bootstrapped* files from the folder where your image is and try again.

Hey! How is this thingy useful?

Well, if you’ve had a look at the examples, the three examples I’ve shown you are very useful:

The first one tells you how the space is distributed in the new image. You can use this knowledge to attack space problems if you want an even smaller image.

The second one is used to detect bugs in the pharo Packaging or in the initialization process: You have not initialized some classes, or the have been initialized but the initialization code does not initialize all the variables.

Everything you want in life has a price connected to it. There’s a price to pay if you want to make things better, a price to pay just for leaving things as they are, a price for everything.

Harry Browne

And I found myself trying to bootstrap for real. But of course it was not going to be easy. I had to pay the Iron price.

The VM we use plays a very important role in the day to day development. It is the one in charge of defining the method lookup, garbage collection, some platform dependent code, some optimizations. And as it does some nice things for us, it also puts restrictions on what we do. Have you ever heard about the special objects array? The compact classes array? Primitives? We are going to talk a bit about them and other secrets, and how they bother in the bootstrap process :).

The Special Objects Array

The special objects array is an array shared between the VM and the image. This array points to some objects that are important or interesting to the VM, you can have a look at it inspecting the following expression in Pharo:

Smalltalk specialObjectsArray

If you have an overview, you will see some things like the Processor instance, the Array, Smalltalk, SmallInteger, Float, Compiled method, Semaphore classes, some Semaphore instances…

What does the vm do with them? It for example introduces hard validations against concrete classes -yeap, like checking if an instance’s class is the same as the object in it’s slot 20, which BTW is Character…

Ok…

Doing those validations through messages could be too expensive in terms of speed. If you want to be fast, you have to pay some price. If you want to have a tiny mermory footprint, you have to pay some price. There are side effects for decisions in general…

So, some may wonder, Why is this array so important for the bootstrap? Imagine I want to have a new Array class, Class class, Character class, and a new CompiledMethod class. What should happen if the VM does not recognize them as I would like? CogVM only recognizes one special objects array.

The solution? Hack and cheat. You choose, you can cheat on the VM side, or in the image side. Each has a price to pay. But today is not the day for telling you how I cheated :).

Now, look at the field 29 of the special objects array. It is another array, …

The Compact Classes Array

In two words, compact objects do not have an extra header for the class pointer: they have some bits in it’s base header which is an index into this nice compact classes array, where it’s class is. This mechanism is normally used for classes with tons of instances, saving 1 header for each object. Complexity against space.

Here again, we have the same problem. Even worse, having this guy here means that if I have my nice Array’ class, which is also compact, and it’s compact class index points to the original Array class, the method lookup will end up in the original class instead of mines :(.

The solution? Hack, and cheat.

The Primitives

Now think what happens when my bootstrap classes use primitives methods. It’s nice because the vm returns me the objects it wants :). It’s actually a sinptom of the last two points. But it is good to know that primitives can give you headaches…

Other problems? Of course…

Literals: I can’t change so easily SmallInteger’s class because it is an inmediate object for example. The same happens with the other numbers, or strings, or blocks. They all give you headaches. Even if I could make it work with the VM, I should change the compiler to use a different set of classes…

Vm magic assumptions: Like class instances’ first three instance variables are superclass, method dictionary and format. In that order. Try changing the order :). Or doing something like:

myA := A new.
A become: 'hello'.
myA crashTheVM.

And there are some other like this. So far I know LinkedList, ProcessScheduler, and Class. Find your own Waldo!

You already know what the solution is, do you?

HACK AND CHEAT

Yeah, this is what the bootstrap is really about. And learning hardcore stuff too :). Of course I can’t tell you every detail because this post will be larger than anyone would care to read.

So, I’ll keep you updated, I have now to continue paying the iron price.

A few months ago I got involved in some crazy project: bootstrapping Pharo. I took some existing code, played with it, hacked it, modified it, understood it. Now I think I have some idea of what is a bootstrap and what are it’s advantages. I’ll try to give a brief introduction to the project: what is it about, advantages, an overview of the current military secret results, and an insight of what is to come.

What is a Bootstrap

The encyclopedic definition: A Bootstrap of system is a process that can generate the smallest subset of that system that may be used to reproduce the complete one.

I mean, you have an explicit process that can generate a the minimal version of your system.

Ok, easier: You kick yourself to get impulse and start from a better place :).

Then, bootstrapping software systems or languages normally means that you will somehow enhance the original process that created the system.

Some examples to clarify:

When your computer is turned on, some one has to bootstrap the little program able to load other programs :).

Generating a development environment with very basic tools will improve your work a lot (when you use your development environment).

C compiler is written in C. That means that It somehow compiles itself. Of course if you do not like assembly much, reading the C implementation is a great improvement.

Pharo implements traits and uses them in the core of the system to empower the design.

A big part of Pharo’s VM is written in smalltalk!

The need of bootstrapping Pharo

Did you know the image you download from the pharo website is the same one that comes from years ago? I mean, not the exactly same one, but a binary copy :). The fact is that years ago (yeah, ancient history) god created the first image and it started evolving, one little change after the other, to the Pharo we know today.

Now, as in evolution, we found in Pharo missing links. We do not know how some object became the way it is. Or how it was initialized. The code is simple nowhere, it’s a missing link. Also, as years passed, our ancestor became chaotic. It grew in many different uncontrolled and unordered ways. Since the Pharo Project started, one if it’s goals was cleaning this mess, but re-modularising and cleaning the system is a hard, long, and bothersome process.

The outputs and advantages of Bootstrapping Pharo will be:

getting tools to detect problems: bad dependencies, unexistent initializations, code that really do not work but was never executed before.

This initialization process will be explicit and open.

We will be able to start the next Pharo from scratch, and since we will be able to change this explicit process, our next generation Pharo will be cleaner and fancier. It will be able to acquire easily new features: namespaces/modules, security, remote tools, mirrors.

But also, since it will allow people to create a custom system, researchers will have an invaluable tool to fulfil their own purposes. They will be able to experiment

Current status of the project

The project has already had a first output, which is the image writer. The image writer is a little tool that traces a graph from an SmalltalkImage object, and deploys that graph into a .image file. I’ll talk about this sub project in a future post.

The rest of the code is a military secret yet. Ok, I can give it to you, but you have to be responsible if it blows up on your face :).

The results of the project so far are:

It can create an Smalltalk image living inside another image.

This inner/guest image can be written in a new .image file.

With this approach a small kernel of 1.1MB has been reached.

SpaceTally runs and prints reports to understand how the space is distributed among objects.

A tool to detect every uninitialized class variable/class instance variable and references to unexistant globals was developed.

Soon we will have all these public on Pharo Jenkins server.

Next Steps

Jenkins jobs :)

Taking jenkins feedback to speed up the cleaning process

Remodularizaton to get even a smaller kernel

Maybe some little experiment to bootstrap MicroSqueak and learn from it

Bootstrap from source code.

I’ll try to keep you updated often. BTW, any ideas, critics or suggestions are welcome. I’m not a gurú, I’m just learning, as everyone :).