
You can stop reading if you are using Linux or any other flavor of Unix with decent package management.

However, if you are using Mac OS or Windows, you may continue reading. 🙂

Problem: suppose you want to add a plugin to the VM which uses some well-known open-source library.
On Linux, this is a piece of cake: you just install that library using the package management tool, like:

apt-get install somelibrary

and tell the plugin to look for the installed library at a well-known place in the system. Done.

On Mac and Windows, things are different: you are on your own. There is no well-known place in the system (unless, of course, it is an Apple/Microsoft product) where all 3rd-party shared libraries are installed. Often there is not even a binary distribution of such a library, and even if one exists and you install it, you never know where it ends up, because again there is no standard place.
So, the only solution is to bundle these libs with your own app. In most cases you have to download the sources from the project site and build them yourself.

So, the first thing which rang bells in my head is the way we deal with this today.
Take Freetype, for instance: if you are using Pharo, you know that it provides Freetype support by default,
and that the VMs which we build on the Jenkins CI server include the corresponding Freetype plugin.

From the user’s perspective, everything is nice: they can use nice fonts in their images. But if we open the hood of our shiny car, things are not so nice. And I will explain why:

On Windows and Mac, we were forced to include the whole Freetype library as a static component of the plugin, i.e. it is just statically linked into FT2Plugin.
What is wrong with such an approach? Suppose I want to add another plugin which uses the very same library. If I do the same, I end up with two separate (autonomous) instances of the same library, used by two different plugins.

And this is exactly what is happening with Cairo (which I use in Athens via FFI), because Cairo also uses Freetype (what a coincidence!).
So, in Pharo, we already have Freetype. Now Cairo also uses Freetype. But since Cairo is loaded using FFI, there is no way for it to communicate with the statically linked Freetype library residing inside FT2Plugin. As a result, when Athens uses Cairo you have two instances of the same library loaded.
But even worse: I cannot reuse the existing Freetype support code in Pharo and couple it with Cairo. If I load a font face using the Freetype plugin, I cannot pass that face to Cairo, because the face object is instantiated by one copy of the library while Cairo uses another, and the two know absolutely nothing about each other.

So, obviously, an appropriate solution would be to make both FT2Plugin and the Cairo library use the same shared, dynamically loadable library, and to abandon the idea of statically linking a library into a single plugin.
Also, since we cannot expect it to be installed somewhere on the system, we should bundle it with our application (which in our case is the VM).

Btw, Freetype is not the sole example of statically embedding 3rd-party libraries into our beloved VM. We have a couple more under the hood:
– FloatMathPlugin includes the whole fdlibm
– JPEGReadWriter2Plugin includes a copy of the JPEG library
– RePlugin includes the PCRE library (Perl-compatible regular expressions)

and I think there are more. And since those libraries are private to those plugins, if some other plugin wants to use the same functionality, there is no way for them to share the same library. So you get code duplication and wasted memory.

So, I decided to extend CMakeVMMaker with a notion of a 3rd-party library, so that our automated build system can:
– download the library from its official site
– configure & build it
– bundle it alongside the VM

Here is a brief description of the new interface.

First, I introduced a class named CMThirdpartyLibrary, which describes the configuration of a third-party library as well as the steps to build it and bundle it with the VM.

This is an abstract class which captures the most common attributes of a 3rd-party package, as well as a more or less standard way to build it and include it in the VM. A subclass describes a concrete library and can customize the way it is built.

This is the name of the library, used to identify it. For obvious reasons, no two classes should have the same canonical name.
To include a library in the cmake loop, send the #addThirdpartyLibrary: <libname> message to the configuration.

Setting variables

The #setVariables method serves mainly to set up the most common variables in cmake, like paths, names, options and so on, so that other parts of the generated cmake script can reuse them without duplicating definitions over and over again.

Downloading and unpacking sources.

The #unpack method is provided by the CMThirdpartyLibrary class and uses tar to unpack the downloaded file. Of course, if the library sources use a different archive format (like zip), you will need to override that method to emit the appropriate shell commands to unpack it.
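The idea of choosing the unpack command by archive format can be sketched as follows. This is a small Python model, not the Smalltalk #unpack method itself; the function name and the exact tar/unzip flags are assumptions made for illustration, since the post does not show which flags the default implementation emits.

```python
def unpack_command(archive):
    """Return the shell command (as an argument list) that unpacks `archive`.

    Hypothetical helper: the real #unpack emits shell commands into the
    generated cmake script; this only models the format dispatch.
    """
    if archive.endswith((".tar.gz", ".tgz")):
        return ["tar", "-xzf", archive]
    if archive.endswith(".tar.bz2"):
        return ["tar", "-xjf", archive]
    if archive.endswith(".zip"):
        # The case where a library class would override the default.
        return ["unzip", "-q", archive]
    raise ValueError("unknown archive format: " + archive)


tar_cmd = unpack_command("somelibrary-1.0.tar.gz")
zip_cmd = unpack_command("somelibrary-1.0.zip")
print(tar_cmd)
print(zip_cmd)
```

The archive names are placeholders; a concrete library class would supply its own file name.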

By default, all third-party libraries use a ‘thirdparty’ subdirectory in the ‘build’ directory:

Since most libraries have their own idiosyncratic way of being built, this method is the one you will need to change when adopting a new library.
Here, as you can see, I pass a custom prefix, ${installPrefix}, to the configure command. So `make install` will copy the built artifacts into a place defined by me, not into some ‘default’ location in your file system. You may ask why I do the `make install` step when I could just copy the artifacts produced by the `make` command. It is because I need to deal with dependencies (see below), and because it is easier to pick up all artifacts from a single place than to search among numerous project-specific subdirectories to figure out what exactly you need. Also, if the project developers decide to change the directory structure of their sources, the `make install` step makes your configuration agnostic to these changes.

Copying artefacts

The last stage is quite simple. We just copy the dynamic library into our VM bundle:

Dependencies

Cairo depends on 3 other libraries (actually 5, but the others, libz and libbz2, are ‘standard’ on Mac):

– pixman

– libpng

– freetype

To define dependencies, just implement the #dependencies method in the library class:

dependencies
	^ #( 'pkg-config' 'pixman' 'libpng' 'freetype2')

All those names are other third-party libs, which should be defined in corresponding classes. The CMake config takes care of the order of dependencies (so a dependent target is built after the ones it depends on). This is where `make install` is very useful: the cairo configure script locates the required libraries in my directory (thirdparty/out/…), so I can be sure that it is built against the libraries which I built, not against libraries which may or may not be installed on your system, e.g. via `port` or `homebrew`.
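The ordering rule (a library is built only after everything it depends on) is a topological sort. Here is a minimal Python sketch of that idea, not the CMakeVMMaker code itself. Only cairo's dependency list is taken from the #dependencies method above; the entries for the other libraries are assumptions invented for the example.

```python
def build_order(deps, target):
    """Return `target` and everything it needs, dependencies first."""
    order = []
    visiting = set()

    def visit(name):
        if name in order:
            return
        if name in visiting:
            raise ValueError("dependency cycle at " + name)
        visiting.add(name)
        for dep in deps.get(name, []):
            visit(dep)
        visiting.discard(name)
        order.append(name)

    visit(target)
    return order


# cairo's list comes from the post; the other entries are hypothetical.
DEPS = {
    "cairo": ["pkg-config", "pixman", "libpng", "freetype2"],
    "pixman": ["pkg-config"],
    "libpng": ["pkg-config"],
    "freetype2": ["pkg-config"],
    "pkg-config": [],
}

order = build_order(DEPS, "cairo")
print(order)
```

In the real build, cmake target dependencies do this job; the sketch only shows why each library class has to declare its dependencies by name.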

The last thing about bundling (and it is Mac-specific) is to change the references in the produced .dylib files to the correct ones. I tried to use BundleUtilities, provided by cmake, but it turned out to be buggy, so I had to write my own. All it does is replace the absolute paths in all .dylib-s (including external plugin libs) with paths relative to the .app bundle.
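To give a feeling for what such a path fix-up involves, here is a Python sketch that only computes the `install_name_tool -change` invocations and does not run anything. It is not the script from the post: the function name, the example paths and the `@executable_path/Plugins` target directory are all assumptions made for illustration.

```python
import posixpath


def relocation_commands(dylib, references, build_prefix,
                        bundle_dir="@executable_path/Plugins"):
    """Build install_name_tool calls that repoint references located
    under `build_prefix` to bundle-relative paths. Nothing is executed."""
    prefix = build_prefix.rstrip("/") + "/"
    commands = []
    for ref in references:
        if ref.startswith(prefix):
            new_ref = bundle_dir + "/" + posixpath.basename(ref)
            commands.append(
                ["install_name_tool", "-change", ref, new_ref, dylib])
    return commands


# Hypothetical references, in the form `otool -L` would list them.
refs = [
    "/build/thirdparty/out/lib/libfreetype.6.dylib",
    "/usr/lib/libz.1.dylib",
]
cmds = relocation_commands("libcairo.2.dylib", refs, "/build/thirdparty/out")
print(cmds)
```

System libraries such as libz are left alone, because they are not under the build prefix.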

P.S. I spent a whole week doing this. Ohh.. it is a real pain in the ass dealing with all those details. But I tried to make an interface which can be reused, so it should be simpler to include another library in the future. Without that goal, I could have stopped on the first day: I had a shell script which just builds cairo, and that is all I needed 🙂

Athens is a secret military project, with the aim of conquering the world..

In the first stage, our task is to create a modern object-oriented vector graphics framework (codename: Athens) for Smalltalk, and then.. yes! conquer the world.

Today, Pharo lacks a decent, modern graphics framework. The heritage which comes from Squeak (and goes back to Smalltalk-80) is the now-classic BitBlt framework, which lets one manipulate bits and blit them in twisted ways. The Squeak VM, in addition, has the Balloon engine on top of it, which is a kind of vector-based add-on to the BitBlt engine, but it is also quite outdated and lacks many features compared to any modern vector graphics framework.

Needless to say, there is big demand for such a framework in our community, because it opens the door to the magical world of graphics. I wouldn’t say that the door is not open at all; I would say that it is not wide open. For now, only the smartest can sneak through the narrow passage, while any inexperienced explorer will be blocked, not to mention larger parties of fun-hunters.

Over the past years there were multiple attempts (including mine) to get a bit closer to the holy grail, but without focus and a big investment of time there is little chance for something decent to appear. So, one of my tasks as a Pharo engineer (the position I currently hold at INRIA, Lille) is to design and implement such a framework.

So, a few words about what existed before (or at least what I know of):

Rome is a framework which was supposed to have multiple backends, but primarily targeted the Cairo graphics library, via a plugin which provides a binding to it. Currently dormant.

OpenVG, in the same way, is a binding which I wrote for an emerging open standard, OpenVG. Dormant as well 🙂

Apart from these there is an OpenGL framework, but I am not listing it here, because it is oriented more towards 3D graphics than 2D vector graphics. And while you can use OpenGL as a backend for rendering 2D vector graphics, it would still require a lot of work to build something on top of it.

You may rightfully ask why we need yet another project which provides much the same thing when we already have two. Isn’t it a waste to do everything from scratch instead of putting more effort into one of those?

Of course, Athens is about the same thing (vector graphics), but it is quite different in one significant aspect: it is object-oriented.

The main problem with the above two projects is that they were constructed bottom-up, serving primarily to expose the features of the corresponding APIs to Smalltalk. They replicate the structure of the APIs they rely on almost 1:1, with little thought given to making the Smalltalk side more object-oriented instead of being a thin wrapper around those libraries.

In other words, give me 1 week, a lump sum of money, and after 2 weeks 😉 I will deliver you a binding to any C library you want. But do not expect that it will be much easier to use just because you can access it from Smalltalk. Exposing a non-object-oriented library in Smalltalk does not automagically make it one bit more object-oriented, especially in Smalltalk terms rather than C++ terms 🙂.

So, my primary focus in Athens was (well, and still is) to use a top-down design and to think more about how to make the API convenient and easy to use, rather than about how easy it would be to expose certain features of existing non-object-oriented APIs.

Of course, I have to keep an eye on making things run efficiently on different backends, but I care even more about the framework API and its ease of use.

The top-level API in Athens is self-sufficient. My aim is to make sure that a developer doesn’t need to know intimate details about how Cairo/OpenVG/OpenGL (or any other backend) works, nor to study their APIs, in order to start using Athens. Instead, all they will need to know is the Athens API.

We want to make one thing real: an application which uses Athens should work the same, no matter which rendering backend you are using.

Yes, I know that such an approach leads to the lowest-common-denominator problem, but I think that a clean design outweighs it, because nobody expects me to deliver a full-blown, feature-rich framework in the first iteration, which is then “carved in stone” for decades. I think that a good foundation will pay off.

Ok. I think that is enough blabbering for today.. I will continue my novel later, in future posts. 🙂

P.S. oh.. btw.. to all the ‘take a look’ and ‘have a look’-ers: I am taking a look and having a look at many things, and I am very interested in what exists on the landscape of the modern graphics world. Just take note: this post is about ‘take a look at Athens’, not about ‘let’s speculate about the topic’.

One of the major Smalltalk shortcomings often mentioned by people is the lack of good interoperability with the operating system and external libraries.

I know that, originally, Smalltalk was designed as a fully functional environment for personal computers, with very little (if any) need for low-level support from what we today call an “Operating System”, or basically the “environment”. Modern examples like SqueakNOS demonstrate very clearly that a Smalltalk image can run without any OS and still be quite functional and even useful.

But here I do not want to analyze why, or what prevented the Smalltalk environment from reaching the masses (if it had been successful, there would be no need to talk about interoperability with the non-Smalltalk world in the first place). What I want to say is that it is a mistake today to keep sticking to a paradigm where things happen only inside the Smalltalk environment, and nothing exists beyond it.

If a Smalltalk environment runs inside another environment (such as an OS), it is a mistake not to have decent interoperability with it.

So, what do we have today? Both the Squeak and Cog VMs are more or less the same in this regard now (sure, Cog is more advanced ;).

We have an FFI. Okay, maybe it is not as good or complete as analogous implementations in other similar languages, but in my opinion it is quite decent, and we are not standing still here. So, on this front, I think we are doing fine.

But FFI is just one side of the coin: it allows applications written in Smalltalk to speak with external libraries, effectively supporting a model where the Smalltalk environment is the host, while external modules are servants (i.e. embedded).

But if we flip the coin, we can see that we have absolutely nothing (literally nothing) to support running Smalltalk as an embedded language in a host application. And I see very little movement (and attention) towards changing the situation in this regard.

Personally, I think that Smalltalk as an embedded language could be the best thing I could dream of.

So, if we would like to do it, where should we start?

First is the VM, of course. We should turn the VM from a self-sustaining, all-knowing OS-level process into a library, so that host applications can link with it either statically or dynamically.

Second, we need an API which allows the host application to communicate with the VM and to control the execution of Smalltalk code, as well as the object memory and the available VM capabilities.

There are many things hardcoded in the VM on the assumption that it runs as a standalone OS process. One example is FilePlugin, a plugin which operates on files but which, despite having “plugin” in its name, is not optional. When a host application uses Smalltalk as an embedded language, it is easy to imagine that in many situations the Smalltalk part may not need to manipulate files directly at all, since the host application can take care of providing data to the Smalltalk side in its own, often more efficient, ways.

So we should make sure that the VM does not assume that it runs as a full-fledged environment, because when embedded, that is not the VM’s concern at all but is up to the host application.

All of the above is relatively easy to fix, because it is more or less about the same thing: we should make the VM code more modular, and ensure that the VM can be used as a library with a well-defined API (like Lua does).

There is only one thing which worries me and whose solution might be quite tricky: scheduling. Again, since Smalltalk was envisioned as a self-sustaining environment, it has to deal (by itself) with multiple processes to be able to run many things in parallel, and therefore the VM has to support scheduling.

But again, this is not absolutely necessary when Smalltalk is used as an embedded language. I have shown that scheduling can be moved completely to the language side, so that the VM knows very little about things like Process and Semaphore. It will still know about things like signals and contexts, but much less. So this step, I think, is necessary if we would like to get to the point where we can use Smalltalk as an embedded language.

For example, imagine that the host application has sent a message (through the VM API, of course) and is waiting for an answer. Because of scheduling, the context which evaluates that message can be interrupted, switched out for another one and then even killed or lost, resulting in a situation where the host application never gains control back.

VM-side scheduling makes it very difficult to have a simple call scheme, where the host application “calls” the VM, the VM runs some piece of Smalltalk code, returns the result(s) and then completely stops all activity until the next call. Even the interpreter itself is implemented as an infinite loop which, once entered, is never left.

Another aspect of the same problem is that we don’t have a good abstraction around the VM and interpreter state. If I wanted to use multiple interpreters with different object memories running in parallel, I could not do that, because current VMs are too centered around the idea that they control everything and nothing happens outside. I demonstrated with HydraVM that it is relatively easy to make the VM maintain multiple interpreter states and run two (or more) object memories. But the idea needs further development and more attention.

I am not mentioning language-side changes, because obviously, when we speak about embedding, it means that in most cases we will run quite specialized image(s), far, far less feature-laden than the images we use today. Yes, we need decent tool support for generating such small images, by either bootstrapping them or shrinking existing images, but for me the VM is the more important and harder part of the story.

I’d like to know what others think about it. Do you think we need to be able to use Smalltalk as an embedded language at all? Or can we live with just FFI?

Since I had to install everything from scratch on a new Win7 running under VirtualBox and prepare the environment for building Cog VMs, I had to go through all the steps again.

I hate setting things up from scratch, because there is always something which can go wrong, and you mostly lose time trying to make things work. And those things are not directly related to your current task (debugging the VM); they are just tools which need to be there before you can even build the VM.

I am really happy that Mariano wrote down all the instructions for preparing the environment, so I can just go and do things step by step.

Now, that doesn’t mean that things which were working a year ago will keep working. In my case I stumbled upon strange compiler errors when I first tried to compile the VM:

c:/MinGW/msys/1.0/home/sig/cog/blessed/platforms/win32/vm/sqWin32Intel.c:
In function 'squeakExceptionHandler':
c:/MinGW/msys/1.0/home/sig/cog/blessed/platforms/win32/vm/sqWin32Intel.c:128:18:
error: '_RC_NEAR' undeclared (first use in this function)
c:/MinGW/msys/1.0/home/sig/cog/blessed/platforms/win32/vm/sqWin32Intel.c:128:18:
note: each undeclared identifier is reported only once for each
function it appears in
c:/MinGW/msys/1.0/home/sig/cog/blessed/platforms/win32/vm/sqWin32Intel.c:128:18:
error: '_PC_53' undeclared (first use in this function)
c:/MinGW/msys/1.0/home/sig/cog/blessed/platforms/win32/vm/sqWin32Intel.c:128:18:
error: '_EM_INVALID' undeclared (first use in this function)

Searching the web, I found that this is a discrepancy introduced with the new GCC compiler.

All you need to do is to add:

#ifndef _MINGW_FLOAT_H_
#include_next <float.h>
#endif

to C:\MinGW\lib\gcc\mingw32\4.6.1\include\float.h

(yeah, literally copy and paste it at the end of that file)

Now I was able to build the VM. But since I built it without the Freetype plugin first, I had not yet met another problem:

the Freetype makefiles use autoconf, and before you try to build the library, you need to install autoconf first. Now of course, you have to know how to install things in the MinGW environment. And since I’m not using it daily, it’s easy to forget how to do it.

So finally, here is the single line which provoked me to write this post:

As you can see, this primitive takes two arguments. Here is the language-side method which uses it:

primitiveTransferToProcess: newProcess action: anAction
	"Primitive.
	Sets activeProcess to the new process,
	sets interruptedProcess to the process which was active,
	sets the scheduler's action ivar to the anAction object.
	"
	<primitive: 'primitiveTransferToProcess' module: ''>
	self primitiveFailed

The primitive will fail if the image (for some reason) calls it while an old scheduler is installed.

The new scheduler should have additional instance variables:

interruptProcess

interruptedProcess

action

You may ask why I made the action an additional argument, when switching the active process needs only a single parameter: the process which should be activated. The answer is that, first, sometimes we need to pass an argument to the newly activated process, and second, passing it must be atomic. The scheduler uses this functionality to ensure the atomicity of different operations.

Otherwise, passing an action argument to the scheduler is risky: the process which sets it can be preempted by another process before entering the interrupt process, and the action will be overwritten and therefore lost:

Processor setAction: [ do something here ].

"we can be preempted here"

Processor switchToInterruptProcess.
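The lost action can be reproduced with a tiny deterministic model. This is a Python sketch, not the scheduler's Smalltalk code: the class and method names are hypothetical, and preemption is simulated simply by interleaving the calls by hand.

```python
class SchedulerModel:
    """Toy scheduler holding a single pending action slot."""

    def __init__(self):
        self.action = None
        self.performed = []

    def set_action(self, action):
        self.action = action

    def switch_to_interrupt_process(self):
        # The interrupt process performs the pending action, if any.
        if self.action is not None:
            self.performed.append(self.action)
            self.action = None

    def transfer(self, action):
        # Models the primitive: set the action and switch in one
        # indivisible step, so nothing can run in between.
        self.action = action
        self.switch_to_interrupt_process()


# Two-step version: process 1 is "preempted" after setting its action,
# process 2 overwrites the slot, and action A is never performed.
racy = SchedulerModel()
racy.set_action("A")
racy.set_action("B")
racy.switch_to_interrupt_process()
racy.switch_to_interrupt_process()

# Atomic version: each action reaches the interrupt process.
safe = SchedulerModel()
safe.transfer("A")
safe.transfer("B")

print(racy.performed, safe.performed)
```

Passing the action as an argument of the switching primitive removes the window between the two steps, which is the point made above.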

Usually, to evaluate a piece of code inside the interrupt process, most code uses the #interruptWith: method, which is implemented as follows:

The interrupt process is a special process, held by the scheduler, which runs an infinite loop that

– performs an action (if any) passed to the interrupt process

– handles external signals

– performs scheduling

At the end of each cycle, it switches back to interruptedProcess (using the above primitive). So, if no one has changed interruptedProcess, it will continue running the interrupted process.
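One pass of that loop can be modelled like this. It is a simplified Python sketch with invented names (a dict instead of a scheduler object, strings instead of processes), and the "scheduling" step is reduced to returning whatever interruptedProcess currently is.

```python
def interrupt_cycle(state):
    """Run one cycle of the interrupt process; return the process to resume."""
    # 1. Perform the action passed to the interrupt process, if any.
    action = state["action"]
    state["action"] = None
    if action is not None:
        action(state)
    # 2. Handle external signals (here they are only recorded).
    handled = list(state["pending_signals"])
    state["pending_signals"].clear()
    state["handled_signals"].extend(handled)
    # 3. Scheduling, then switch back to interruptedProcess.
    return state["interrupted_process"]


def make_state():
    return {"action": None, "pending_signals": [],
            "handled_signals": [], "interrupted_process": "worker"}


# Nobody touched interruptedProcess: the interrupted process continues.
plain = make_state()
plain["pending_signals"] = [4, 2]
resumed_plain = interrupt_cycle(plain)


# An action changes interruptedProcess: another process is resumed.
def switch_to_other(state):
    state["interrupted_process"] = "other"


redirected = make_state()
redirected["action"] = switch_to_other
resumed_redirected = interrupt_cycle(redirected)

print(resumed_plain, resumed_redirected)
```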

A second primitive is used to fetch pending signals from the VM’s semaphoresToSignalA/semaphoresToSignalB buffers directly to the language side, so that it can handle signals by itself, without the VM even needing to know about the existence of such objects as semaphores.

primitiveFetchPendingSignals
	"Primitive. Fills an array (first argument) with pending signals stored in the semaphoresToSignalA/semaphoresToSignalB buffers.
	Returns the number of signals filled in, or a negative number indicating that the array is not big enough to fetch all signals at once.
	The primitive fails if the first argument is not an array.
	"

The interruptProcess uses this primitive to fetch all pending signals from the VM and handle them.
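The contract of this primitive can be modelled in Python as below. This is only a sketch of the behaviour described in the method comment, not the VM code. Two details are assumptions, because the post does not specify them: when the array is too small, nothing is copied, and the negative result is minus the number of pending signals.

```python
class PrimitiveFailed(Exception):
    """Stands in for a failing primitive in this model."""


def fetch_pending_signals(pending, array):
    """Move all pending signals into `array`; return how many were moved."""
    if not isinstance(array, list):
        raise PrimitiveFailed("first argument is not an array")
    if len(pending) > len(array):
        # Assumption: nothing is copied, and the magnitude of the result
        # tells the caller how many slots it needs.
        return -len(pending)
    count = len(pending)
    array[:count] = pending
    del pending[:]
    return count


pending = [3, 7]
buf = [None] * 4
n = fetch_pending_signals(pending, buf)

too_many = [1, 2, 3]
small = [None] * 2
m = fetch_pending_signals(too_many, small)

print(n, buf, m)
```

On a negative result, the caller would allocate a bigger array and try again.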

This is mainly all we need from the VM in order to implement our own scheduling semantics on the language side, without depending on the VM too much.

Signals unification

With the new scheduler, the interaction between the language side and the VM does not require the VM to have any knowledge about semaphores. The VM operates only on integer values (signals), and the scheduler is free to interpret them as it likes.

The language side passes integer objects to different VM primitives, and the VM emits signals using these values. The scheduler then uses the new primitive (primitiveFetchPendingSignals) to fetch the list of pending signals in its interrupt process.

Now, since we moved the responsibility for interpreting signals from the VM side to the language side, we can use not only a Semaphore to handle a signal, but any other Object as well. The scheduler takes the signal’s integer value as an index into its signalHandlers array, then simply sends #handleExternalSignal to the object stored in this array at that index. By default, Object>>handleExternalSignal does nothing, while Semaphore>>handleExternalSignal either wakes up a process which is waiting on the semaphore or increments its excessSignals value if there are no processes waiting on it.
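The dispatch described here is a plain table lookup followed by a polymorphic send. A minimal Python model follows; the class names are stand-ins for the Smalltalk ones, the list is 0-indexed (unlike a Smalltalk Array), and "waking up a process" is reduced to moving a name between two lists.

```python
class ExternalObject:
    """Plays the role of Object: the default handler does nothing."""

    def handle_external_signal(self):
        pass


class SemaphoreModel(ExternalObject):
    def __init__(self, waiting=()):
        self.waiting = list(waiting)   # processes blocked on the semaphore
        self.resumed = []              # processes that have been woken up
        self.excess_signals = 0

    def handle_external_signal(self):
        if self.waiting:
            self.resumed.append(self.waiting.pop(0))
        else:
            self.excess_signals += 1


def dispatch(signal_handlers, signals):
    """Send handle_external_signal to the handler registered for each signal."""
    for index in signals:
        signal_handlers[index].handle_external_signal()


sem = SemaphoreModel(waiting=["p1"])
handlers = [ExternalObject(), sem]
dispatch(handlers, [1, 1, 0])

print(sem.resumed, sem.excess_signals)
```

The first signal wakes p1, the second finds nobody waiting and is counted as an excess signal, and the third goes to an object that ignores it.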

This effectively allows us to move scheduling semantics from the VM to the language side, and leaves us free to change them in the future without any changes to the VM.

The current Squeak VM provides an API function for plugins (signalSemaphoreWithIndex: ) which plugins can use to signal a semaphore stored in the external objects table (Smalltalk externalObjects == Smalltalk specialObjectsArray at: 39). This means that all plugins already operate on integer values (signals) and can’t really interact with semaphore objects, so we don’t need any changes here, except that instead of the VM directly signaling a semaphore object in the external semaphores table, we let the new scheduler interpret the signal on the language side.

In the Interpreter, however, there are some places which operate on semaphores directly. So, all we need now is to review all such places in order to unify this.

The primitiveLowSpaceSemaphore is now obsolete, since the VM doesn’t need to signal semaphores directly: it just passes this signal through primitiveFetchPendingSignals to let the scheduler handle it.

addPendingSignal: is a service method which stores a new signal in the semaphoresToSignalA/semaphoresToSignalB buffers, in the same manner as signalSemaphoreWithIndex: does, except that it does not send forceInterruptCheck. As you may guess, I changed the signalSemaphoreWithIndex: method to a two-liner:

signalSemaphoreWithIndex: index
	"Record the given semaphore index in the double buffer semaphores array to be signaled at the next convenient moment. Force a real interrupt check as soon as possible."

to register an object which will take care of handling this event. It can be a semaphore or something else; we don’t really care at this moment. #performDuringInterrupt: is used to ensure that the enclosed code runs atomically and can’t be preempted.

Next, the current primitiveSignalAtMilliseconds takes two arguments: the millisecond clock value at which the timer event should be signaled, and a semaphore to store at the TheTimerSemaphore index in the special objects array.

Since we now pass a signal value (the TimerSignal constant) to the language side to handle it, there is no longer any need for an object to be signaled. That’s why I made a new primitive for this:

The new Delay implementation will make use of this new primitive instead of the old one.

Again, we add a convenience method, setTimerSignalHandler: anObject, to the new scheduler. In contrast to the previous implementation, we don’t need to change the handler object at run time. We set it only once, at image startup: it will be the Delay class, which on receiving this signal will perform the regular procedure for signaling expired delays, if any. But I don’t want to go into detail about this now, since this post is dedicated to describing the VM-side changes.

Fortunately, there is no special primitive for setting the finalization semaphore, so this is all we need to change in the VM.

Backward compatibility

The changes are made in such a way that the VM stays compatible with an old image which doesn’t use the new scheduler; if the image does use it, the VM takes a different code path. You can browse all senders of the hasNewScheduler message to see where the code takes a different route in the presence of the new scheduler.

This method simply checks that the Processor special object has additional slots:

I don’t like creating many identities on different sites (because the more you have, the harder it is to remember all the details, like credentials, emails, and even the site name), but it just so happens that, in order to improve communication with the Squeak community, the Squeak Board decided to create a blog, and I, as a board member, had to register here to participate.

I found that the WordPress blogging engine is much more modern & convenient than hotspot, so I decided to move my personal blog here.