Posts tagged with 'snippet'

Last night I did a trivial yet surprisingly satisfying hardware hack, of the kind that can only be accomplished when the brain is in holiday mode. Our son has that very simple airplane toy, which turned out to become one of his most loved ones, enough to have deserved wheel repairs before. He’s also reportedly a fan of all kinds of light-emitting or reflecting objects (including the sun, and specially the moon). So the idea sparkled: how easily can that airplane get a blinking led?

With an attiny85, a CR2032 battery, a LED, and half an hour of soldering work, this was the result:

The code loaded in the chip is small enough to be listed here, and it gets away with blinking without waking up the main CPU clock:

The power consumption in the idle mode plus the blinks should keep the coin battery running for a couple of weeks, at least. A vibration sensor would improve that significantly, by enabling the MCU to go into powerdown mode and be awaken externally, but I don’t have a sensor at hand that is small enough.

There were a few long standing issues in the yaml.v1 package which were being postponed so that they could be done at once in a single incompatible change, and the time has come: yaml.v2 is now available.

Besides these incompatible changes, other compatible fixes and improvements were performed in that push, and those were also applied to the existing yaml.v1 package so that dependent applications benefit immediately and without modifications.

The subtopics below outline exactly what changed, and how to adapt existent code when necessary.

Type errors

With yaml.v1, decoding a YAML value that is not compatible with the Go value being unmarshaled into will silently drop the offending value without notice. In many cases continuing with degraded behavior by ignoring the problem is intended, but this was the one and only option.

In yaml.v2, these problems will cause a *yaml.TypeError to be returned, containing helpful information about what happened. For example:

yaml: unmarshal errors:
line 3: cannot unmarshal !!str `foo` into int

On such errors the decoding process still continues until the end of the YAML document, so ignoring the TypeError will produce logic equivalent to the old yaml.v1 behavior.

New Marshaler and Unmarshaler interfaces

The way that yaml.v1 allowed custom types to implement marshaling and unmarshaling of YAML content was slightly confusing and troublesome. For example, considering a CustomType with a keys field:

type CustomType struct {
keys map[string]int
}

and supposing the goal is to unmarshal this YAML map into it:

custom:
a: 1
b: 2
c: 3

With yaml.v1, one would need to implement logic similar to the following for that:

This is too much trouble when the package can easily do those conversions internally already. To fix that, in yaml.v2 the Getter and Setter interfaces are both gone and were replaced by the Marshaler and Unmarshaler interfaces.

Using the new mechanism, the example above would be implemented as follows:

Custom-ordered maps

By default both yaml.v1 and yaml.v2 will marshal keys in a stable order which is increasing within the same type and arbitrarily defined across types. So marshaling is already performed in a sensible order, but it cannot be customized in yaml.v1, and there’s also no way to tell which order the map was originally in, as some applications require.

To fix that, there is a new pair of types that support preserving the order of map keys both when marshaling and unmarshaling: MapSlice and MapItem.

Such an ordered map literal would look like:

m := yaml.MapSlice{{"c", 3}, {"b", 2}, {"a", 1}}

The MapSlice type may be used for variables going in and out of the yaml package, or in struct fields, map values, or anywhere else sensible.

Binary values

Strings in YAML must be valid UTF-8 or UTF-16 (with a byte order mark for the latter), and for binary data the specification defines a standard !!binary tag which represents the raw data encoded (encrypted?) as base64. This is now supported both in yaml.v1 and yaml.v2, transparently. That is, any string value that is not valid UTF-8 will be base64-encoded and appropriately tagged so that it roundtrips as the same string. Short strings are inlined, while long ones are automatically broken into several lines and represented in a proper style. For example:

As detailed in the preliminary release of qml.v1 for Go a couple of weeks ago, my next task was to finish the improvements in its OpenGL API. Good progress has happened since then, and the new API is mostly done and available for experimentation. At the same time, there’s still work to do on polishing edges and on documenting the extensive API. This blog post aims to present the improvements made, their internal design, and also to invite help for finishing the pending details.

Before diving into the new, let’s first have a quick look at how a Go application using OpenGL might look like with qml.v0. This is an excerpt from the old painting example:

The example imports both the qml and the gl packages, and then defines a Paint method that makes use of the GL types, functions, and constants from the gl package. It looks quite reasonable, but there are a few relevant shortcomings.

One major issue in the current API is that it offers no means to tell even at a basic level what version of the OpenGL API is being coded against, because the available functions and constants are the complete set extracted from the gl.h header. For example, OpenGL 2.0 has GL_ALPHA and GL_ALPHA4/8/12/16 constants, but OpenGL ES 2.0 has only GL_ALPHA. This simplistic choice was a good start, but comes with a number of undesired side effects:

Many trivial errors that should be compile errors fail at runtime instead

When the code does work, the developer is not sure about which API version it is targeting

Symbols for unsupported API versions may not be available for linking, even if unused

That last point also provides a hint of another general issue: portability. Every system has particularities for how to load the proper OpenGL API entry points. For example, which libraries should be linked with, where they are in the local system, which entry points they support, etc.

So this is the stage for the improvements that are happening. Before detailing the solution, let’s have a look at the new painting example in qml.v1, that makes use of the improved API:

With the new API, rather than importing a generic gl package, a version-specific gl/2.0 package is imported under the name GL. That choice of package name allows preserving familiar OpenGL terms for both the functions and the constants (gl.Enable and GL.BLEND, for example). Inside the new Paint method, the gl value obtained from GL.API holds only the functions that are defined for the specific OpenGL API version imported, and the constants in the GL package are also constrained to those available in the given version. Any improper references become build time errors.

To support all the various OpenGL versions and profiles, there are 23 independent packages right now. These packages are of course not being hand-built. Instead, they are generated all at once by a tool that gathers information from various sources. The process can be tersely described as:

Version-specific functions might also be extracted from the Khronos Registry, but there’s a good reason to use information from the Qt headers instead: Qt already solved the portability issue. It works in several platforms, and if somebody is using QML successfully, it means Qt is already using that system’s OpenGL capabilities. So rather than designing a new mechanism to solve the same problem, the qml package now leverages Qt for resolving all the GL function entry points and the linking against available libraries.

Going back to the example, it also demonstrates another improvement that comes with the new API: plain types that do not carry further meaning such as gl.Float and gl.Int were replaced by their native counterparts, float32 and int32. Richer types such as Enum were preserved, and as suggested by David Crawshaw some new types were also introduced to represent entities such as programs, shaders, and buffers. The custom types are all available under the common gl/glbase package that all version-specific packages make use of.

So this is all working and available for experimentation right now. What is left to do is almost exclusively improving the list of function tweaks with two goals in mind, which will be highlighted below as those are areas where help would be appreciated, mainly due to the footprint of the API.

Documentation importing

There are a few hundred functions to document, but a large number of these are variations of the same function. The previous approach was to simply link to the upstream documentation, but it would be much better to have polished documentation attached to the functions themselves. This is the new documentation for MultMatrixd, for example. For now the documentation is being imported manually, but the final process will likely consist of some automation and some manual polishing.

Function polishing

The standard C OpenGL API can often be translated automatically (see BindBuffer or BlendColor), but in other cases the function prototype has to be tweaked to become friendly to Go. The translation tool already has good support for defining most of these tweaks independently from the rest of the machinery. For example, the following logic changes the ShaderSource function from its standard from into something convenient in Go:

and as an even simpler example, CreateProgram is tweaked so that it returns a glbase.Program instead of the default uint32.

name: "CreateProgram",
result: "glbase.Program",

That kind of polishing is where contributions would be most appreciated right now. One valid way of doing this is picking a range of functions and importing and polishing their documentation manually, and while doing that keeping an eye on required tweaks that should be performed on the function based on its documentation and prototype.

If you’d like to help somehow, or just ask questions or report your experience with the new API, please join us in the project mailing list.

As part of the on going work on Ubuntu Touch phones, I was invited to contribute a Go package to interface with ubuntuoneauth, a C++ and Qt library that authenticates against Ubuntu One using the system account made available by the phone owner. The details of that library and its use case are not interesting for most people right now, but the work to interface with it is a good example to talk about because, besides the result (uoneauth) being an external and independent Go package that extends the qml package, ubuntuoneauth is not a QML library, but rather a plain Qt library. Some of the callbacks use even traditional C++ types that do not inherit from QObject and have no Qt metadata, so offering that functionality from Go nicely takes a bit more work.

What follows are some of the highlights of that integration logic, to serve as a reference for similar extensions in the future. Note that if your interest is in creating QML applications with Go, none of this is necessary and the documentation is a better place to start.

As an initial step, the following examples demonstrate how the original C++ library and the Go package being designed are meant to be used. The first snippet contains the relevant logic taken out of the examples/signing-main.cpp file, tweaked for clarity:

The example hooks into various signals in the service, one for each possible outcome, and then calls the service’s getCredentials method to initiate the process. If successful, the credentialsFound signal is emitted with a Token value that is able to sign URLs, returning an HTTP header that can authenticate a request.

That same process is more straightforward when using the library from Go:

NewService creates the service instance, and then asks the qml package to run some logic in the main Qt thread via RunMain. This is necessary because a number of operations in Qt, including the creation of objects, are associated with the currently running thread. Using RunMain in this case ensures that the creation of the C++ object performed by newSSOService happens in the main Qt thread (the “GUI thread”).

Then, the address of the C++ UbuntuOne::SSOService type is handed to CommonOf to obtain a Common value that implements all the common logic supported by C++ types that inherit from QObject. This is an unsafe operation as there’s no way for CommonOf to guarantee that the provided address indeed points to a C++ value with a type that inherits from QObject, so the call site must necessarily import the unsafe package to provide the unsafe.Pointer parameter. That’s not a problem in this context, though, since such extension packages are necessarily dealing with unsafe logic in either case.

The obtained Common value is then assigned to the service’s obj field. In most cases, that value is instead assigned to an anonymous Common field, as done in qml.Window for example. Doing so means qml.Window values implement the qml.Object interface, and may be manipulated as a generic object. For the new Service type, though, the fact that this is a generic object won’t be disclosed for the moment, and instead a simpler API will be offered.

Following the function logic further, a finalizer is then registered to ensure the C++ value gets deallocated if the developer forgets to Close the service explicitly. When doing that, it’s important to ensure the Close method drops the finalizer when called, not only to facilitate the garbage collection of the object, but also to avoid deallocating the same value twice.

The next four lines in the function should be straightforward: they register methods of the service to be called when the relevant signals are emitted. Here is the implementation of two of these methods:

Handling the signals consists of just sending the reply over the channel to whoever initiated the request. The select statement in sendReply just ensures that the invariant of having a reply per request is not broken without being noticed, as that would require a slightly different design.

There’s one more point worth observing in this logic: the token value received as a parameter in credentialsFound was already converted into the local Token type. In most cases, this is unnecessary as the parameter is directly useful as a qml.Object or as another native type (int, etc), but in this case UbuntuOne::Token is a plain C++ type that does not inherit from QObject, so the default signal parameter that would arrive in the Go method has only a type name and the value address.

Instead of taking the plain value, it is turned into a more useful one by registering a converter with the qml package:

The lock ensures that a second request won’t take place before the first one is done, forcing the correct sequencing of replies. Given the current logic, this isn’t strictly necessary since all requests are equivalent, but this will remain correct even if other methods from SSOService are added to this interface.

The returned Token value may then be used to sign URLs by simply calling the respective underlying method:

No care about using qml.RunMain has to be taken in this case, because UbuntuOne::Token is a simple C++ type that does not interact with the Qt machinery.

This completes the journey of creating a package that provides access to the ubuntuoneauth library from Go code. In many cases it’s a better idea to simply rewrite the logic in Go, but there will be situations similar to this library, where either rewriting would take more time than reasonable, or perhaps delegating the maintenance and stabilization of the underlying logic to a different team is the best thing to do. In those cases, an approach such as the one exposed here can easily solve the problem.

As recently announced, my latest endeavor at Canonical is to enable graphical client-side development with the Go language via a new qml Go package that integrates the language with Qt’s QML framework.

The QML framework solves the problem of designing graphic applications via a language that offers a convenient mix of declarative and procedural features. As a very simple example, if the following QML content is loaded by itself, it will draw a square centralized inside a window:

The red rectangle will remain centered as the window is resized, and will print a message every time it is clicked. The clicked signal was hooked into logic implemented in JavaScript in this case, but it might as well have been hooked into custom Go logic with the same ease using the qml package. With very minor changes, it could also be something much more interesting than a red rectangle, such as a modern web browser:

It’s worth noting that this wasn’t just a screenshot of a web page, but rather a tiny fully functional web browser accessing a web page, easily embedded into the QML application to satisfy whatever browsey needs the author might have.

Being able to leverage all these existing building blocks so comfortably is what makes a graphics platform attractive to develop in, and is what inspired the on going effort to have that platform working under the Go language. As we can observe on some of the work published, that side of things seems to be going well.

Now, the next level up is to enable people to create such building blocks without leaving the Go language. The initial step towards that is already committed to the code repository. It enables Go types to be created and seamlessly integrated into the QML language. For example, consider this simple Go type:

There’s a relevant detail, though: this type has no visible content by itself. This means that displaying something must be done via interactions with other QML items, or via external systems (dbus, for example).

Solving that problem has been one of my objectives in the last month, and although it’s not yet publicly available, the work is in a good-enough state that I feel comfortable talking about it.

As described, the main goal is enabling these custom types to paint. This will be achieved by exposing a Go package that offers an OpenGL API, and defining methods that the custom types have to implement for being able to render at appropriate times. Although the details aren’t finalized, the current draft can run familiar OpenGL code similar to the following:

One interesting point to realize is that, again, this isn’t just rendering into a lonely OpenGL context. The rendered content goes into a framebuffer that lives within the overall QML scene graph, and has the same integration capabilities as every other QML item.

In the following animated image, for example, the white square was generated with the above GL code, and was rendered next to other native items with this QML source:

Note the smooth blending and proper overlapping established in the scene graph based on the ordering of elements in the QML source. The animation was also driven by QML without special handling of the content rendered from Go.

Another relevant point in terms of the integration with Go is that the GL code demonstrated above is considered old-school these days. The modern version uses fewer calls, making better use of the graphics card memory by transferring, tracking, and handling data such as vertexes in contiguous arrays. This reduces the impact of the cross-context calls that depart and rejoin the Go runtime, and can unlock interesting use cases that the overhead might otherwise prevent.

In terms of availability of these features, we’re about to enter a holiday season, so they should be polished enough for a first public review at some point in January. Looking forward to it.

UPDATE (2014-01-27): These features are now publicly available. The painting example demonstrates the exact scenario pictured above.

UPDATE (2014-02-18): New screencast demonstrates some of the OpenGL features of Go QML.

About a year ago I ordered a pack of 10 atmega328p processors from China to play with. They took a while to get here, and it took even longer for me to get back to them, but a few days ago the motivation to start doing something finally appeared.

I’ve never actually played with AVRs before, and felt a bit like I was jumping a step in my electronics enthusiast progress by not diving into its architecture a bit more deeply. Also, despite the obvious advantages of ARM-based chips these days, the platform is still interesting in some perspectives, such as its widespread availability, low price in small quantities, and the ability to plug them in a breadboard and do things without pretty much any circuitry.

To get acquainted with the architecture and to depart from things I work on more frequently, the project is so far taking the shape of an assembly library of functionality relevant for developing small projects, built mainly around binutils for the AVR. I did end up cheating a bit and compiling the assembly code via avr-gcc, just to get the __do_copy_data initialization routine injected, so that I don’t have to pull up the .data section from program memory into RAM manually.

I started running the test programs with the chip itself, with the help of a Pirate Bus, to see if the whole setup was sound. Once it worked a few times, I moved on to use the simulavr simulator to make the process of running and debugging more comfortable. In addition to being able to attach gdb, and trace execution, one of the nice features of simulavr is being able to map a port from the emulated CPU and get bytes written into it sent to an arbitrary file in the outer world. That means we can easily implement a trivial println-like function in assembly:

Printing strings is only helpful if we do have strings, though, and with such a skeleton system there are no interesting ones yet. What we do have are registers, lots of them (32 in total). A good candidate for the next function would then be an itoa-like function that would put the proper bytes in memory for printing.

So, after going down that road for a bit longer, the lack of a proper way to run tests on the created code was an evident show stopper. There’s no way the created code will be sane without being able to exercise it, and write tests that can be rerun at will. Fortunately, it’s easy enough to apply traditional testing practices to such an environment, given the simulator features mentioned.

To drive those tests, a small tool named avrtest was written in Go. It takes an avrtest.list file that looks like this:

As part of one of the projects we’ve been pushing at Canonical, I spent a few days researching about the possibility of extending a compiled Go application with a tiny language that would allow expressing simple procedural logic in a controlled environment. Although we’re not yet sure of the direction we’ll take, the result of this short experiment is being released as the twik language for open fiddling.

The implementation is straightforward, with under 400 lines for the parser and evaluator, and under 350 lines in the default functions provided for the language skeleton: var, func, do, if, and, or, etc.

It also comes with an interactive interpreter to play with. You can install it with:

In an effort to polish the recently released draft of the strepr v1 specification, I’ve spent the last couple of days in a Go reference implementation.

The implemented algorithm is relatively simple, efficient, and consumes a conservative amount of memory. The aspect of it that deserved the most attention is the efficient encoding of a float number when it carries an integer value, as covered before. The provided tests are a useful reference as well.

The API offered by the implemented package is minimal, and matches existing conventions. For example, this simple snippet will generate a hash for the stable representation of the provided value:

In all of those cases the hash obtained is the same, despite the fact that the processed values were typed differently in some occasions. For example, due to its Javascript background, some JSON libraries may unmarshal numbers as binary floating point values, while others distinguish the value based on the formatting used. The strepr algorithm flattens out that distinction so that different platforms can easily agree on a common result.

To visualize (or debug) the stable representation defined by strepr, the reference implementation has a debug dump facility which is also exposed in the command line tool:

Assume uf is an unsigned integer with 64 bits that holds the IEEE-754 representation for a binary floating point number of that size.

The questions are:

1. How to tell if uf represents an integer number?

2. How to serialize the absolute value of such an integer number in the minimum number of bytes possible, using big-endian ordering and the 8th bit as a continuation flag? For example, float64(1<<70 + 3<<21) serializes as:

"\x81\x80\x80\x80\x80\x80\x80\x83\x80\x80\x00"

The background for this problem is that the current draft of the strepr specification mentions that serialization. Some languages, such as Python and Ruby, implement transparent arbitrary precision integers, and that makes implementing the specification easier.

For example, here is a simple Python interactive session that arrives at the result provided above exploring the native integer representation.

Python makes the procedure simpler because it is internally converting the float into an integer of appropriate precision via standard C functions, and then offering bit operations on the resulting value.

The suggested brain teaser can be efficiently solved using just the IEEE-754 representation, though, and it’s relatively easy because the problem is being constrained to the integer space.

This week I found some time to work on another small spin-off from the juju project at Canonical, and I’m happy to make it openly available today: the xmlpath package, which implements an efficient and strict subset of the XPath specification for the Go language.

This new package will be used in an upcoming (and long due) revision of the goamz package API, which is currently limited by the fact that once the XML result returned by Amazon is unmarshalled into a static structure, any other data that the package wasn’t prepared to deal with becomes hard to access by clients. This problem is being solved by parsing the tree into an intermediary form which can then have XPath expressions conveniently and efficiently applied to it.

Path expressions currently supported by the package are in the following format, with all components being optional:

/axis-name::node-test[predicate]/axis-name::node-test[predicate]

Compatibility with the XPath specification goes to the following extent:

Last week I was part of a rant with a couple of coworkers around the fact Go handles errors for expected scenarios by returning an error value instead of using exceptions or a similar mechanism. This is a rather controversial topic because people have grown used to having errors out of their way via exceptions, and Go brings back an improved version of a well known pattern previously adopted by a number of languages — including C — where errors are communicated via return values. This means that errors are in the programmer’s face and have to be dealt with all the time. In addition, the controversy extends towards the fact that, in languages with exceptions, every unadorned error comes with a full traceback of what happened and where, which in some cases is convenient.

All this convenience has a cost, though, which is rather simple to summarize:

Exceptions teach developers to not care about errors.

A sad corollary is that this is relevant even if you are a brilliant developer, as you’ll be affected by the world around you being lenient towards error handling. The problem will show up in the libraries that you import, in the applications that are sitting in your desktop, and in the servers that back your data as well.

Writing correct code in the exception-throwing model is in a sense harder than in an error-code model, since anything can fail, and you have to be ready for it. In an error-code model, it’s obvious when you have to check for errors: When you get an error code. In an exception model, you just have to know that errors can occur anywhere.

In other words, in an error-code model, it is obvious when somebody failed to handle an error: They didn’t check the error code. But in an exception-throwing model, it is not obvious from looking at the code whether somebody handled the error, since the error is not explicit.
(…)
When you’re writing code, do you think about what the consequences of an exception would be if it were raised by each line of code? You have to do this if you intend to write correct code.

That’s exactly right. Every line that may raise an exception holds a hidden “else” branch for the error scenario that is very easy to forget about. Even if it sounds like a pointless repetitive task to be entering that error handling code, the exercise of writing it down forces developers to keep the alternative scenario in mind, and pretty often it doesn’t end up empty.

It isn’t the first time I write about that, and given the controversy that surrounds these claims, I generally try to find one or two examples that bring the issue home. So here is the best example I could find today, within the pty module of Python’s 3.3 standard library:

Every time someone calls this logic with an improper executable in argv there will be a new Python process lying around, uncollected, and unknown to the application, because execlp will fail, and the process just forked will be disregarded. It doesn’t matter if a client of that module catches that exception or not. It’s too late. The local duty wasn’t done. Of course, the bug is trivial to fix by adding a try/except within the spawn function itself. The problem, though, is that this logic looked fine for everybody that ever looked at that function since 1994 when Guido van Rossum first committed it!

That’s a pretty harsh crash for the lack of locale data in a system-level application that is, ironically, supposed to tell users what packages to install when commands are missing. Note that at the top of the stack there’s a reference to crash_guard. This function has the intent of catching all exceptions right at the edge of the call stack, and displaying a detailed system specification and traceback to aid in fixing the problem.

Such “parachute catching” is a fairly common pattern in exception-oriented programming and tends to give developers the false sense of having good error handling within the application. Rather than actually guarding the application, though, it’s just a useful way to crash. The proper thing to have done in the case above would be to print a warning, if at all, and then let the program run as usual. This would have been achieved by simply wrapping that one line as in:

Clearly, it was easy to handle that one. The problem, again, is that it was very natural to not do it in the first place. In fact, it’s more than natural: it actually feels good to not be looking at the error path. It’s less code, more linear, and what’s left is the most desired outcome.

The consequence, unfortunately, is that we’re immersing ourselves in a world of brittle software and pretty whales. Although more verbose, the error result style builds the correct mindset: does that function or method have a possible error outcome? How is it being handled? Is that system-interacting function not returning an error? What is being done with the problem that, of course, can happen?

A surprising number of crashes and plain misbehavior is a result of such unconscious negligence.

Our son Otávio was born recently. Right in the first few days, we decided to keep tight control on the feeding times for a while, as it is an intense routine pretty unlike anything else, and obviously critical for the health of the baby. I imagined that it wouldn’t be hard to find an Android app that would do that in a reasonable way, and indeed there are quite a few. We went with Baby Care, as it has a polished interface and more features than we’ll ever use. The app also includes some basic statistics, but not enough for our needs. Luckily, though, it is able to export the data as a CSV file, and post-processing that file with the R language is easy, and allows extracting some fun facts about what the routine of a healthy baby can look like in the first month, as shown below.

The first thing to do is to import the raw data from the CSV file. It is a one-liner in R:

> info = read.csv("baby-care.csv", header=TRUE)

Then, this file actually comes with other events that won’t be processed now, so we’ll slice it and grab only the rows and columns of interest:

A total of 63 hours surprised me, but the mean time of around 10 minutes per feeding is within the recommendation, and the standard deviation looks reasonable. It may be more conveniently pictured as a histogram:

The mean looks good, with about two hours every day. The standard deviation looks high on a first look, but it’s actually not that bad if we take off the first few days. Looking at the graph shows why: the slope on the left-hand side, which is expected as there’s less milk and the baby has more trouble right after birth.

The chart shows a red flag, though: one day seems well below the mean. This is something to be careful about, as babies can get into a loop where they sleep too much and miss being hungry, the lack of feeding causes hypoglycemia, which causes more sleep, and it doesn’t end up well. A rule of thumb is to wake the baby up every two hours in the first few days, and at most every four hours once he stabilizes for the following weeks.

So, this was another point of interest: what are the intervals between feedings?

Seems great, with most feedings well under two hours. There's a worrying outlier, though, of more than 6 hours. Unsurprisingly, it happened over night:

> feeding$End.Time[interval > 300]
[1] 2013/01/07 00:52

It wasn't a significant issue, but we don't want that happening often while his body isn't yet ready to hold enough energy for a full night of sleep. That's the kind of reason we've been monitoring him, and is important because our bodies are eager to get full nights of sleep, which opens the door for unintended slack. As a reward for that kind of control, we've got the chance to enjoy not only his health, but also an admirable mood.

A long time before I seriously got into using distributed version control systems (DVCS) such as Bazaar and Git for developing software, it was already well known to me how the mechanics of these systems worked, and why people benefited from them. That said, it wasn’t until I indeed started to use DVCS tools that I understood how much my daily workflow around code bases would be changed and improved.

This weekend, while flying home from MongoSV, I could experience that same feeling in relation to first class concurrency support in programming languages. Everybody knows how the feature may be used, but I have the feeling that until one actually experiences it in practice, it’s very hard to really understand how much the relationship with ordering while developing software may be improved.

I was having some fun working on improvements to Goetveld. This package allows Go programs to communicate with Rietveld servers to manipulate code review entries. The Rietveld API is a bit rough in a few places, and as a result some features of the package actually parse an HTML form to extract some data, before sending it back. You may have done something similar before while attempting to script a web site that wasn’t originally intended to be.

The interesting fact here is that this is an intrinsically serial procedure: load a form, change it, and send it back, right? Well, not really. As one might intuitively expect, establishing an SSL session and its underlying TCP connection are not instantaneous operations.

To give an idea, here is part of a dump of an SSL connection being initiated (that is, no HTTP data was sent yet) to codereview.appspot.com, originated from my home location:

That’s more than half a second before the application layer was even touched. So, turns out that to save that roundtrip time, we can start both the form loading and the form sending requests at the same time. By the time the form loading ends, processing the data locally is extremely fast, and we can complete the sending side by just providing the request body.

At this time you may be thinking something like “Ugh, that’s too much trouble.. why bother?”, and that highlights precisely the point I’d like to make: it is too much trouble because most people are used to languages that turn it into too much trouble, but the issue is not inherently complex. In fact, this is the entire implementation of this logic in Go:

I'm not cheating. The procedure was being done serially before, with very similar logic. Previously it had to take the form variable itself from the first request and manually provide it to the next one. Now, instead of providing the form, it's providing a channel that will be used to send the form across. One might even argue that the channel makes the algorithm more natural, curiously.

This is the kind of procedure that becomes fun and natural to write, after having first class concurrency at hand for some time. But, as in the case of DVCS, it takes a while to get used to the idea that concurrency and simplicity are not necessarily at opposing ends.

Certainly one of the reasons why many people are attracted to the Go language is its first-class concurrency aspects. Features like communication channels, lightweight processes (goroutines), and proper scheduling of these are not only native to the language but are integrated in a tasteful manner.

If you stay around listening to community conversations for a few days there’s a good chance you’ll hear someone proudly mentioning the tenet:

Do not communicate by sharing memory; instead, share memory by communicating.

That model is very sensible, and being able to approach problems this way makes a significant difference when designing algorithms, but that’s not exactly news. What I address in this post is an open aspect we have today in Go related to this design: the termination of background activity.

As an example, let’s build a purposefully simplistic goroutine that sends lines across a channel:

The type has a channel where the client can consume lines from, and an internal buffer
used to produce the lines efficiently. Then, we have a function that creates an initialized
reader, fires the reading loop, and returns. Nothing surprising there.

In the loop we'll grab a line from the buffer, close the channel in case of errors and stop, or otherwise send the line to the other side, perhaps blocking while the other side is busy with other activities. Should sound sane and familiar to Go developers.

There are two details related to the termination of this logic, though: first, the error information is being dropped, and then there's no way to interrupt the procedure from outside in a clean way. The error might be easily logged, of course, but what if we wanted to store it in a database, or send it over the wire, or even handle it taking in account its nature? Stopping cleanly is also a valuable feature in many circumstances, like when one is driving the logic from a test runner.

I'm not claiming this is something difficult to do, by any means. What I'm saying is that there isn't today an idiom for handling these aspects in a simple and consistent way. Or maybe there wasn't. The tomb package for Go is an experiment I'm releasing today in an attempt to address this problem.

The model is simple: a Tomb tracks whether the goroutine is alive, dying, or dead, and the death reason.

To understand that model, let's see the concept being applied to the LineReader example. As a first step, creation is tweaked to introduce Tomb support:

Note a few interesting points here: first, Done is called to track the goroutine termination right before the loop function returns. Then, the previously loose error now goes into the Kill Tomb method, flagging the goroutine as dying. Finally, the channel send was tweaked so that it doesn't block in case the goroutine is dying for whatever reason.

A Tomb has both Dying and Dead channels returned by the respective methods, which are closed when the Tomb state changes accordingly. These channels enable explicit blocking until the state changes, and also to selectively unblock select statements in those cases, as done above.

With the loop modified as above, a Stop method can trivially be introduced to request the clean termination of the goroutine synchronously from outside:

In this case the Kill method will put the tomb in a dying state from outside the running goroutine, and Wait will block until the goroutine terminates itself and notifies via the Done method as seen before. This procedure behaves correctly even if the goroutine was already dead or in a dying state due to internal errors, because only the first call to Kill with an actual error is recorded as the cause for the goroutine death. The nil value provided to t.Kill is used as a reason when terminating cleanly without an actual error, and it causes Wait to return nil once the goroutine terminates, flagging a clean stop per common Go idioms.

This is pretty much all that there is to it. When I started developing in Go I wondered if coming up with a good convention for this sort of problem would require more support from the language, such as some kind of goroutine state tracking in a similar way to what Erlang does with its lightweight processes, but it turns out this is mostly a matter of organizing the workflow with existing building blocks.

The tomb package and its Tomb type are a tangible representation of a good convention for goroutine termination, with familiar method names inspired in existing idioms. If you want to make use of it, go get the package with:

Circular buffers are based on an algorithm well known by any developer who’s got past the “Hello world!” days. They offer a number of key characteristics with wide applicability such as constant and efficient memory use, efficient FIFO semantics, etc.

One feature which is not always desired, though, it the fact that circular buffers traditionally will either overwrite the last element, or raise an overflow error, since they are generally implemented as a buffer of constant size. This is an unwanted property when one is attempting to consume items from the buffer and it is not an option to blindly drop items, for instance.

This post presents an efficient (and potentially novel) algorithm for implementing circular buffers which preserves most of the key aspects of the traditional version, while also supporting dynamic expansion when the buffer would otherwise have its oldest entry overwritten. It’s not clear if the described approach is novel or not (most of my novel ideas seem to have been written down 40 years ago), so I’ll publish it below and let you decide.

Traditional circular buffers

Before introducing the variant which can actually expand during use, let’s go through a quick review on traditional circular buffers, so that we can then reuse the nomenclature when extending the concept. All the snippets provided in this post are written in Python, as a better alternative to pseudo-code, but the concepts are naturally portable to any other language.

So, the most basic circular buffer needs the buffer itself, its total capacity, and a position where the next write should occur. The following snippet demonstrates the concept in practice:

In the example above, the first two elements of the series (0 and 1) were overwritten once the pointer wrapped around. That’s the specific feature of circular buffers which the proposal in this post will offer an alternative for.

The snippet below provides a full implementation of the traditional approach, this time including both the pushing and popping logic, and raising an error when an overflow or underflow would occur. Please note that these snippets are not necessarily idiomatic Python. The intention is to highlight the algorithm itself.

With the basics covered, let’s look at how to extend this algorithm to support dynamic expansion in case of overflows.

Dynamically expanding a circular buffer

The approach consists in imagining that the same buffer can contain both a circular buffer area (referred to as the ring area from here on), and an overflow area, and that it is possible to transform a mixed buffer back into a pure circular buffer again. To clarify what this means, some examples are presented below. The full algorithm will be presented afterwards.

First, imagine that we have an empty buffer with a capacity of 5 elements as per the snippet above, and then the following operations take place:

At this point we have a full buffer, and with the original implementation an additional push would raise an assertion error. To implement expansion, the algorithm will be changed so that those items will be appended at the end of the buffer. Following the example, pushing two additional elements would behave the following way:

In that example, elements 7 and 8 are part of the overflow area, and the ring area remains with the same capacity and length of the original buffer. Let’s perform a few additional operations to see how it would behave when items are popped and pushed while the buffer is split:

In this case, even though there are two free slots available in the ring area, the last item pushed was still appended at the overflow area. That’s necessary to preserve the FIFO semantics of the circular buffer, and means that the buffer may expand more than strictly necessary given the space available. In most cases this should be a reasonable trade off, and should stop happening once the circular buffer size stabilizes to reflect the production vs. consumption pressure (if you have a producer which constantly operates faster than a consumer, though, please look at the literature for plenty of advice on the problem).

The remaining interesting step in that sequence of events is the moment when the ring area capacity is expanded to cover the full allocated buffer again, with the previous overflow area being integrated into the ring area. This will happen when the content of the previous partial ring area is fully consumed, as shown below:

At this point, the whole buffer contains just a ring area and the overflow area is again empty, which means it becomes a traditional circular buffer.

Sample algorithm

With some simple modifications in the traditional implementation presented previously, the above semantics may be easily supported. Note how the additional properties did not introduce significant overhead. Of course, this version will incur in additional memory allocation to support the buffer expansion, bu that’s inherent to the problem being solved.

Note that the above algorithm will allocate each element in the list individually, but in sensible situations it may be better to allocate additional space for the overflow area in advance, to avoid potentially frequent reallocation. In a situation when the rate of consumption of elements is about the same as the rate of production, for instance, there are advantages in doubling the amount of allocated memory per expansion. Given the way in which the algorithm works, the previous ring area will be exhausted before the mixed buffer becomes circular again, so with a constant rate of production and an equivalent consumption it will effectively have its size doubled on expansion.

UPDATE: Below is shown a version of the same algorithm which not only allows allocating more than one additional slot at a time during expansion, but also incorporates it in the overflow area immediately so that the allocated space is used optimally.

This blog post presented an algorithm which supports the expansion of circular buffers while preserving most of their key characteristics. When not faced with an overflowing buffer, the algorithm should offer very similar performance characteristics to a normal circular buffer, with a few additional instructions and constant space for registers only. When faced with an overflowing buffer, the algorithm maintains the FIFO property and enables using contiguous allocated memory to maintain both the original circular buffer and the additional elements, and follows up reusing the full area as part of a new circular buffer in an attempt to find the proper size for the given use case.

One more Go library oriented towards building distributed systems hot off the presses: govclock. This one offers full vector clock support for the Go language. Vector clocks allow recording and analyzing the inherent partial ordering of events in a distributed system in a comfortable way.

The following features are offered by govclock, in addition to basic event tracking:

Compact serialization and deserialization

Flexible truncation (min/max entries, min/max update time)

Unit-independent update times

Traditional merging

Fast and memory efficient

If you’d like to know more about vector clocks, the Basho guys did a great job in the following pair of blog posts:

The following sample program demonstrates some sequential and concurrent events, dumping and loading, as well as merging of clocks. For more details, please look at the web page. The project is available under a BSD license.

ZooKeeper is a clever generic coordination server for distributed systems, and is one of the core softwares which facilitate the development of Ensemble (project for automagic IaaS deployments which we push at Canonical), so it was a natural choice to experiment with.

Gozk is a complete binding for ZooKeeper which explores the native features of Go to facilitate the interaction with a ZooKeeper server. To avoid reimplementing the well tested bits of the protocol in an unstable way, Gozk is built on top of the standard C ZooKeeper library.

The experience of integrating ZooKeeper with Go was certainly valuable on itself, and worked as a nice way to learn the details of integrating the Go language with a C library. If you’re interested in learning a bit about Go, ZooKeeper, or other details related to the creation of bindings and asynchronous programming, please fasten the seatbelt now.

Basics of C wrapping in Go

Creating the binding on itself was a pretty interesting experiment already. I have worked on the creation of quite a few bindings and language bridges before, and must say I was pleasantly surprised with the experience of creating the Go binding. With Cgo, the name given to the “foreign function interface” mechanism for C integration, one basically declares a special import statement which causes a pre-processor to look at the comment preceding it. Something similar to this:

// #include <zookeeper.h>
import "C"

The comment doesn’t have to be restricted to a single line, or to #include statements even. The C code contained in the comment will be transparently inserted into a helper C file which is compiled and linked with the final object file, and the given snippet will also be parsed and inclusions processed. In the Go side, that “C” import is simulated as if it were a normal Go package so that the C functions, types, and values are all directly accessible.

When the C function is used in a context where two result values are requested, as done above, Cgo will save the well known errno variable after the function has finished executing and will return it wrapped into an os.Errno value.

Also, note how the C struct is defined in a way that can be passed straight to the C function. Interestingly, the allocation of the memory backing the structure is going to be performed and tracked by the Go runtime, and will be garbage collected appropriately once no more references exist within the Go runtime. This fact has to be kept in mind since the application will crash if a value allocated normally within Go is saved with a foreign C function and maintained after all the Go references are gone. The alternative in these cases is to call the usual C functions to get hold of memory for the involved values. That memory won’t be touched by the garbage collector, and, of course, must be explicitly freed when no longer necessary. Here is a simple example showing explicit allocation:

Note the use of the defer statement above. Even when dealing with foreign functionality, it comes in handy. The above call will ensure that the buffer is deallocated right before the current function returns, for instance, so it’s a nice way to ensure no leaks happen, even if in the future the function suddenly gets a new exit point which didn’t consider the allocation of resources.

In terms of typing, Go is more strict than C, and Cgo-based logic will also ensure that the types returned and passed into the foreign C functions are correctly typed, in the same way done for the native types. Note above, for instance, how the call to the free() function has to explicitly convert the value into an unsafe.Pointer, even though in C no casting would be necessary to pass a pointer into a void * parameter.

The unsafe.Pointer is in fact a very special type within Go. Using it, one can convert any pointer type into any other pointer type in an unsafe way (thus the package name), and also back and forth into a uintptr value with the address of the memory referenced by the pointer. For every other type conversion, Go will ensure at compilation time that doing the conversion at runtime is a safe operation.

With all of these resources, including the ability to use common Go syntax and functionality even when dealing with foreign types, values, and function calls, the integration task turns out to be quite a pleasant experience. That said, some of the things may still require some good thinking to get right, as we’ll see shortly.

Watch callbacks and channels

One of the most interesting (and slightly tricky) aspects of mapping the ZooKeeper concepts into Go was the “watch” functionality. ZooKeeper allows one to attach a “watch” to a node so that the server will report back when changes happen to the given node. In the C library, this functionality is exposed via a callback function which is executed once the monitored node aspect is modified.

It would certainly be possible to offer this functionality in Go using a similar mechanism, but Go channels provide a number of advantages for that kind of asynchronous notification: waiting for multiple events via the select statement, synchronous blocking until the event happens, testing if the event is already available, etc.

The tricky bit, though, isn’t the use of channels. That part is quite simple. The tricky detail is that the C callback function execution happens in a C thread started by the ZooKeeper library, and happens asynchronously, while the Go application is doing its business elsewhere. Right now, there’s no straightforward way to transfer the execution of this asynchronous C function back into the Go land. The solution for this problem was found with some help from the folks at the golang-nuts mailing list, and luckily it’s not that hard to support or understand. That said, this is a good opportunity to get some coffee or your preferred focus-enhancing drink.

The solution works like this: when the ZooKeeper C library gets a watch notification, it executes a C callback function which is inside a Gozk helper file. Rather than transferring control to Go right away, this C function simply appends data about the event onto a queue, and signals a pthread condition variable to notify that an event is available. Then, on the Go side, once the first ZooKeeper connection is initialized, a new goroutine is fired and loops waiting for events to be available. The interesting detail about this loop, is that it blocks within a foreign C function waiting for an event to be available, through the signaling of the shared pthread condition variable. In the Go side, that’s how the call looks like, just to give a more practical feeling:

// This will block until there's a watch available.
data := C.wait_for_watch()

As you can see, not really a big deal. When that kind of blocking occurs inside a foreign C function, the Go runtime will correctly continue the execution of other goroutines within other operating system threads.

The result of this mechanism is a nice to use interface based on channels, which may be explored in different ways depending on the application needs. Here is a simple example blocking on the event synchronously, for instance:

Those were some of the interesting aspects of implementing the ZooKeeper binding. I would like to speak about some additional details, but this post is rather long already, so I'll keep that for a future opportunity. The code is available under the LGPL, so if you're curious about some other aspect, or would like to use ZooKeeper with Go, please move on and check it out!

Continuing the sequence of experiments I’ve been running with the Go language, I’ve just made available a tiny but useful new package: gommap. As one would imagine, this new package provides access to low-level memory mapping for files and devices, and it allowed exploring a few new edges of the language implementation. Note that, strictly speaking, some of the details ahead are really more about the implementation than the language itself.

There were basically two main routes to follow when implementing support for memory mapping in Go. The first one is usually the way higher-level languages handle it. In Python, for instance, this is the way one may use a memory mapped file:

The way this was done has an advantage and a disadvantage which are perhaps non entirely obvious on a first look. The advantage is that the memory mapped area is truly hidden behind that interface, so any improper attempt to access a region which was already unmapped, for instance, may be blocked within the application with a nice error message which explains the issue. The disadvantage, though, is that this interface usually comes with a restriction that the way to use the memory region with normal libraries, is via copying of data. In the above example, for instance, the “root” string isn’t backed by the original mapped memory anymore, and is rather a copy of its contents (see PEP 3118 for a way to improve a bit this aspect with Python).

The other path, which can be done with Go, is to back a normal native array type with the allocated memory. This means that normal libraries don’t need to copy data out of the mapped memory, or to use a special memory saving interface, to deal with the memory mapped region. As a simple example, this would get the first line in the given file:

In the procedure above, mmap is defined as an alias to a native []byte array, so even though the standard bytes module was used, at no point was the data from the memory mapped region copied out or any auxiliary buffers allocated, so this is a very fast operation. To give an idea about this, let’s pretend for a moment that we want to increase a simple 8 bit counter in a file. This might be done with something as simple as:

mmap[13] += 1

This line of code would be compiled into something similar to the following assembly (amd64):

As you can see, this is just doing some fast index checking before incrementing the value directly in memory. Given that one of the important reasons why memory mapped files are used is to speed up access to disk files (sometimes large disk files), this advantage in performance is actually meaningful in this context.

Unfortunately, though, doing things this way also has an important disadvantage, at least right now. There’s no way at the moment to track references to the underlying memory, which was allocated by means not known to the Go runtime. This means that unmapping this memory is not a safe operation. The munmap system call will simply take the references away from the process, and any further attempt to touch those areas will crash the application.

To give you an idea about the background “magic” which is going on to achieve this support in Go, here is an interesting excerpt from the underlying mmap syscall as of this writing:

As you can see, this is taking apart the memory backing the slice value into its constituting structure, and altering it to point to the mapped memory, including information about the length mapped so that bound checking as observed in the assembly above will work correctly.

In case the garbage collector is at some point extended to track references to these foreign regions, it would be possible to implement some kind of UnmapOnGC() method which would only unmap the memory once the last reference is gone. For now, though, the advantages of being able to reference memory mapped regions directly, at least to me, surpass the danger of having improper slices of the given region being used after unmapping. Also, I expect that usage of this kind of functionality will generally be encapsulated within higher level libraries, so it shouldn’t be too hard to keep the constraint in mind while using it this way.

For those reasons, gommap was implemented with the latter approach. In case you need memory mapping support for Go, just move ahead and goinstall launchpad.net/gommap.

UPDATE (2010-12-02): The interface was updated so that mmap itself is an array, rather than mmap.Data, and this post was changed to reflect this.

It indeed does amaze me. It sometimes feels like we write code for theoretical perfect worlds.. “If the processor executes exactly in this order, and the weather is calm, this program will work.”. There are countless examples of bad assumptions.. someday I will come with some statistics of the form “Every N seconds someone forgets to check the result of write().”.

If you are a teacher, or a developer that enjoys writing snippets of code to teach people, please join me in the quest of building a better future. Do not tell us that you’re “avoiding result checking for terseness”, because that’s exactly what we people will do (terseness is good, right?). On the contrary, take this chance to make us feel bad about avoiding result checking. You might do this by putting a comment like “If you don’t do this, you’re a bad programmer.” right next to the logic which is handling the result, and might take this chance to teach people how proper result handling is done.

Of course, there’s another forgotten art related to result checking. It sits on the other side of the fence. If you are a library author, do think through about how you plan to make us check conditions which happen inside your library, and try to imagine how to make our lives easier. If we suck at handling results when there are obvious ways to handle it, you can imagine what happens when you structure your result logic badly.

Here is a clear example of what not to do, coming straight from Python’s standard library, in the imaplib module:

You see the problem there? How do you handle errors from this library? Should we catch the exception, or should we verify the result code? “Both!” is the right answer, unfortunately, because the author decided to do us a little favor and check the error condition himself in some arbitrary cases and raise the error, while letting it go through and end up in the result code in a selection of other arbitrary cases.

I may provide some additional advice on result handling in the future, but for now I’ll conclude with the following suggestion: please check the results from your actions, and help others to check theirs. That’s a good life-encompassing recommendation, actually.