Tuesday, December 18. 2012

Recently, I've been programming more and more in the Go programming language. Since vim has been my editor of choice for the last 10 years or so, I use it to edit my Go code as well. Of course, when I'm programming, I want my editor to support me in useful ways without standing in my way. So I took a look at how I could improve my vim configuration to do exactly that.

The first stop is in the Go source tree itself. It provides vim plugins for syntax highlighting, indentation and an integration with godoc. This is a start, but we can go a lot further.

If you want auto-completion for functions, methods, constants, etc., then gocode is for you. After installation (I recommend installation via pathogen), you can use auto-completion from vim by pressing Ctrl-X Ctrl-O.

But with auto-completion, we can go even further: with snipmate, you get snippets for common constructs like if, switch, and for loops that you can expand by e.g. typing "if" and pressing the tab key. You can then keep pressing tab to fill out the remaining parts.
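Such a snippet definition is only a few lines; here is a sketch of the format (the body must be indented with tabs, and the exact snippet shipped for Go may differ):

```
snippet if
	if ${1:condition} {
		${2}
	}
```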

Another feature I wanted was an automatic syntax check. For this task, you can use syntastic. It's best to configure it in passive mode and activate it only for Go, otherwise other filetypes are affected as well. syntastic will call gofmt -l when saving a Go source file and mark any syntax errors it finds. This is neat for immediately spotting minor syntax errors, and thanks to Go's fast parsing, it's a quick operation.
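In the vimrc, that boils down to a mode map along these lines (a sketch; see syntastic's documentation for the full set of options):

```vim
" run checkers only on demand, except for Go files
let g:syntastic_mode_map = { 'mode': 'passive',
                           \ 'active_filetypes': ['go'] }
```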

Last but not least, I also wanted to have some features that aren't specific to Go: first, a file tree (but only if I open vim with no specific file), and second, an integration with ack to search through my source files efficiently and directly from vim. As a file tree, I found nerdtree to be quite usable, and as ack integration, I used this simple configuration snippet:
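The snippet essentially points vim's built-in grep support at ack; a commonly used variant (possibly not the exact lines I used) is:

```vim
" use ack for :grep, with column information for the quickfix list
set grepprg=ack\ --nogroup\ --nocolor\ --column
set grepformat=%f:%l:%c:%m
```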

Tuesday, May 8. 2012

Recently, I had quite a lot of free time to do stuff on my own. I'm currently unfit for work, and have been for over 2 months now, while I'm being treated for depression (don't worry, it's getting better).

To get myself going and get back on my feet, I started playing with technologies that I had recently read about, and built some simple things with them. As usual, I publish my works as open source. meta.krzz.de is one of these things. It's a meta URL shortener that submits your URL shortening request to a number of URL shorteners in parallel, and displays the shortened URLs as they are received. See here for the source.

The interesting thing about this project is that it's essentially my first project involving AJAXy things and a shiny user interface (I used Bootstrap for the UI). I never really liked JavaScript, due to quite a few awkward language properties. Instead, I tried out Dart, a new language by Google that is meant as a replacement for JavaScript and currently provides a compiler to translate Dart code to JavaScript.

Dart is rather conventional. It obviously reminds one of Java, due to a very similar syntax. With its class system, it also provides compile-time checks that rule out a lot of errors beforehand. In addition, the generated code is meant to be quite portable, i.e. it should work with all or at least most modern browsers. Even though the documentation is terrible, I still found my way around a lot quicker than in my previous attempts with JavaScript. I got code running pretty quickly, soon had a first prototype, and from there on, it went quite fast. So Dart got me kind of hooked. Client-side web programming suddenly seemed easy.

So the next thing that I tried out was building a location hash router. The location hash is the part after the # sign that you may sometimes see in URLs. Recently, it has become a popular place to keep the state of so-called single-page apps: web applications where all communication with the server is done using AJAX, and where the page is updated not by reloading it but by modifying the DOM tree. You can find the source for hashroute.dart here.

The idea behind it is that you define a number of path patterns, Sinatra-style, e.g. /foo/:id, along with callbacks that are invoked whenever the location hash is set to a value that matches the corresponding pattern. When you look at the source, it is surprisingly simple and easy to understand.

And so I tried out another thing in Dart. David Tanzer recently told me about his idea for a blackboard application, where one user can draw something (his idea was to use a tablet computer) and others can watch it in real time in their browser. After forming a rough idea of how I could implement that, I started a prototype with Dart on the client side and Go on the server side. You can find the source for the prototype here. The drawing is done on an HTML5 <canvas>. The Dart client not only allows the drawing, but also records the coordinates of every stroke and sends them to the server using WebSockets. The server takes these coordinates and forwards them to all listening clients, again via WebSockets. The "slave" client is equally simple: it opens a WebSocket to the Go server and draws all incoming strokes according to the submitted coordinates. Currently, there is no authentication at all, and it's only a very early prototype, but this is not only a piece of code that is (again) surprisingly simple, but also an idea that could evolve into a useful tool.

In total, I'm very satisfied with how straightforward Dart makes it to write client-side web applications. I've been pretty much a n00b when it came to the client side of the web, but with Dart, it feels accessible for me for the first time.

Sunday, October 9. 2011

A bit more than a year ago, I started baconbird, my first attempt at a Twitter client that fits my requirements. I worked on it for some time, and it reached a level where it was usable and where I was pretty happy with it. But due to a few poor choices, the chances for further development diminished, and I basically halted the project and only used the software. And after using it for a few months, more and more fundamental, practically unfixable defects came up. Just too much began to suck, so earlier this week, I decided to throw baconbird away and reboot the whole project.

But first things first: my first poor choice was using Perl. Sure, Perl provided me with all the tools to get to meaningful results really quickly, through the vast number of ready-to-use modules in the CPAN archive. And that's what's really great about it. What's not so great about Perl is its approach to concurrency. You basically have to resort to event-based, async-I/O programming, which is not only painful by itself, but also extremely painful to integrate with my widget set of choice, STFL. And threads in Perl... don't do it, that's what everyone says. That meant that I couldn't do any of the Twitter communication asynchronously in the background, but had to do it in my primary thread of execution, essentially making it part of the subsequent calls of the input loop, if I wanted to retain sanity and a certain level of productivity. That made baconbird really slow to use, and when Twitter's API was offline, baconbird practically hung. All of this made baconbird a PITA to use, and I had to change something about it.

Also, in the last few months, I have played more and more with Go, a programming language developed by several people at Google, many of whom were formerly involved in projects like Unix or Plan 9. I became more fluent in it, and I really liked Go's concepts of concurrency, parallelism and IPC. I also played with Go's foreign function interface, cgo, and built a wrapper around STFL so that it could be used from Go. So, essentially, Go would provide me with a few language features that I totally missed in Perl, while it wouldn't provide me with other niceties, like a CPAN-like repository. But finally, I decided to just bite the bullet, build at least upon an existing OAuth implementation for Go, and started my second incarnation of an STFL-based Twitter client, this time written in Go. And, after some initial prototyping that started last Thursday, I put more work into it this Saturday and Sunday, and today I reached the point where I had some code to show that wasn't completely ugly, had some structure and the basic outlines of an architecture, and that obviously didn't suffer from all the negative things that I hated about baconbird.

The overall structure is simple: three goroutines are started, one for the model (which does all the interaction with the Twitter API), one for the controller (which, now that I think about it, may be obsolete, because it doesn't do that much except for receiving new tweets and sending them to the UI), and one for the user interface (which does all the non-interactive redrawing and UI fiddling). Reading the user input is done in the main execution thread of the program, but no parsing is done and all input is given directly to the user interface goroutine. As already suggested, these three goroutines communicate using channels. When the model receives new tweets, it sends them to the controller, the controller puts them into a list of all tweets and then forwards them to the user interface, and the user interface inserts them into the list of tweets that are displayed to the user. And when a new tweet is sent, the text for it is read by the user interface, but is then sent to the model (again via a channel; i.e. the application won't block while the tweet is being transmitted to Twitter), and when sending the tweet is finished, the Twitter API returns the tweet object, which is then again handed over to the controller and the user interface (so that you can immediately see your own tweet in your home timeline as soon as it's reached Twitter). That's basically all the functionality that's there, but there's a lot more to come.

And before I forget: the code is already out there. Visit gockel ("Gockel" is German for "rooster", which is a type of bird [Twitter, bird, you know], and I chose the name because the first two letters also hint at the programming language in which it is being developed), and read the source code, which I consider rather simple and easy to understand. A lot more will change, new features will be added, everything will get better, and hopefully soon, gockel will be ready to achieve world domination. No, seriously: this reboot finally gives me the ability to (hopefully) implement the Twitter Streaming API with only a minimal amount of pain.

Tuesday, July 5. 2011

In the last few months, I have experimented from time to time with the programming language Go, a new systems language released only in 2009 and developed by people like Ken Thompson, Rob Pike and Russ Cox at Google. While staying relatively low-level, it adds a garbage collector, a simple object/interface system, and the mechanisms of goroutines and channels as means of concurrency and inter-process communication (or should I say inter-goroutine communication).

In the last few days, I found more time to experiment with Go, and finally fully got the hang of the object system, and I must say that I really like it, both for its simplicity and its expressiveness. Objects in Go feel simple and natural, and at the same time powerful, with very little syntactic overhead. So that's what I want to write about today.

Let's start with a simple example:

Here we created a new type Vehicle with one private member "speed", and a method Speed() that returns the current speed. Of course, a vehicle all on its own isn't very useful, so we define the minimal requirements for a vehicle that's actually usable:
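The interface for that can look like this:

```go
// Drivable is the minimal set of methods a usable vehicle
// must provide: accelerate, brake, and report the current speed.
type Drivable interface {
	Accelerate()
	Brake()
	Speed() int
}
```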

Here we define an interface named Drivable that specifies the required methods for an object to be drivable: in our case, it must be able to accelerate, brake and report its current speed. Based on this, we construct our first actually usable vehicle:

And voila, we have our first Car that is also Drivable. What exactly did we do here?

We created a new type Car that is a struct, and embedded the type Vehicle in it: instead of adding a named member, we added an unnamed member, specified only by its type. This embedding also makes all of the embedded type's methods available on the new type, i.e. the method Speed() is now also available on our type Car.

In addition, we implemented two new methods, Accelerate() and Brake(), in order to match the interface Drivable. And last but not least, we implemented a function to create a new Car.

Now, let's create another type of Drivable vehicle, let's say a boat:
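A Boat version of the same pattern, mirroring the Car:

```go
// Boat works exactly like Car: it embeds Vehicle and
// implements Accelerate() and Brake() to satisfy Drivable.
type Boat struct {
	Vehicle
}

func NewBoat() *Boat { return &Boat{} }

func (b *Boat) Accelerate() { b.speed += 5 }
func (b *Boat) Brake()      { b.speed -= 5 }
```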

So far, so uninteresting. The implementation of the boat is basically the same as the car, so nothing new.

But now I want to go a step further and implement an amphibious vehicle that is both a car and a boat. Let's just do that:
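The naive attempt is to simply embed both (the inWater flag is an assumption on my part, anticipating that the amphibian needs to know its current mode):

```go
// Amphibian embeds both Car and Boat, and thereby
// inherits Accelerate(), Brake() and Speed() twice.
type Amphibian struct {
	Car
	Boat
	inWater bool
}
```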

But that doesn't quite work out: when we try to compile it, the compiler rejects the code with an error about ambiguous method selectors.

Since we embedded both Car and Boat into Amphibian, the compiler can't decide which Accelerate() method it should use, and thus reports it as ambiguous. Consequently, it also can't use an object of type Amphibian as a Drivable, because Amphibian has no unambiguous Accelerate() method. This is the classic problem of diamond inheritance, which we need to resolve here not only for Accelerate(), but also for Brake() and Speed(). In the case of the amphibious vehicle, we do this by returning the right speed depending on whether the vehicle is in the water or on land:
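One way to write those methods (assuming an inWater field on Amphibian that records the current mode):

```go
// Amphibian resolves the ambiguity by choosing explicitly
// which embedded implementation to delegate to.
func (a *Amphibian) Accelerate() {
	if a.inWater {
		a.Boat.Accelerate()
	} else {
		a.Car.Accelerate()
	}
}

func (a *Amphibian) Brake() {
	if a.inWater {
		a.Boat.Brake()
	} else {
		a.Car.Brake()
	}
}

func (a *Amphibian) Speed() int {
	if a.inWater {
		return a.Boat.Speed()
	}
	return a.Car.Speed()
}
```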

And of course, the object perfectly works as Drivable:

So, what I just showed you is what Go modestly calls embedding: something that feels like multiple inheritance while remaining really simple at the same time.

For a more complete view on Go and its features, I recommend the official Go documentation, and especially "Effective Go", a document that introduces the most important Go features and shows how to write idiomatic Go code. The complete example source code can be found here.

Saturday, January 15. 2011

Through this tweet, I became aware of redo, a build system with an interesting approach. It's based on some ideas that djb wrote down on his website, without apparently ever publishing any code. This blog posting even goes as far as claiming that redo could be the Git of build systems: it argues that Git replaced most other widespread open source VCSes because it's simple and flexible, and that redo will go the same way for the same reasons.

I'm a person who is usually open to new approaches regarding development/build infrastructure, and so I evaluated redo by converting the build system of newsbeuter to redo. I chose newsbeuter not only because it's my most successful project so far, but also because it is typical for how I structured projects and their build system in the last few years, and it's non-trivial: some .o files are packed into .a files, two different binaries are built that share a common library, .h files are created from other text files, and it's currently all done by a single, non-recursive Makefile. The hypothesis is that if I can use redo to build newsbeuter without any hassles or weird hacks or workarounds, then it's good enough for what I need.

The actual conversion took me about an hour, maybe a bit more. The documentation could answer all my questions, except for a few things I stumbled upon that I consider to be bugs (I will report them later, don't worry). It was really hassle-free and straightforward, and at no point was I ever confused because I hadn't fully digested a new but poorly explained concept yet. But I think that the claim of redo being the Git of build systems can't be held up.

And that is because of one major thing: redo tries to do things "right", and it does so by implementing one particular approach and that approach only: redo doesn't support non-recursive builds. If I want a rule to apply to e.g. .cpp files both in the root directory and in the src subdirectory, I have to duplicate this rule. Since redo rules are basically shell scripts, this can easily be deduplicated, but a problem arises when e.g. the command line to compile .cpp to .o files contains -I options to reference files in the local project directory. For the .cpp file in the root directory, such an option has to be "-Iinclude", while for a .cpp file in the src subdirectory, it has to be "-I../include". I don't see anything in redo's documentation indicating that the author has thought of that situation in any way; he even explicitly states that "There is no non-recursive way to use redo."
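To illustrate (the file names follow redo's default.o.do convention, the compiler flags are invented): the rule for .o files in the project root would be a script like the first one below, and the src subdirectory then needs an almost identical copy in which only the include path differs:

```sh
# default.o.do, in the project root:
redo-ifchange "$2.cpp"
g++ -Iinclude -c -o "$3" "$2.cpp"

# src/default.o.do, nearly the same rule duplicated,
# but with an adjusted -I option:
redo-ifchange "$2.cpp"
g++ -I../include -c -o "$3" "$2.cpp"
```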

And that's the fundamental difference from make (the make I use is GNU make, just to be clear), where I as a developer can choose to use make either in a recursive or in a non-recursive fashion, whichever I prefer. redo really lacks flexibility here. I would probably be able to solve some of my issues by restructuring the source tree to work around these deficiencies, but why would I want to do that? Tools are supposed to work for me, and not the other way around; they should be flexible enough to adapt to my needs, and I shouldn't need to adapt my existing project to fit what redo expects.

So, from a conceptual point of view, redo has a really good and simple approach (very djb-y), and I'm sure it's an excellent tool for new projects, but for existing projects that already use make in a non-recursive fashion, it would be a maintenance PITA. And that's why I conclude that redo in its current conceptual state will never be the Git of build systems. make is still more flexible, and even though it has its flaws, it's still good enough for most people, and it's a de-facto standard, too.

Monday, January 10. 2011

In this posting (and also this one), Su-Shee pleads for more vocal and active promotion and advocacy of Perl. While I know how nice it can be to use a piece of software that makes people envious, I just want a quiet, hype-free, pragmatic, down-to-earth, knowledgeable community, and I will explain why.

Some years ago, summer or fall 2002, if I recall correctly, there was this new object-oriented programming language that everybody was talking about, that was quite obscure but really cool because you could write compact, easy-to-read, object-oriented code, in a fashion not unlike Perl. That programming language was Ruby. Having had a background of Pascal, C, and some C++ and Perl, I was astonished by the expressiveness and flexibility of the language, without losing readability. I used Ruby for quite some time, worked on some silly open source project with it, I used it for scripting at work (until my then-boss prohibited the use of any programming language for which no other developer in-house had any knowledge), I wrote daemons that formed the glue between email and MMS systems, I even used it as exploratory prototyping language for stuff that I would later rewrite in C. And then came Rails.

Rails was hyped as the greatest invention since sliced bread, the be-all end-all of web development. As somebody who had previously only known CGI.pm and cgi.rb, this indeed looked intriguing. I had a closer look, bought a book, played with it, and found it quite OK. Until I wanted to do things differently than what the usual tutorials had in mind. Useful documentation about the greater context and overall concepts was sparse (be aware, this was pre-1.0 Rails, things may have changed a lot since then), and I felt constricted by too many things in Rails (this has most likely been addressed in later releases; I haven't bothered to look back since then, though). So many things that were advertised as awesome turned out to be nice, but not that impressive on closer inspection.

And this is exactly what I don't want to do: awakening false hopes in people about the awesome features of $SOFTWARE, and I think an overly optimistic presentation of a system's features can easily lead to exactly that. In fact, the only screencast that didn't disappoint on closer inspection was Google's launch of the Go programming language in 2009, but that only as a side note. Self-assurance is nice and all, but in my experience, there's only a fine line between self-assurance and misrepresentation.

Another aspect of the Rails hype was the complete turnover of a great portion of the Ruby community. The usual community and support channels were flooded with people interested in Rails, Rails became synonymous with Ruby, and the signal-to-noise ratio became drastically worse. Some people even tried to actively recruit me for open-source Rails projects because I "knew Ruby". I declined, because web applications aren't my area of interest at all (even today, the only thing I have to do with the web is that I hack on stuff like Apache and PHP [the interpreter] for a living; still, no web development).

Yeah, the community turnover. Soon, people with big egos and great self-assurance surfaced, dubbing themselves rockstars or ninjas or similar. I always found these to be acts of total douchebaggery because, in the end, they all just cook with water anyway (a notable exception was why the lucky stiff). These were people with great self-assurance and a total lack of self-criticism or reflection on their own work. It's not the kind of people whose software I want to use, because too often my experience was that some library, framework or tool was announced with big words, and then practically always turned out not to be great at all, sometimes even barely usable or totally buggy.

And that brings me back to the original topic of Perl. Exactly the things that I learned to despise about the culture revolving around Ruby and Rails are things that I haven't experienced in the Perl community at all so far. Libraries, frameworks, documentation and people have had more time to mature, probably, while still staying down-to-earth. I can expect good documentation, functioning software, friendly people and, most importantly, accurate words in announcements and discussions.

Through my current employment, I found the way back to Perl, using it at work and also for some personal stuff. I recently even started a new open source project where I productively use Moose for the first time. My code is probably not very expressive and may seem weird (I know, TIMTOWTDI), but at least I feel comfortable writing it. I'm fine with most things surrounding Perl, its eco-system and its communities, and so I don't want to see any kind of interruption and turn-over like I saw it with Ruby.

Baconbird, much like my previous successful project newsbeuter (an RSS/Atom feedreader, for those who haven't heard about it yet), targets people who tend to work on text terminals (using mutt, irssi, slrn, vim, etc. is a strong indicator for that), uses the fabulous STFL as its user interface library, and was created to scratch an itch. My itch in this case was Twitter's switch from basic authentication to OAuth, which had the negative side effect of rendering all (authenticated) RSS feeds practically unusable, so I was stuck with using the web interface of Twitter, which I wasn't too happy about. Also, other Twitter clients were either slow, had weird user interfaces, or were simply CPU and memory hogs. And then there was the infamous Twitter XSS worm that affected quite a few people. All these things together brought me to the conclusion that I had to change something about that.

At first, I started hacking on another client, but I soon found out that its purely ncurses-based UI was virtually unmaintainable, so I decided to do it right instead and write my very own thing. And that's how baconbird was born. The code itself is about 850 SLOCs of Perl, the user interface is based on STFL, the Twitter backend uses Net::Twitter, and all the OO glue code in between uses the Moose object system. All in all, it took me less than 2 weeks of my free time to develop the first version.

So, what features does baconbird offer? Not too much so far, but enough for a start and to show it off. Of course, you can view the timeline (i.e. all the tweets of you and the people you follow), your mentions, your direct messages, and searches. It reloads continuously, and takes care of the current rate limit as imposed by Twitter. You can post tweets, reply to tweets (it even tracks the status ID, i.e. in the web interface you can see proper "in reply to" links), retweet, send direct messages, reply to direct messages, and when writing tweets, you can even have your URLs shortened (I implemented integration with is.gd).

My personal favorite is the search feature, though: I can search for a certain word or hashtag, and I get a nifty live stream of the latest tweets for it. I can even switch to my timeline and back, and the live stream continues loading as soon as its view is active again, until I switch to another view or start another search.

Baconbird is far from complete, though. I still need to implement subscription management, and everything else the Twitter API has to offer (as long as it's applicable to a terminal application). I'm also glad to implement your feature requests. And last, but not least, if you like baconbird and want to support its further development, I'd be happy about you flattring it. You can also flattr this article, of course (</shameless-plug>).

Thursday, July 22. 2010

Introduction
In the beginning, there was char. It could hold an ASCII character and then some, and that was good enough for everybody. But later, the need for internationalization (i18n) and localization (l10n) came up, and char wasn't enough anymore to store all the fancy characters. Thus, multi-byte character encodings were conceived, in which two or more chars represent a single character. Additionally, a vast set of character sets and encodings had been established, most of them incompatible with each other. Thus, a solution had to be found to unify this mess, and the solution was wchar_t, a data type big enough to hold any character (or so it was thought).

Multi-byte and wide-character strings
To connect the harsh reality of weird multi-byte character encodings with the ideal world of abstract character representations, a number of interfaces to convert between the two were developed. The simplest ones are mbtowc() (to convert a multi-byte character to a wchar_t) and wctomb() (to convert a wchar_t to a multi-byte character). The multi-byte character encoding is assumed to be the current locale's.

But even these two functions bear a huge problem: they are not thread-safe. The Single Unix Specification version 3 mentions this for wctomb(), but not for mbtowc(), while the glibc documentation mentions it for both. The solution? Use the equivalent thread-safe functions mbrtowc() and wcrtomb(). Both of these functions keep their state in an mbstate_t variable provided by the caller. In practice, most functions related to the conversion of multi-byte strings to wide-character strings and vice versa are available in two versions: one that is simpler (one function argument less) but neither thread-safe nor reentrant, and one that requires a bit more work from the programmer (declare an mbstate_t variable, initialize it, and use the functions that take this variable) but is thread-safe.

Coping with different character sets
To convert between different character sets and encodings, Unix provides another API, named iconv(). It gives the user the ability to convert text from any character set/encoding to any other. But this approach has a terrible disadvantage: in order to convert text in an arbitrary encoding to a wide-character string, the only standard way Unix provides is to use iconv() to convert the text to the current locale's character set first, and then convert the result to a wide-character string.

Assume we have a string encoded in Shift_JIS, a common character encoding for the Japanese language, and ISO-8859-1 (Latin 1) as the current locale's character set: we'd first need to convert the Shift_JIS text to ISO-8859-1, a step that is most likely lossy (unless only the ASCII-compatible part of Shift_JIS is used), and only then can we use the mb*towc*() functions to convert it to a wide-character string. So, as we can see, there is no adequate standard solution for this problem.

How is this solved in practice? In glibc (and GNU libiconv), the iconv() implementation allows the use of a pseudo character encoding named "WCHAR_T" that represents wide-character strings of the local platform. This solution is messy, though, as the programmer who uses iconv() has to manually cast char * to wchar_t * and vice versa. Worse, support for the WCHAR_T encoding is not guaranteed by any standard and is totally implementation-specific: while it is available on Linux/glibc, FreeBSD and Mac OS X, it is not available on NetBSD, and it is thus not an option for truly portable programming.

Mac OS X (besides providing iconv()) follows a different approach: in addition to the functions that by default always use the current locale's character encoding, a set of functions to work with any other locale is provided, all to be found under the umbrella of the xlocale.h header. The problem with this solution is that it's not portable, either, and practically only available on Mac OS X.

Alternatives
Plan 9 was the first operating system to adopt UTF-8 as its only character encoding. In fact, UTF-8 was conceived by two principal Plan 9 developers, Rob Pike and Ken Thompson, in an effort to make Plan 9 Unicode-compatible while retaining full compatibility with ASCII. Plan 9's API for coping with UTF-8 and Unicode is now available as libutf. While it doesn't solve the character encoding conversion issue described above, and assumes everything to be UTF-8, it provides a clean and minimalistic interface for handling Unicode text. Unfortunately, due to the decision to represent a Rune (i.e. a Unicode character) as unsigned short (16 bits on most platforms), libutf is restricted to handling characters from the Unicode Basic Multilingual Plane (BMP) only.

Another alternative is ICU by IBM, a seemingly grand unified solution for all topics related to i18n, l10n and m17n. AFAIK, it solves all of the issues mentioned above, but on the other hand, it is perceived as a big, bloated mess. And while its developers aim for great portability, it is non-standard and has to be integrated manually on the respective target platform(s).