Docker containers are great, and the Dockerfile build process is quite good, but there are pitfalls for newbies who come to Docker with a virtualization mindset. Docker containers are not light-weight VMs, because the abstraction happens at a much higher level. Docker is platform-as-a-service, not system-as-a-service. Here is a short list of issues I encountered migrating a couple of services from bare metal to Docker containers:

Docker containers have no login session, so there are no TZ, TERM, or LC_ALL settings, and changing the system settings in /etc has no effect – this will surprise some users.
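The usual way around this is to set the environment explicitly at container start; a minimal sketch (the image name is a placeholder):

docker run -e TZ=Europe/Berlin -e LC_ALL=en_US.UTF-8 -e TERM=xterm myimage date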

The UIDs of the container and the host system are shared (this will probably be fixed with UID/GID mapping soon), which encourages users to run everything in containers as root just to make images shareable. A failure in the container isolation then leads directly to privilege escalation on the host.
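The shared UID namespace is easy to demonstrate; a small sketch using the stock busybox image:

docker run --rm -v "$PWD":/data busybox touch /data/root-file
ls -ln root-file   # owned by uid 0 – the container's root is the host's root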

The hostname is randomly generated on each container start (breaking, for example, carbon-daemon metric logging, which includes the hostname), requiring application patching to set fixed hostnames for reproducible results.
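One mitigation, where it suffices, is to pin the hostname at container start with the -h flag (the names here are placeholders):

docker run -h carbon01 myimage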

Lack of resource isolation, for example with regard to I/O performance: a container utilizing I/O resources heavily can stall a filesystem sync operation in another container.

Some generic issues also arise:

Docker images carry a tag, which is an arbitrary label (with the special tag “latest” being the silent default). Many use version numbers as tags, but this is an illusion: tags are not formally inter-related, so Docker does not know if there is a newer version of an image available. This raises the questions of when images should be rebuilt, and how to get notified of base image updates.

There is a private Docker registry container to replace the Docker Hub, but it does not include the automatic building of images from Dockerfiles and assets in git repositories.

aptly (by Andrey Smirnov) seems to be a Swiss Army knife for Debian/Ubuntu repositories. You can create (partial) mirrors, snapshot them, merge snapshots and push them to an apt-get’able repository. You can also upload packages to a local repository and snapshot and/or publish that, too. Andrey is rocking the Debian world with this, thanks a lot!

To illustrate the work-flows that this tool enables, here is an example that extracts firebird 2.5.1 and its dependencies from Ubuntu precise (12.04) and injects it into a published repository for trusty (14.04) installations (which have only firebird 2.5.2).
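A sketch of the aptly commands involved (the package name, filter expression and mirror details are assumptions, and publishing also requires a GPG key):

aptly mirror create -architectures=amd64 -filter='firebird2.5-super' -filter-with-deps precise-firebird http://archive.ubuntu.com/ubuntu/ precise main universe
aptly mirror update precise-firebird
aptly snapshot create firebird-2.5.1 from mirror precise-firebird
aptly publish snapshot -distribution=trusty firebird-2.5.1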

I am not sure why apt needs so much hand-holding, maybe there is an easier way.

This was super-easy to figure out, thanks to the excellent online help and superb diagnostic output. There is also a repository that adds bash tab completion to the commands for easier typing, which helps, as there are a lot of options and you might have many mirrors, snapshots and repositories.

If you have anything to do with maintaining a larger set of Debian/Ubuntu installations, check it out!

By the way, here is a condensed Dockerfile that may be useful (aptly.conf is just the default config file in my case):
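A minimal sketch of such a Dockerfile (the aptly package source is an assumption; adjust the installation to your setup):

FROM ubuntu:trusty
RUN echo "deb http://repo.aptly.info/ squeeze main" > /etc/apt/sources.list.d/aptly.list \
 && apt-get update && apt-get install -y --force-yes aptly
COPY aptly.conf /etc/aptly.conf
VOLUME ["/aptly"]
EXPOSE 8080
CMD ["aptly", "serve", "-listen=:8080"]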

I have a private MediaWiki with the Semantic MediaWiki extensions, to keep some personal data. Wouldn’t it be nice to query that data from some other server, or from a web app? Semantic MediaWiki has a nice API that allows us to get data in JSON format. But we need to defeat the Same-Origin Policy that protects our servers from evil code. JSONP is a well-known method that works, but only for anonymous requests on public wikis. Here is another approach that works with closed wikis, too.

MediaWiki

In LocalSettings.php, you need to allow CORS. You can use the *-wildcard or a list of allowed domains to query your MediaWiki instance:

$wgCrossSiteAJAXdomains = array( '*' );

Also, all API requests must include an origin parameter that repeats the domain from which the request came. This is very annoying, but the MediaWiki developers were concerned about implementing caching properly and efficiently, and this is the solution they came up with.
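For example, a request from a page served at http://localhost:8000 would look like this (the wiki host is a placeholder):

https://wiki.example.org/api.php?action=query&meta=userinfo&format=json&origin=http://localhost:8000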

I am running the example code in a local server with

python -m SimpleHTTPServer 8000

so the origin parameter should be http://localhost:8000 and I don’t need to disable strict origin policy checking for file URIs in my browser.

AngularJS

Here you need to configure the $http service provider to allow cross-domain requests, and to send credentials along with every request globally.
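A sketch, assuming AngularJS 1.x; the module name, apiUrl and origin are placeholders, and MediaWiki’s login is a two-step dance (the first call returns a token for the second):

var app = angular.module('wikiApp', []);

app.config(['$httpProvider', function ($httpProvider) {
  // Send cookies (the MediaWiki session) along with every cross-domain request.
  $httpProvider.defaults.withCredentials = true;
  // Drop the header that would trigger a CORS pre-flight OPTIONS request.
  delete $httpProvider.defaults.headers.common['X-Requested-With'];
}]);

app.factory('wiki', ['$http', function ($http) {
  var apiUrl = 'https://wiki.example.org/api.php';
  var origin = 'http://localhost:8000';
  function post(params) {
    params += '&format=json&origin=' + encodeURIComponent(origin);
    // Url-encoded form data also avoids the pre-flight OPTIONS request.
    return $http.post(apiUrl, params,
      { headers: { 'Content-Type': 'application/x-www-form-urlencoded' } });
  }
  return {
    // Call once without a token; on result "NeedToken", call again with it.
    login: function (user, pass, token) {
      var params = 'action=login&lgname=' + encodeURIComponent(user) +
                   '&lgpassword=' + encodeURIComponent(pass);
      if (token) { params += '&lgtoken=' + encodeURIComponent(token); }
      return post(params);
    },
    ask: function (query) {
      return post('action=ask&query=' + encodeURIComponent(query));
    }
  };
}]);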

Not the prettiest of code. There is a lot of error handling missing, and so on. But it should get you going. The login process will store session cookies, which the $http service then sends along with the following API requests. Of course, you can also query wiki pages with the parse action, etc., as normal.

The special content-type header prevents the pre-flight OPTIONS requests that are specified by CORS, and that are not supported by MediaWiki. If you see unhandled OPTIONS requests in your network log, then you need to take a closer look at the content-type header. I don’t know yet if that is a concern for downloading images from the MediaWiki server. If you try it, leave a comment!

Here are a couple of things I experienced using Cython to wrap my C++ library grakopp:

Assignment and Coercion

I couldn’t find a nice way to wrap boost::variant. Although the direct approach works, an assignment to the variant requires an unsafe cast, which also adds the overhead of a copy. To work around this, I used accessor functions (which requires changing the C++ implementation).

Declaring custom assignment functions via operator= is not supported.

There is no other way to add custom coercion rules. The support for STL container coercion is hardcoded in Cython/Compiler/PyrexTypes.py. This also makes the builtin coercions less useful.

String coercion seems to be unhappy quite often, so you have to cast even constant strings to string or char*.
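A minimal hypothetical Cython sketch of the kind of cast that was needed:

from libcpp.string cimport string

cdef void demo():
    cdef string s
    s = <char*>"constant"   # without the cast, the coercion is often rejected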

Imports

Relative cimports are not supported.

Unintuitively, a corresponding pxd file is automatically included, which cannot be suppressed. So renaming its imports with “as” in a cimport is not possible.

I had to deal with a proprietary software library that wouldn’t run on CentOS 6.5, because the library was compiled against a newer glibc (>= 2.14) while CentOS was running glibc 2.12. Actually, there was only one symbol versioned later than 2.11, which was memcpy@2.14. It turns out that this is due to a well-known optimization (there is a nice discussion with links here).

Normally, one would install an appropriate version of the library and set LD_LIBRARY_PATH accordingly, but for libc that ain’t so easy, because you also need a matching runtime linker, and the kernel will always use /lib64/ld-linux-x86-64.so.2 or whatever is found in the INTERP program header of the ELF executable. You can run the matching linker manually, but this only works for the invoked processes, not its children. It’s a huge PITA, and short of a chroot filesystem or other virtualization I don’t know a good way to replace the system C library (if you know a way, leave a comment!).

Anyway, I decided to patch the binary. First, I checked the older version of the library, and saw that it required memcpy@2.2.5. So here we go:
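The version requirements can be inspected with binutils (the file name is a placeholder):

objdump -T libproprietary.so | grep memcpy    # shows memcpy with version GLIBC_2.14
readelf -V libproprietary.so                  # dumps the .gnu.version* sections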

We look up the specification for the layout of these .gnu.version sections. The plan is to copy the entry for GLIBC_2.2.5 into the entry for GLIBC_2.14, so that all references to version “9” go to glibc 2.2.5 instead of 2.14.

We can see the verneed (“version needed”) entry for libc here, together with three vernaux (“version needed auxiliary”) entries. Each vernaux entry consists of a 4-byte hash value for the version name (for faster comparison than strcmp, here 0x06969194 for GLIBC_2.14), 4 bytes flags and other information (such as the version number referenced by the .gnu.version section in the last byte), a 4-byte offset into the string table with the human-readable version string, and a 4-byte length for the entry (always 0x10).
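In C terms, each of these entries is an Elf64_Vernaux structure, as defined in glibc’s <elf.h>:

typedef struct {
  Elf64_Word vna_hash;  /* ELF hash of the version name, e.g. 0x06969194 for GLIBC_2.14 */
  Elf64_Half vna_flags; /* flags */
  Elf64_Half vna_other; /* version index, as referenced from .gnu.version */
  Elf64_Word vna_name;  /* offset of the version string in the string table */
  Elf64_Word vna_next;  /* offset to the next vernaux entry (0x10 here) */
} Elf64_Vernaux;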

We want to keep the indirectly referenced version number (“9”), so there are no duplicate entries, but copy the hash and string pointer values. Of course, the next offset stays the same, too. After editing with a hex editor, we have:
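Schematically, the patched vernaux entry changed like this (0x09691a75 is the ELF hash of “GLIBC_2.2.5”; the string table offsets vary per binary):

vna_hash:  0x06969194  ->  0x09691a75
vna_name:  offset of "GLIBC_2.14"  ->  offset of "GLIBC_2.2.5"
vna_other: 9  ->  9      (unchanged)
vna_next:  0x10  ->  0x10 (unchanged)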

There is an extra memcpy@@2.14 reference, but no entry for it in the version table. I can get rid of that with strip --strip-unneeded, if I want to.

This seems to work for me just fine, and in fact it would have worked even if there wasn’t a GLIBC_2.2.5 entry already, but an entry for some other version. However, if there are more symbols to deal with, we might need to edit the actual symbol versions in the .gnu.version section (changing the “9” into a “3” in this case), or do more complicated editing.

Virtualization technology is moving fast, and what used to be hot yesterday is as cold as ice today. There is a lot of material to digest, and a lot of documentation that seems somewhat relevant but can be out of date. Surely, this blog post will suffer the same fate, but nevertheless, here it is: A quick list of the most relevant and up to date technology that I could find to set up a small cloud.

Qemu has full support for KVM and virtio. As an extra bonus, it also supports legacy I/O virtualization as well as full system emulation (for example, running ARM systems on x86 hardware). It’s the glue that binds everything together into a complete virtualized machine (system board and peripherals).
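For example, a complete virtio-based machine can be started with a single command (a sketch; the image path and sizes are placeholders):

qemu-system-x86_64 -enable-kvm -m 2048 -smp 2 \
    -drive file=vm.qcow2,if=virtio \
    -netdev user,id=net0 -device virtio-net-pci,netdev=net0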

libvirtd is a daemon to manage virtual machine instances and the underlying storage and network devices. This is the level intended for actual user interaction (while the above are building blocks used only indirectly through the libvirtd interface). libvirtd uses PolicyKit for access control. In addition to the CLI tool virsh, there are many other tools that build on libvirtd, some graphical, such as virt-manager.
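Day-to-day interaction then happens through virsh; for example (the machine name is a placeholder):

virsh list --all        # show all defined machines
virsh start testvm      # boot one of them
virsh console testvm    # attach to its serial console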

For a small personal cloud, libvirtd with its basic tools, command line and graphical, may be all you need. Of course, enterprisey users may require more complete management interfaces such as OpenStack etc.

As for the operating system images, it is possible to install from scratch, but that is very old style. Today, most vendors provide OpenStack images in qcow2 format, which can also be used with various cloud providers and are very convenient to use. These are basically pre-installed systems with the cloud-init utility, which runs at boot time and looks in various places for special configuration files. Besides fancy provisioning servers, it is possible to use a simple ISO image with meta-data and user-data files.
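cloud-init’s NoCloud data source picks such an ISO up if it is labelled cidata and contains files named user-data and meta-data; a sketch:

genisoimage -output seed.iso -volid cidata -joliet -rock user-data meta-data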

And there you have it, a blueprint for your own personal cloud. Once you have picked up these basic building blocks, understanding integrated solutions such as OpenStack should be much less confusing. Or at least that’s what I am hoping.

FreeCad is a very promising free and portable CAD program. Unfortunately, its dependency chain is a bit messy, and building those libraries is not for the faint of heart. Normally, GNU/Linux distributions do a good job on that for you, but in Fedora, the packaging is not quite up to date. The included FreeCad 0.13 works, kinda, but there are crashes and bugs like missing text rendering in Draft mode. As FreeCad is progressing fast, it is useful to build the latest version, and here is how to do just that on Fedora 20.

First, you install the dependencies, except for coin, soqt and python-pivy.
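A sketch of that step, assuming the build dependencies of the packaged 0.13 cover most of what the new version needs:

sudo yum install yum-utils gcc-c++ cmake
sudo yum-builddep freecad    # pull in the build dependencies of the packaged version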

I wrote a small introduction to Bayesian inference, but because it is pretty heavy on math, I used the format of an IPython notebook. Bayesian inference is an important process in machine learning, with many real-world applications, but if you were born any time in the 20th century, you most likely learned about probability theory from a frequentist point of view. One reason may be that calculating some integrals in Bayesian statistics was too difficult to do without computers, so frequentist statistics was more economical. Today, we have much better tools, and Bayesian statistics has become much more feasible. In 2010, the US Food and Drug Administration issued a guidance document explaining some of the situations where Bayesian statistics is appropriate. Overall, it seems there is a big change happening in how we evaluate statistical data, with clearer models and more precise results that make better use of the available data, even in challenging situations.

HTML Tag Scope: If you mix HTML tags with wikitext, which is allowed for so-called “transparent tags”, MediaWiki will check the element nesting independently of the wikitext structure in a preprocessing step (include/Sanitizer.php::removeHTMLtags). Later on, when parsing the wikitext, some elements may be closed automatically (for example at the end of a block). The now-dangling close tag will still be ignored, even though it is detached from its counterpart by then:

<span style="color: red">test
this</span>

will result in:

test

this

while

test
this</span>

will result in:

test

this</span>

This can happen across a long part of the wikitext document, with many intermediate blocks, so the treatment of close tags has a wide context-sensitivity, which is generally bad for formal parsing.

Breaking the cssCheck: If a CSS style attribute contains character references to invalid Unicode code points, the page renderer terminates with a fatal error from include/normal/UtfNormalUtil.php::codepointToUtf8 called through include/Sanitizer.php::decodeCharReferencesCallback:

<span style="\110000">x</span>

leads to the fatal error:

Asked for code outside of range (1114112)

It’s a rare chance to see an uncaught exception leak through to the user, and it could be avoided by calling include/Sanitizer.php::validateCodepoint first and falling back to UTF8_REPLACEMENT.

Update: I submitted a patch for this to MediaWiki’s code review platform.

HTML attribute junk: You can write just about anything (except <, > or />) in the attribute space of an HTML opening tag, and MediaWiki will ignore it. This even includes a signature, like in the following example:
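A constructed example (the tildes expand to a signature):

<span lalala this is junk ~~~~ style="color: red">test</span>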

As long as attributes are separated from junk by whitespace, they are preserved (such as the style attribute above).

Missing block elements: You can avoid generation of paragraph elements (<p>) around inline text by inserting an empty <div style="display:inline"> element on the same line. If you drop the style attribute, the text will be broken into two paragraphs by the browser, though, and the text before and after the div will not connect to preceding or following inline elements.
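A constructed example of the trick:

This line gets no paragraph element. <div style="display:inline"></div>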

Table header data synonymity: In a table, after a !, the table data cell separator || is synonymous with the table header cell separator !!:

{|
! header1 || header2
|}

yields

header1

header2

with two table headers. The opposite does not work, though:

{|
| data1 !! data2
|}

yields

data1 !! data2

Note that this example also introduces a non-breaking space character after “data1”, because MediaWiki interprets the following exclamation mark as French punctuation.

Using indent-pre where it is not allowed: In some places, indent-pre (creating <pre> elements by indenting text lines with a space) is disallowed for compatibility. This affects <blockquote>, <p>, <li>, <dt>, and <dd> elements, and also prevents you from creating new paragraphs and <br/> elements with empty lines. The restriction is only active up to the first block level element, though, so it is easy to avoid it:
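A constructed example of the escape hatch – inside the inner <div>, indent-pre works again:

<blockquote>
<div>
 this indented line becomes a <pre> block again
</div>
</blockquote>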