If you’re having trouble using the ext4 support in u-boot to load or list files, your filesystem probably has a feature bit turned on that u-boot’s ext4 implementation doesn’t support. To fix this, run:
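
One way to find the offending bit, as a minimal sketch: it assumes the e2fsprogs tools are installed, and both the device path and the list of features treated as unsupported are placeholders rather than anything taken from U-Boot, so check them against your own U-Boot version.

```sh
#!/bin/sh
# Hypothetical list of ext4 features an older U-Boot may not handle.
# This is an assumption for illustration; adjust it for your U-Boot build.
UNSUPPORTED="64bit metadata_csum"

# Read `dumpe2fs -h` style output on stdin and print every feature from
# the "Filesystem features:" line that also appears in $UNSUPPORTED.
suspect_features() {
    sed -n 's/^Filesystem features:[[:space:]]*//p' | tr ' ' '\n' |
    while read -r feature; do
        for bad in $UNSUPPORTED; do
            if [ "$feature" = "$bad" ]; then
                echo "$feature"
            fi
        done
    done
}

# Usage (needs root; /dev/sdX1 is a placeholder):
#   dumpe2fs -h /dev/sdX1 2>/dev/null | suspect_features
# A feature can then be cleared on an unmounted filesystem with, e.g.
#   tune2fs -O ^metadata_csum /dev/sdX1
# followed by e2fsck -f. Not every feature can be cleared this way
# (64bit, for one, cannot).
```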

In 2017, it’s still way too hard to build software on Windows. I wanted to see how difficult it’d be to make a few changes to VirtualBox, for which I’d have to rebuild it from source. The version of VirtualBox you download as a binary from virtualbox.org isn’t the same one you get if you try to build from source, and I think this is where the problems start.

The first challenge is getting the source. I gave up after an hour of waiting for the source to download from http://www.virtualbox.org/svn/vbox/trunk. As I was going to be working from a Git repo, I used BitBucket’s Subversion import to create a repo at https://bitbucket.org/voltagex/virtualbox-mirror.

The first thing you’ll notice about the VirtualBox build system is that the configure script for Windows is in VBScript, which is an unusual choice but it technically means you could get up and running on most Windows installs without too much effort. Unfortunately, this is where the easy part ends. I very quickly worked out that I wanted to use AppVeyor, mainly because their VM images are very well configured. I did try building on my own Server 2016 VM, but I couldn’t get the right version of the Windows SDK / DDK installed, or at least not in the paths that the VirtualBox build system expected them.

This leads me to the next reason why building VirtualBox on Windows absolutely sucks. It requires both an old version of MinGW and the Visual Studio 2010 compiler. Visual Studio 2010 was released on 12 April 2010, while MinGW 4.9.3 was released sometime in 2015, I think. SourceForge’s file browser is very hard to navigate.

I suspect choosing a 64 bit toolchain was my first mistake here, but we’ll get to that later.

I created an appveyor branch to work from, mainly so I could easily diff changes that I made to configure.vbs. When I started working on this, each build took about 20 minutes on AppVeyor, depending on where the build broke. You can see all the builds at https://ci.appveyor.com/project/voltagex/virtualbox-mirror/history, along with my terrible commit messages. Future Adam – please write better commit / build messages so that writing these blog posts will be easier. If anyone has any suggestions for technical note-taking, I’m all ears – but I think OneNote will be hard to beat.

AppVeyor’s build history shows 63-odd builds. I know I did a few more on AWS and various virtual machines, but I gave those up quickly – I think most of the issues were around not being able to find WinDDK even though it was installed.

Apparently it took me 4 builds even to get a sensible error message out of the configure script.

The hint there is that the error complains about a suitable MinGW installation. At that point it’s actually found some of the required files, but buried in configure.log is the real reason it’s failing.

I can’t remember whether at this point I realised that the version of MinGW I had installed was too new, but it definitely wasn’t the only problem. Skipping ahead a few builds shows I was copying files into the lib64 directory with xcopy, betting that the build system was looking in an old or obsolete path.

I don’t know why the configure script hides most of the useful information in configure.log – for example, “MinGW-w64 version '5.3.0' is not supported (or configure.vbs failed to parse it correctly)”.

Anyway, after dropping the MinGW version back I was able to progress past this point. I still needed to copy files from lib to lib64.

For the next little while it was as simple as adding dependencies in and rebuilding. The builds only took 5 or so minutes to fail so it wasn’t too bad, although it was very frustrating to wait and then find out I’d messed up a 7z command line, like in https://ci.appveyor.com/project/voltagex/virtualbox-mirror/build/1.0.15.

Things get interesting when your software depends on OpenSSL on Windows. I’ve never actually built it from source myself and I’m afraid of the amount of whisky I’ll need when I eventually try. Security implications be damned, by 15 builds in I’d ‘cheated’ and downloaded some pre-built libraries from https://www.npcglib.org/~stathis/blog/precompiled-openssl/ and included them in the build. Unfortunately this involved renaming the dlls to the ‘old’ OpenSSL names – apparently sometime in 2016 the OpenSSL project changed the names, breaking decades of assumptions.

curl is another story. libcurl – despite being one of the most commonly used libraries anywhere (it’s probably in your phone, your car and maybe even your lightbulb), there were no precompiled binaries for Windows that I could find.

A slight diversion to work out how to build curl, then I guess. This means (in theory) linking to OpenSSL again, too. Luckily someone else had done it for me and I could lean on https://github.com/blackrosezy/build-libcurl-windows. This has some pretty neat batch scripting in it and soon I had https://github.com/baxterworks-build/build-libcurl-appveyor for myself. I should have really learned how to use GitHub Releases or BinTray but I think at this point this silly project had consumed enough of my evening and I threw a zip up on my NeoCities site and carried on.

20 builds later, I’d passed the trials of configure.vbs, and in theory I was ready to build VirtualBox.

Execute env.bat once before you start to build VBox:
env.bat
kmk

This in itself proved a challenge because I couldn’t work out how to set the path to kmk so that AppVeyor’s build system would find it. I believe that AppVeyor is running everything in a single PowerShell instance by default and every command runs in a ‘child’ process so I couldn’t work out how to get the variables where they needed to be.
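
The underlying problem is general: a child process gets a copy of its parent's environment and cannot change the parent's. A minimal sketch in POSIX shell rather than AppVeyor's PowerShell or cmd; the variable name KMK_DEMO_PATH and its value are hypothetical.

```sh
#!/bin/sh
unset KMK_DEMO_PATH

# Setting the variable inside a child shell does not affect this shell.
sh -c 'KMK_DEMO_PATH=/tools/kbuild/bin; export KMK_DEMO_PATH'
after_child=${KMK_DEMO_PATH:-unset}

# Setting it in the current shell (which is what sourcing an env script
# does) makes it visible to every command started afterwards.
KMK_DEMO_PATH=/tools/kbuild/bin
export KMK_DEMO_PATH
seen_by_child=$(sh -c 'echo "$KMK_DEMO_PATH"')

echo "after child: $after_child"
echo "seen by later child: $seen_by_child"
```

One way around it is to run the environment setup and the build from a single script, so that both steps share one process.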

Sidenote: kmk is part of kBuild, which is not Kbuild, the Linux Kernel build system. Have a look at https://trac.netlabs.org/kbuild/wiki/kBuild and then just fucking use cmake like everyone else. Anyone building on Windows will thank you for it.

A couple (9) more failed builds later I’d hacked the batch files enough to start actually building VirtualBox! Is almost 3 days to set up a build system some kind of record? I’m sure even a new Microsoft employee could kick off a Windows build faster than this. Come on.

I didn’t expect it to work first time but I definitely didn’t expect this error:

build.bat
Config.kmk:2773: C:/projects/virtualbox-mirror/out/win.amd64/release/DynamicConfig.kmk: No such file or directory
Config.kmk:3503: *** You need to enable code signing for a hardened windows build to work.. Stop.
Command exited with code 2

I think on Linux there’s a `--disable-hardening` option for configure, but this didn’t exist for the Windows build – there’s another good reason to use a single build system for all builds. Looking at this commit you’d think it’s as easy as flipping a switch, but apparently not. I could not disable the code signing or hardening options no matter what I tried. The real solution, of course, is to disable the error message itself.

Onwards and… upwards? The next error to sort out was a missing python.exe, which turned out to be pretty boring – someone had coded it to only ever look in /bin/, which won’t help you much on Windows. With that fixed, it should be smooth sailing, right? Yeah, nah.
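
For contrast, a sketch of a lookup that searches PATH and accepts fallbacks instead of assuming /bin/. This is not kBuild's actual code; the function name find_tool is hypothetical.

```sh
#!/bin/sh
# Print where the first available candidate name resolves to.
# Returns non-zero if none of the candidates can be found.
find_tool() {
    for name in "$@"; do
        if path=$(command -v "$name" 2>/dev/null); then
            echo "$path"
            return 0
        fi
    done
    return 1
}

# Example: prefer python3, then fall back to python.
#   PYTHON=$(find_tool python3 python) || echo "no Python on PATH" >&2
```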

35 builds in, I got completely stuck. For some reason openssl.h wasn’t being found, even though it wasn’t a problem before. Luckily, AppVeyor lets you RDP into a build machine during the build itself to inspect the state of the machine. This is pretty amazing – and I’m still on a free account!

By adding `iex ((new-object net.webclient).DownloadString('https://raw.githubusercontent.com/appveyor/ci/master/scripts/enable-rdp.ps1'))` to your AppVeyor configuration you’ll get an IP and credentials printed out in the build log. On a free account, I think the build logs are public so be careful with this.

From memory I was trying to use SysInternals Procmon (which every dev should learn to use) to work out where kBuild was expecting to find openssl.h. This turned out to be much too slow and created huge trace log files – 600 MB+. I’m very glad I’m no longer on DSL based broadband. I think I tried a few different filters before giving up and looking for a way to run procmon non-interactively. Luckily that’s already been thought of: if you run `procmon /Quiet /Minimized /BackingFile virtualbox-build.pml` you can start procmon automagically. I did have to ask for help for this one as it seemed to ‘block’ the build. AppVeyor support is awesome and came through with a fix, even opening a pull request on my repo (tl;dr start commands in PowerShell jobs and they won’t interrupt anything – similar to a bash subshell).
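
The shape of that fix, sketched as a shell analogue rather than the PowerShell that AppVeyor actually runs: the tracer goes into the background, the build runs in the foreground, and the tracer is stopped afterwards. Both the "tracer" and the "build" here are stand-ins.

```sh
#!/bin/sh
log=$(mktemp)

# Stand-in for the tracer: appends a line to the log until it is stopped.
( while :; do echo tick >> "$log"; sleep 1; done ) &
tracer_pid=$!

# Stand-in for the build; it is not blocked by the tracer.
sleep 1
build_status=$?

# Stop the tracer once the build has finished.
kill "$tracer_pid"
wait "$tracer_pid" 2>/dev/null || true

ticks=$(grep -c tick "$log")
rm -f "$log"
echo "build exited with $build_status, tracer wrote $ticks lines"
```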

After fiddling with paths and learning about all the different hooks AppVeyor has (I needed to be able to retrieve files from the build but the artifacts section normally isn’t run on a ‘failed’ build), I managed to get 628 MB of 7zipped PML files off the build host. If you ever need to debug to this level, I highly recommend doing something like this. This was build number 44, clocking in at over 56 minutes, which is a bit of a problem as AppVeyor’s free plan has a 60 minute limit. Over the next little while I tried different paths for OpenSSL and talked to one of the VirtualBox developers on IRC.

This developer told me that internally (for the ‘commercial’ build of VirtualBox) they use a different build process which unfortunately couldn’t be shared due to licensing concerns. Sigh. At least a few of the things in the build system made a bit more sense (it looks like an internal checkout of the source contains most of the build tools).

It seems like the include path just got too long and the build system doesn’t see all of it, or something along those lines – the fix was to move openssl.h to virtualbox\include. After an evening and a half (?) I was off and moving again.

You should definitely include any references you’ve used in troubleshooting in code comments and commit messages. I don’t think I would have got much further without https://forums.virtualbox.org/viewtopic.php?f=10&t=61510, which described the exact issue I was having at this point and a solution – copying a file with a specific name. I’m not sure whether I caused more issues here by mixing 32 and 64 bit libraries, but the build continued.

By build number 60, with a commit message of ‘Sigh’, I’d definitely hit the 60 minute limit of the free AppVeyor plan. Surprisingly, AppVeyor staff increased my time limit to 90 minutes when I asked. Thanks, Ilya!

At this point I was tracking down the cause of `[00:16:37] kmk_builtin_redirect: _spawnvpe(/bin/nm.exe) failed: No such file or directory`, which I probably should have recognised from the Python failure earlier – but which nm.exe did it want? Visual Studio? MinGW? 32 or 64 bit? It took multiple gigs of ProcMon logs to work this out, and I’m still not sure I chose the right one.

It was at build 62 that I found myself completely defeated – VirtualBox appears to build some SSL certificates into a binary for some unknown reason and whatever was generating byte arrays generated bad code.

For a project I’m working on I needed to be able to rewrite the content of a page as it’s sent back to me from a remote server (via proxy_pass in nginx).

I’ve been using Alpine Linux more as part of very small Docker containers, so I started looking into how nginx is built there. After stumbling around a bit, I found that Alpine uses a ports-like system called aports. I copied the nginx port into a new folder and began working on a Dockerfile.

abuild, the Alpine package build tool, seems to be designed to build the whole tree, and the wiki talks about an abuild build command that apparently does everything except build the package. Sigh. I eventually cobbled enough together from mailing list posts and guesses to figure out that the command I wanted was `abuild -r -fF -c -P /packages`.

This is where it all went a bit wrong. The module I needed was https://github.com/yaoweibin/ngx_http_substitutions_filter_module, which I added to the APKBUILD script along with a faked-up sha256sum. abuild doesn’t seem to check this at all, but won’t let the build continue if it’s not listed.
No matter what I did, I got the following failure:

Now, earlier in the build log I could see the module being built, so I couldn’t understand why this was going wrong.

After more stumbling around I found that this was an “old style” module – that couldn’t be built as a dynamic module like the rest of the ones in the script. I wonder why nginx’s build system doesn’t make more noise about this?
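
A rough way to tell the two kinds of module apart before building, as a sketch: it relies on the heuristic that new-style module `config` scripts source `auto/module`, while old-style ones set variables such as HTTP_FILTER_MODULES directly. The function names are hypothetical and the check is only a heuristic.

```sh
#!/bin/sh
# Heuristic: a new-style nginx module config script sources auto/module.
# $1 is the module's source directory (containing its "config" file).
is_new_style_module() {
    grep -q 'auto/module' "$1/config"
}

# Print the nginx configure flag that suits the module in directory $1:
# dynamic for new-style modules, static otherwise.
module_flag() {
    if is_new_style_module "$1"; then
        echo "--add-dynamic-module=$1"
    else
        echo "--add-module=$1"
    fi
}

# Usage (the path is a placeholder):
#   ./configure $(module_flag /src/ngx_http_substitutions_filter_module)
```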

The really strange part of this is that if you try the same thing on a Debian or Ubuntu system, the module will build fine. At this point I’d wasted a lot of time trying to build on Alpine, so I switched my base image over to debian:stretch-slim and installed nginx-full, which already has the module I need, and continued with my project.

It’s not all bad though, I created a couple of good things while trying to get this to work. One is a debug version of abuild, which was as simple as changing #!/usr/bin/ash -e to #!/usr/bin/ash -xe. It’s up on docker as voltagex/alpine-builder-debug if you really need it. This is helpful if you’re trying to build packages for Alpine and things aren’t working as you expect. Be warned though, this produces a lot of output.
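
The shebang change can be scripted. A minimal sketch, assuming the script's first line ends in ` -e` as described above; the function name make_debug_copy is hypothetical.

```sh
#!/bin/sh
# Copy the script at $1 to $2, rewriting only the first line so that the
# interpreter also traces each command (-x). Other lines are untouched.
make_debug_copy() {
    sed '1s|^#!\(.*\) -e$|#!\1 -xe|' "$1" > "$2"
}

# Usage (paths are placeholders):
#   make_debug_copy /usr/bin/abuild /usr/local/bin/abuild-debug
#   chmod +x /usr/local/bin/abuild-debug
```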

Just a quick post to catalogue some of the Turris Omnia errata that haven’t been posted about yet.

The DNS (configuration) is a lie

The DNS settings in LuCI are almost completely non-functional – with no indication, because the Omnia is using some combination of Unbound and Knot for DNSSEC. This means I’m currently getting fewer features out of the resolver on my router than I was when running dnsmasq on FreeBSD.

Oops. One should not drink and adb. This is the result of flashing a newer system image on a Motorola X Style (XT1572) without wiping the partition first. The system booted and appeared to have been upgraded from 6.0.0 to 6.0.1, but WiFi no longer functioned.

Digging a little further this appears to be some kind of mismatch between the WiFi driver and the firmware on the system partition. Oops again.

Luckily I was still able to back up my files to the SD card (tell your manufacturer you want expandable storage!) and reflashing a “stock” firmware image also restored my ability to use WiFi.

Stock is in quotes because annoyingly, Motorola don’t provide factory images for the Style, only the Pure – which in theory should work but I really don’t want to do something serious like mess up EFS or the radio firmware.

Therefore, I’ll have to hope that the firmware at http://dl.prazaar.de/?dir=Android/XT1572/Factory/6.0.1/MPHS24.107-58-5_September_2016 is unmodified and genuine.

One more sidenote: Flashing a factory image and relocking the bootloader doesn’t remove the annoying “bootloader was unlocked” warning when turning on the phone.

ZFS is one of the most robust and fault tolerant filesystems around. It can also be fairly tricky to set up and maintain so I’m writing at least one blog post based on my experiences setting up my NAS. I definitely did make some mistakes when I configured my storage but it’s served me well for a while now. Unfortunately, other parts of the motherboard I selected have been a bit temperamental. See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213877 for details. By writing this I’m hoping I feel a bit more confident with my upcoming drive replacement – one of the issues I have is that there are only 4 storage drives, so increasing the total storage available means replacing all of the drives, one by one.

An aside: VirtualBox is frustrating as hell

While writing this blog post, I added and removed a fair few drives, as well as scrapping the VM configuration completely and starting again. For the benefit of future searchers, if you hit an error like:

UUID {c56e7d80-8abc-44db-a27c-669a8c98162c} of the medium 'F:\FreeBSD Test\rootfs.vdi' does not match the value {bad50d40-beeb-40a4-9e03-3a98bb85fe5e} stored in the media registry ('C:\Users\live\.VirtualBox\VirtualBox.xml').

In my case the “bad” UUID wasn’t actually stored in the path shown, but in the .vbox file – the definition of the VM. Close VirtualBox’s interface, update the UUID there and away we go again. Of course, if you’ve got important data in your VM, you may want to clone and reattach instead of editing files or modifying the drive UUID itself.
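
If you do go the file-editing route, a minimal sketch of the edit: it keeps a backup and swaps the stale UUID for the one the disk actually has. The function name swap_uuid and the .vbox file name are hypothetical.

```sh
#!/bin/sh
# Replace every occurrence of one UUID with another in a .vbox file,
# keeping a backup next to it. Run this only while VirtualBox is closed.
swap_uuid() {
    vbox_file=$1 old=$2 new=$3
    cp "$vbox_file" "$vbox_file.bak" &&
    sed "s/$old/$new/g" "$vbox_file.bak" > "$vbox_file"
}

# Usage with the UUIDs from the error above (the file name is a placeholder):
#   swap_uuid "FreeBSD Test.vbox" \
#       bad50d40-beeb-40a4-9e03-3a98bb85fe5e \
#       c56e7d80-8abc-44db-a27c-669a8c98162c
```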

Virtual Machine Setup

I’m using VirtualBox 5.1.8 on a Windows 10 host. Newer versions should be fine, and the host OS shouldn’t matter.
To install FreeBSD, I’m using FreeBSD-11.0-RELEASE-amd64-disc1.iso, although most other recent releases should work.

Create a VM that looks something like this, but don’t add any disks yet

To speed things up, I’m creating disks using the vbox-img command instead of via the GUI
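
A sketch of scripted disk creation. Note the assumptions: it uses `VBoxManage createmedium` rather than vbox-img, and the disk count and size are placeholders. By default it only prints the commands so they can be checked first.

```sh
#!/bin/sh
# Print (or, with DRY_RUN=0, run) the commands that create numbered
# data disks named disk1.vdi, disk2.vdi, ... Size is in megabytes.
create_disks() {
    count=$1 size_mb=$2
    i=1
    while [ "$i" -le "$count" ]; do
        set -- VBoxManage createmedium disk \
            --filename "disk$i.vdi" --size "$size_mb" --format VDI
        if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
        i=$((i + 1))
    done
}

# Usage: review the output, then repeat with DRY_RUN=0.
#   create_disks 4 8192
#   DRY_RUN=0 create_disks 4 8192
```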

Set up the disks so they look like this in VirtualBox, with hotplugging enabled for every drive except the root FS.
The only other thing I needed to change was adding an IDE controller, as VirtualBox didn’t want to boot from a SATA attached CD drive for unknown reasons.

For posterity, here are the other VM settings I used, although I’m not sure if they’re needed.

FreeBSD Setup

I won’t go through installing FreeBSD here, except for the following:
* Install the system to the first drive shown, ada0. The defaults should be fine
* Install an SSH server
* You may want to change to Bridged Networking in VirtualBox – I couldn’t SSH in using the VirtualBox network
* Log in as root on first boot and add a user through the console as FreeBSD sets PermitRootLogin no for sshd by default
* I install sudo and nano to make my life easier

At this point you should be able to sudo su to root, and see the following setup

Let’s create a 4GB file of random data to represent the important data installed on our NAS – the hardest part of this was working out what syntax to use for arithmetic. This is csh, we’re not in ~~Kansas~~ bash any more.
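
For comparison, a sketch of the same idea in POSIX sh, where arithmetic is written `$((...))` (csh uses `@ var = expr` instead). The target path is hypothetical and the function name make_random_file is made up for the example.

```sh
#!/bin/sh
# Write size_mb megabytes of random data to the target file.
make_random_file() {
    target=$1 size_mb=$2
    dd if=/dev/urandom of="$target" bs=1048576 count="$size_mb" 2>/dev/null
}

# 4 GB expressed in megabytes, using sh arithmetic expansion.
four_gb_in_mb=$((4 * 1024))

# Usage (the path is a placeholder for somewhere on the pool):
#   make_random_file /tank/important.bin "$four_gb_in_mb"
```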

In a real server, it’d take a lot longer to “resilver” and replace the disk in the array. In this configuration, if you lose a second disk while the resilver is happening, you lose everything.

Let’s try a slightly more realistic situation. If a drive has failed, you’d plug in a new one (disk5.vdi in my example) and replace old with new. I thought FreeBSD would assign a new device name but it took the old ada4, leading to this slightly confusing command
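
A sketch of the replacement step using `zpool replace` and `zpool status`. The pool name tank is an assumption, and commands are only printed unless DRY_RUN=0 is set. Per zpool(8), leaving out the new device makes it default to the old one, which covers the case where the new disk takes over the old device name.

```sh
#!/bin/sh
POOL=tank   # hypothetical pool name

# Print each command by default; set DRY_RUN=0 to actually run it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Replace old device $1 with new device $2 (optional), then show the
# pool state so resilver progress is visible.
replace_disk() {
    if [ "$#" -eq 2 ]; then
        run zpool replace "$POOL" "$1" "$2"
    else
        run zpool replace "$POOL" "$1"
    fi
    run zpool status "$POOL"
}

# Usage:
#   replace_disk ada4          # new disk reuses the old device name
#   replace_disk ada4 ada5     # new disk shows up under a new name
```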

I’d like to find out if there are any other failure modes I should know about (could I “remove” the cache from my NAS by re-purposing it as the boot drive?) but at least I know I’m not going to lose my data if my system decides to reboot halfway through resilvering, and that removing and replacing a drive isn’t really that scary.