So after my descriptive post you might be wondering what’s so complex or time-requiring in running a tinderbox. That’s because I haven’t spoken about the actual manual labor that goes into handling the tinderbox.

The major work is of course scouring the logs to make sure that I file only valid bugs (and often enough that’s not enough, as things hide behind the surface), but there are a quite a number of tasks that are not related to the bug filing, at least not directly.

First of all, there is the matter of making sure that the packages are available for installation. This used to be more complex, but luckily thanks to REQUIRED_USE and USE deps, this task is slightly easier than before. The tinderbox.py script (that generates the list of visible packages that need to be tested) also generates a list of use conflicts, dependencies etc. This list I have to look at manually, and then update the package.use file so that they are satisfied. If their dependencies or REQUIRED_USE are not satisfied, the package is not visible, which means it won’t be tested.

This sounds extremely easy, but there are quite a few situations, which I discussed previously where there is no real way to satisfy requirements for all the packages in the tree. In particular there are situations where you can’t enable the same USE flag all over the tree — for instance if you do enable icu for libxml2, you can’t enable it for qt-webkit (well, you can but you have to disable gstreamer then, which is required by other packages). Handling all the conflicting requirements takes a bit of trial and error.

Then there is a much worse problem and that is with tests that can get stuck, so that things like this happen:

And I’ve got to keep growing the list of packages whose tests are unreliable — I wonder if the maintainers ever try running their tests, sometimes.

This task used to be easier because the tinderbox supports sending out tweets or dents through bti so that it would tell me what was its action — unfortunately identi.ca kept marking the tinderbox’s account as spam, and while they did unlock it three times, it meant I had to ask support to do so every other week. I grew tired of that and stopped caring about it. Unfortunately that means I have to connect to the instance(s) from time to time to make sure they are still crunching.

i received my native instruments komplete audio 6 in the mail today. i wasted no time plugging it in. i have a few first impressions:

build quality

this thing is heavy. not unduly so — just two or three times heaver than the audiofire 2 it replaces. it’s solidly built, so i imagine it can take a fair amount of beating on-the-go. knobs are sturdy, stiff rather than loose, without much wiggle. the big top volume knob is a little looser, with more wiggle, but it’s also made out of metal, rather than the tough plastic of the front trim knobs. the input ports grip 1/4″ jacks pretty tightly, so there’s no worry that cables will fall out.

i haven’t tested the main outputs yet, but the headphone output works correctly, offering more volume than my ears can take, and it seems to be very quiet — i couldn’t hear any background hiss even when turning up the gain.

JACK support

i have mixed first impressions here. according to ALSA upstream, and one of my buddies who’s done some kernel driver code for NI interfaces, it should work perfectly, as it’s class-compliant to the USB2.0 spec (no, really, there is a spec for 2.0, and the KA6 complies with it, separating it from the vast majority of interfaces that only comply with the common 1.1 spec).

i setup some slightly more aggressive settings on this USB interface than for my FireWire audiofire 2, which seems to have been discontinued in favor of echo’s new USB interface (though the audiofire 4 is still available, and is mostly the same). i went with 64 frames/period, 48000 sample rate, 3 periods/buffer . . . which got me 4ms latency. that’s just under half the 8ms+ latency i had with the firewire-based af2.

at these settings, qjackctl reported about 18-20% CPU usage, idling around 0.39-5.0% with no activity. i only have a 1.5ghz core2duo processor from 2007, so any time the CPU clocks down to 1.0ghz, i expect the utilization numbers to jump up. switching from the ondemand to performance governor helps a bit, raising the processor speed all the way up.

playing a raw .wav file through mplayer’s JACK output worked just fine. next, i started ardour 3, and that’s where the troubles began. ardour has shown a distressing tendency to crash jackd and/or the interface, sometimes without any explanation in the logs. one second the ardour window is there, the next it’s gone.

i tried renoise next, and loaded up an old tracker project, from my creative one-a-day: day 316, beta decay. this piece isn’t too demanding: it’s sample-based, with a few audio channels, a send, and a few FX plugins on each track.

playing this song resulted in 20-32% CPU utilization, though at least renoise crashed less often than ardour. renoise feels noticeably more stable than the snapshot of ardour3 i built on july 9th.

i wasn’t very thrilled with how much work my machine was doing, since the CPU load was noticeably better with the af2. though this is to be expected; the CPU doesn’t have to do so much processing of the audio streams; the work is offloaded onto the firewire bus. with usb, all traffic goes through the CPU, so that takes more valuable DSP resources.

still, time to up the ante. i raised the sample rate to 96000, restarted JACK, and reloaded the renoise project. now i had 2ms latency…much lower than i ever ran with the af2. this low latency took more cycles to run, though: CPU utilization was between 20% and 36%, usually around 30-33%.

i haven’t yet tested the device on my main workstation, since that desktop computer is still dead. i’m planning to rebuild it, moving from an old AMD dualcore CPU to a recent Intel Ivy Bridge chip. that should free up enough resources to create complex projects while simultaneously playing back and recording high-quality audio.

first thoughts

i’m a bit concerned that for a $200 best-in-class USB2.0 class-compliant device, it’s not working as perfectly as i’d hoped. all 6/6 inputs and outputs present themselves correctly in the JACK window, but the KA6 doesn’t show up as a valid ALSA mixer device if i wanted to just listen to music through it, without running JACK.

i’m also concerned that the first few times i plug it in and start it, it’s mostly rock-solid, with no xruns (even at 4ms) appearing unless i run certain (buggy) applications. however, it’s xrun/crash-prone at a sample rate of 96000, forcing me to step down to 48000. i normally work at that latter rate anyway, but still…i should be able to get the higher quality rates. perhaps a few more reboots might fix this.

it could be one of the three USB ports on this laptop shares a bus with another high-traffic device, which means there could be bandwidth and/or IRQ conflicts. i’m also running kernel 3.5.3 (ck-sources), with alsa-lib 1.0.25, and there might have been driver fixes in the 3.6 kernel and alsa-lib 1.0.26. i’m also using JACK1, version 0.121.3, rather than the newer JACK2. after some upgrades, i’ll do some more testing.

early verdict: the KA6 should work perfectly on linux, but higher sample rates and lowest possible latency are still out of reach. sound quality is good, build quality is great. ALSA backend support is weak to nonexistent; i may have to do considerable triage and hacking to get it to work as a regular audio playback device.

Hello there everybody, today’s episode is dedicated to set up a tinderbox instance like mine which is building and installing every visible package in the tree, running its tests and so on.

So first step is to have a system where to run the tinderbox. A virtual system is much preferred, since the tinderbox can easily install very insecure code, although nothing prevents you from running it straight on the metal. My choice for this, after Tiziano pointed me in that direction, was to get LXC to handle this, as a chroot on steroids (the original implementation used chroot and was much less reliable).

Now there are a number of degrees you could be running the tinderbox at; most of the basics are designed to work with almost every package in the system broken — there are only a few packages that are needed for this system to work, here’s my world file on the two tinderboxes:

But let’s do stuff in order. What do I do when I run the tinderbox? I connect on SSH over IPv6 – the tinderbox has very limited Internet connectivity, as everything is proxied by a Squid instance, like I described in this two years old post – directly as root unfortunately (but only with key auth). Then I either start or reconnect to a screen instance, which is where the tinderbox is running (or will be running).

The tinderbox’s scripts are on git and are written partially by me and partially by Zac (following my harassment for the most part, and he’s done a terrific job). The key script is tinderbox-continuous.sh which is simply going to keep executing, either ad-infinitum, or going through a file given as parameter, the tinderbox on 200 packages at a time (this way there is emerge --sync from time to time so that the tree doesn’t get stale). There is also a fetch-reverse-deps.sh which is used to, as the cover says, fetch the reverse dependencies of a given package, which pairs with the continuous script above when I do a targeted run.

On the configuration side, /etc/portage/make.conf has to refer to /root/flameeyes-tinderbox/tinderbox.make.conf which comes from the repository and sets up features, verbosity levels, and the fetch/resume commands to use curl.. these are also set up so that if there is a TINDERBOX_PROXY environment variable set, then it’ll go through it. Setting of TINDERBOX_PROXY and a couple more variables is done in /etc/portage/make.tinderbox.private.conf; you can use it for setting GENTOO_MIRRORS with something that is easily and quickly reachable, as there’s a lot to download!

But what does this get us? Just a bunch of files in /var/log/portage/build. How do I analyze them? Originally I did this by using grep within Emacs and looked at it file by file. Since I was opening the bugs with Firefox running on the same system, so I could very easily attach the logs. This is no longer possible, so that’s why I wrote a log collector which is also available and that is designed in two components: a script that receives (over IPv6 only, and within the virtual network of the host) the log being sent with netcat and tar, removes colour escape sequences, and writes it down as an HTML file (in a way that Chrome does not explode on) on Amazon’s S3, also counting how many of the observed warnings are found, and whether the build, or tests, failed — this data is saved over SimpleDB.

Then there is a simple sinatra-based interface that can be ran on any computer, and I run it locally on my laptop, and fetches the data from SimpleDB, and displays it in a table with links to the build logs. This also has a link to the pre-filled bug template (it uses a local file where emerge --info is saved as comment #0.

Okay so this is the general gist of it, if I have some more time this weekend I’ll draw some cute diagram for it, and you can all tell me that it’s overcomplicated and that if I did it in $whatever it would have been much easier, but at the same time you’ll not be providing any replacement, or if you will start working on it, you’ll spend months designing the schema of the database, with a target of next year, which will not be met. I’ve been there.

If you use the slock application, like I do, you may have noticed a subtle change with the latest release (which is version 1.1). That change is that the background colour is now teal-like when you start typing your password in order to disable slock, and get back to using your system. This change came from a dual-colour patch that was added to version 1.1.

I personally don’t like the change, and would rather have my screen simply stay black until the correct password is entered. Is it a huge deal? No, of course not. However, I think of it as just one additional piece of security via obscurity. In any case, I wanted it back to the way that it was pre-1.1. There are a couple ways to accomplish this goal. The first way is to build the package from source. If your distribution doesn’t come with a packaged version of slock, you can do this easily by downloading the slock-1.1 tarball, unpacking it, and modifying config.mk accordingly. The config.mk file looks like this:

but note that you do not need the extra set of escaping backslashes when you are using the colour name instead of hex representation.

If you use Gentoo, though, and you’re already building each package from source, how can you make this change yet still install the package through the system package manager (Portage)? Well, you could try to edit the file, tar it up, and place the modified tarball in the /usr/portage/distfiles/ directory. However, you will quickly find that issuing another emerge slock will result in that file getting overwritten, and you’re back to where you started. Instead, the package maintainer (Jeroen Roovers), was kind enough to add the ‘savedconfig’ USE flag to slock on 29 October 2012. In order to take advantage of this great USE flag, you firstly need to have Portage build slock with the USE flag enabled by putting it in /etc/portage/package.use:

echo "x11-misc/slock savedconfig" >> /etc/portage/package.use

Then, you are free to edit the saved config.mk which is located at /etc/portage/savedconfig/x11-misc/slock-1.1. After recompiling with the ‘savedconfig’ USE flag, and the modifications of your choice, slock should now exhibit the behaviour that you anticipated.

I guess it’s time for a new post on what’s the status with Gentoo Linux right now. First of all, the tinderbox is munching as I write. Things are going mostly smooth but there are still hiccups due to some developers not accepting its bug reports because of the way logs are linked (as in, not attached).

Like last time that I wrote about it, four months ago, this is targeting GCC 4.7, GLIBC 2.16 (which is coming out of masking next week!) and GnuTLS 3. Unfortunately, there are a few (biggish) problems with this situation, mostly related to the Boost problem I noted back in July.

What happens is this:

you can’t use any version of boost older than 1.48 with GCC 4.7 or later;

you can’t use any version of boost older than 1.50 with GLIBC 2.16;

many packages don’t build properly with boost 1.50 and later;

a handful of packages require boost 1.46;

boost 1.50-r2 and later (in Gentoo) no longer support eselect boost making most of the packages using boost not build at all.

This kind of screwup is a major setback, especially since Mike (understandably) won’t wait any more to unmask GLIBC 2.16 (he waited a month, the Boost maintainers had all the time to fix their act, which they didn’t — it’s now time somebody with common sense takes over). So the plan right now is for me and Tomáš to pick up the can of worms, and un-slot Boost, quite soon. This is going to solve enough problems that we’ll all be very happy about it, as most of the automated checks for Boost will then work out of the box. It’s also going to reduce the disk space being used by your install, although it might require you to rebuild some C++ packages, I’m sorry about that.

For what concerns GnuTLS, version 3.1.3 is going to hit unstable users at the same time as glibc-2.16, and hopefully the same will be true for stable when that happens. Unfortunately there are still a number of packages not fixed to work with gnutls, so if you see a package you use (with GnuTLS) in the tracker it’s time to jump on fixing it!

Speaking of GnuTLS, we’ve also had a smallish screwup this morning when libtasn1 version 3 also hit the tree unmasked — it wasn’t supposed to happen, and it’s now masked, as only GnuTLS 3 builds fine with it. Since upstream really doesn’t care about GnuTLS 2 at this point, I’m not interested in trying to get that to work nicely, and since I don’t see any urgency in pushing libtasn1 v3 as is, I’ll keep it masked until GNOME 3.6 (as gnome-keyring also does not build with that version, yet).

Markos has correctly noted that the QA team – i.e., me – is not maintaining the DevManual anymore. We made it now a separate project, under QA (but I’d rather say it’s shared under QA and Recruiters at the same time), and the GIT Repository is now writable by any developer. Of course if you play around that without knowing what you’re doing, on master, you’ll be terminated.

There’s also the need to convert the DevManual to something that makes sense. Right now it’s a bunch of files all called text.xml which makes editing a nightmare. I did start working on that two years ago but it’s tedious work and I don’t want to do it on my free time. I’d rather not have to do it while being paid for it really. If somebody feels like they can handle the conversion, I’d actually consider paying somebody to do that job. How much? I’d say around $50. Desirable format is something that doesn’t make a person feel like taking their eyes out when trying to edit it with Emacs (and vim, if you feel generous): my branch used DocBook 5, which I rather fancy, as I’ve used it for Autotools Mythbuster but RST or Sphinx would probably be okay as well, as long as no formatting is lost along the way. Update: Ben points out he already volunteered to convert it to RST, I’ll wait for that before saying anything more.

Also, we’re looking for a new maintainer for ICU (and I’m pressing Davide to take the spot) as things like the bump to 50 should have been handled more carefully. Especially now that it appears that it’s basically breaking a quarter of its dependencies when using GCC 4.7 — both the API and ABI of the library change entirely depending on whether you’re using GCC 4.6 or 4.7, as it’ll leverage C++11 support in the latter. I’m afraid this is just going to be the first of a series of libraries making this kind of changes and we’re all going to suffer through it.

email me "proof" you are running the latest stable -rc kernel at the moment.

send a link to some kernel patches you have done that were accepted into Linus's tree.

send a link to any Linux distro kernel tree where they keep their patches.

say why you want to do this type of thing, and what amount of time you can spend on it per week.

I'll close the application process in a week, on November 7, 2012, after that
I'll contact everyone who applied and do some follow-up questions through email
with them. I'll also post something here to say what the response was like.

In my previous post about Munin I said that I was still working on making sure that the async support would reach Gentoo in a way that actually worked. Now with version 2.0.7-r5 this is vastly possible, and it’s documented on the Wiki for you all to use.

Unfortunately, while testing it, I found out that one of the boxes I’m monitoring, the office’s firewall, was going crazy if I used the async spooled node, reporting fan speeds way too low (87 RPMs) or way too high (300K), and with similar effects on the temperatures as well. This also seems to have caused the fans to go out of control and run constantly at their 4KRPM instead of their usual 2KRPM. The kernel log showed that there was something going wrong with the i2c access, which is what the sensors program uses.

I started looking into the sensors_ plugin that comes with Munin, which I knew already a bit as I fixed it to match some of my systems before… and the problem is that for each box I was monitoring, it would have to execute sensors six times: twice for each graph (fan speed, temperature, voltages), one for config and one for fetching the data. And since there is no way to tell it to just fetch some of the data instead of all of it, it meant many transactions had to go over the i2c bus, all at the same time (when using munin async, the plugins are fetched in parallel). Understanding that the situation is next to unsolvable with that original code, and having one day “half off” at work, I decided to write a new plugin.

This time, instead of using the sensors program, I decided to just access /sys directly. This is quite faster and allows to pinpoint what data you need to fetch. In particular during the config step, there is no reason to fetch the actual value, which saves many i2c transactions even just there. While at it, I also made it a multigraph plugin, instead of the old wildcard one, so that you only need to call it once, and it’ll prepare, serially, all the available graphs: in addition to those that were supported before, which included power – as it’s exposed by the CPUs on Excelsior – I added a few that I haven’t been able to try but are documented by the hwmon sysfs interface, namely current and humidity.

The new plugin is available on the contrib repository – which I haven’t found a decent way to package yet – as sensors/hwmon and is still written in Perl. It’s definitely faster, has fewer dependencies and it’s definitely more reliable at leas ton my firewall. Unfortunately, there is one feature that is missing: sensors would sometimes report an explicit label for temperature data.. but that’s entirely handled in userland. Since we’re reading the data straight from the kernel, most of those labels are lost. For drivers that do expose those labels, such as coretemp, they are used, though.

Also we lose the ability to ignore the values from the get-go, like I describe before but you can’t always win. You’ll have to ignore the graph data from the master instead. Otherwise you might want to find a way to tell the kernel to not report that data. The same probably is true for the names, although unfortunately…

[temp*_label] Should only be created if the driver has hints about what this temperature channel is being used for, and user-space doesn’t. In all other cases, the label is provided by user-space.

But I wouldn’t be surprised if it was possible to change that a tinsy bit. Also, while it does forfeit some of the labeling that the sensors program do, I was able to make it nicer when anonymous data is present — it wasn’t so rare to have more than one temp1 value as it was the first temperature channel for each of the (multiple) controllers, such as the Super I/O, ACPI Thermal Zone, and video card. My plugin outputs the controller and the channel name, instead of just the channel name.

After I’ve completed and tested my hwmon plugin I moved on to re-rewrite the IPMI plugin. If you remember the saga I first rewrote the original ipmi_ wildcard plugin in freeipmi_, including support for the same wildcards as ipmisensor_, so that instead of using OpenIPMI (and gawk), it would use FreeIPMI (and awk). The reason was that FreeIPMI can cache SDR information automatically, whereas OpenIPMI does have support, but you have to tackle it manually. The new plugin was also designed to work for virtual nodes, akin to the various SNMP plugins, so that I could monitor some of the servers we have in production, where I can’t install Munin, or I can’t install FreeIPMI. I have replaced the original IPMI plugin, which I was never able to get working on any of my servers, with my version on Gentoo for Munin 2.0. I expect Munin 2.1 to come with the FreeIPMI-based plugin by default.

Unfortunately, like for the sensors_ plugin, my plugin was calling the command six times per host — although this allows you to filter for the type of sensors you want to receive data for. And that became even worse when you have to monitor foreign virtual nodes. How do I solve that? I decided to rewrite it to be multigraph as well… but shell script then was difficult to handle, which means that it’s now also written in Perl. The new freeipmi, non-wildcard, virtual node-capable plugin is available in the same repository and directory as hwmon. My network switch thanks me for that.

Of course unfortunately the async node still does not support multiple hosts, that’s something for later on. In the mean time though, it does spare me lots of grief and I’m happy I took the time working on these two plugins.

This problem seems to bite some of our hardened users a couple of times a year, so thought I’d blog about it. If you are using grsec and PulseAudio, you must not enable CONFIG_GRKERNSEC_SYSFS_RESTRICT in your kernel, else autodetection of your cards will fail.

PulseAudio’s module-udev-detect needs to access /sys to discover what cards are available on the system, and that kernel option disallows this for anyone but root.

A few days ago the box that was hosting our low-risk webapps died (barbet.gentoo.org). The services that were affected are get.gentoo.orgplanet.gentoo.orgpackages.gentoo.orgdevmanual.gentoo.orginfra-status.gentoo.org and bouncer.gentoo.org. We quickly migrated the services to another box (brambling.gentoo.org). Brambling had issues in the past with its RAM, but we changed them with new ones a couple of months ago. Additionally, this machine was used for testing only. Unfortunately the machine started to malfunction as soon as those services were transferred there, which means that it has more hardware issues than the RAM. The resulting error messages stopped when we disabled packages.gentoo.org temporarily. The truth is that this packages webapp is old, unmaintained, uses deprecated interfaces and real pain to debug. In this year’s GSoC we had a really nice replacement by Slava Bacherikov written in django. Additionally, recently we were given a Ganeti cluster hosted at OSUOSL. Thus we decided not to put up again the old packages.gentoo.org instance, and instead create 4 virtual machines in our Ganeti cluster, and migrate the above webapps there, along with the new and shiny packages.gentoo.org website. Furthermore, we will also deploy another GSoC webapp, gentoostats, and start providing our developers with virtual machines. We will not give public IPv4 IPs to the dev VMs though, but probably use IPv6 only so that developers can access them through woodpecker (the box where the developers have their shell accounts), but it is still under discussion. We already started working on the above, and we expect next week to be fully finished with the new webapps live and rocking. Special thanks to Christian and Alec who took care of the migrations before and during the Gentoo Miniconf.

A couple of days ago, Tomas and I, gave a presentation at the Gentoo Miniconf. The subject of the presentation was to give an overview of the current recruitment process, how are we performing compared to the previous years and what other ways there are for users to help us improve our beloved distribution. In this blog post I am gonna get into some details that I did not have the time to address during the presentation regarding our recruitment process.

Recruitment Statistics from 2008 to 2012

Looking at the previous graph, two things are obvious. First of all, every year the number of people who wanted to become developers is constantly decreased. Second, we have a significant number of people who did not manage to become developers. Let me express my personal thoughts on these two things.

For the first one, my opinion is that these numbers are directly related to the Gentoo’s reputation and its “infiltration” to power users. It is not a secret that Gentoo is not as popular as it used to be. Some people think this is because of the quality of our packages, or because of the frequency we cause headaches to our users. Other people think that the “I want to compile every bit of my linux box” trend belongs to the past and people want to spend less time maintaining/updating their boxes and more time doing some actual work nowadays. Either way, for the past few years we are loosing people, or to state it better, we are not “hiring” as many as we used to. Ignoring those who did not manage to become developers, we must admit that the absolute numbers are not in our favor. One may say that, 16 developers for 2011-2012 is not bad at all, but we aim for the best right? What bothers me the most is not the number of the people we recruit, but that this number is constantly falling for the last 5 years…

As for the second observation, we see that, every year, around 4-5 people give up and decide to not become developers after all. Why is that? The answer is obvious. Our long, painful, exhausting recruitment process drives people away. From my experience, it takes about 2 months from the time your mentor opens your bug, until a recruiter picks you up. This obviously kills someone’s motivation, makes him lose interest, get busy with other stuff and he eventually disappears. We tried to improve this process by creating a webapp two years ago, but it did not work out well. So we are now back to square one. We really can’t afford loosing developers because of our recruitment process. It is embarrassing to say at least.

Again, is there anything that can be done? Definitely yes. I’d say, we need an improved or a brand new web application that will focus on two things:

1) make the review process between mentor <-> recruit easier

2) make the final review process between recruit <-> recruiter an enjoyable learning process

Ideas are always welcomed. Volunteers and practical solutions even more ;) In the meantime, I am considering using Google+ hangouts for the face-to-face interview sessions with the upcoming recruits. This should bring some fresh air to this process ;)

In my previous post about Munin I said that I was still working on making sure that the async support would reach Gentoo in a way that actually worked. Now with version 2.0.7-r5 this is vastly possible, and it’s documented on the Wiki for you all to use.

Unfortunately, while testing it, I found out that one of the boxes I’m monitoring, the office’s firewall, was going crazy if I used the async spooled node, reporting fan speeds way too low (87 RPMs) or way too high (300K), and with similar effects on the temperatures as well. This also seems to have caused the fans to go out of control and run constantly at their 4KRPM instead of their usual 2KRPM. The kernel log showed that there was something going wrong with the i2c access, which is what the sensors program uses.

I started looking into the sensors_ plugin that comes with Munin, which I knew already a bit as I fixed it to match some of my systems before… and the problem is that for each box I was monitoring, it would have to execute sensors six times: twice for each graph (fan speed, temperature, voltages), one for config and one for fetching the data. And since there is no way to tell it to just fetch some of the data instead of all of it, it meant many transactions had to go over the i2c bus, all at the same time (when using munin async, the plugins are fetched in parallel). Understanding that the situation is next to unsolvable with that original code, and having one day “half off” at work, I decided to write a new plugin.

This time, instead of using the sensors program, I decided to just access /sys directly. This is quite faster and allows to pinpoint what data you need to fetch. In particular during the config step, there is no reason to fetch the actual value, which saves many i2c transactions even just there. While at it, I also made it a multigraph plugin, instead of the old wildcard one, so that you only need to call it once, and it’ll prepare, serially, all the available graphs: in addition to those that were supported before, which included power – as it’s exposed by the CPUs on Excelsior – I added a few that I haven’t been able to try but are documented by the hwmon sysfs interface, namely current and humidity.

The new plugin is available on the contrib repository – which I haven’t found a decent way to package yet – as sensors/hwmon and is still written in Perl. It’s definitely faster, has fewer dependencies and it’s definitely more reliable at leas ton my firewall. Unfortunately, there is one feature that is missing: sensors would sometimes report an explicit label for temperature data.. but that’s entirely handled in userland. Since we’re reading the data straight from the kernel, most of those labels are lost. For drivers that do expose those labels, such as coretemp, they are used, though.

Also we lose the ability to ignore the values from the get-go, like I describe before but you can’t always win. You’ll have to ignore the graph data from the master instead. Otherwise you might want to find a way to tell the kernel to not report that data. The same probably is true for the names, although unfortunately…

[temp*_label] Should only be created if the driver has hints about what this temperature channel is being used for, and user-space doesn’t. In all other cases, the label is provided by user-space.

But I wouldn’t be surprised if it was possible to change that a tinsy bit. Also, while it does forfeit some of the labeling that the sensors program do, I was able to make it nicer when anonymous data is present — it wasn’t so rare to have more than one temp1 value as it was the first temperature channel for each of the (multiple) controllers, such as the Super I/O, ACPI Thermal Zone, and video card. My plugin outputs the controller and the channel name, instead of just the channel name.

After I’ve completed and tested my hwmon plugin I moved on to re-rewrite the IPMI plugin. If you remember the saga I first rewrote the original ipmi_ wildcard plugin in freeipmi_, including support for the same wildcards as ipmisensor_, so that instead of using OpenIPMI (and gawk), it would use FreeIPMI (and awk). The reason was that FreeIPMI can cache SDR information automatically, whereas OpenIPMI does have support, but you have to tackle it manually. The new plugin was also designed to work for virtual nodes, akin to the various SNMP plugins, so that I could monitor some of the servers we have in production, where I can’t install Munin, or I can’t install FreeIPMI. I have replaced the original IPMI plugin, which I was never able to get working on any of my servers, with my version on Gentoo for Munin 2.0. I expect Munin 2.1 to come with the FreeIPMI-based plugin by default.

Unfortunately, like for the sensors_ plugin, my plugin was calling the command six times per host — although this allows you to filter for the type of sensors you want to receive data for. And that became even worse when you have to monitor foreign virtual nodes. How do I solve that? I decided to rewrite it to be multigraph as well… but shell script then was difficult to handle, which means that it’s now also written in Perl. The new freeipmi, non-wildcard, virtual node-capable plugin is available in the same repository and directory as hwmon. My network switch thanks me for that.

Of course unfortunately the async node still does not support multiple hosts, that’s something for later on. In the mean time though, it does spare me lots of grief and I’m happy I took the time working on these two plugins.

The Gentoo Miniconf is over now but it was a great success. There was 30+ developers that went and I met quite some users too. Thanks to Theo (tampakrap) and Michal (miska) for organizing the event (and others), thanks to openSUSE for sponsoring and letting the Gentoo Linux guys hangout there. Thanks to the other sponsors too, Google, Aeroaccess, et al.

For about a year now, I’ve been working at GRNET on its (OpenStack API compliant) open source IaaS cloud platform Synnefo, which powers the ~okeanos service.

Since ~okeanos is mainly aimed towards the Greek academic community (and thus has restrictions on who can use the service), we set up a ‘playground’ ‘bleeding-edge’ installation (okeanos.io) of Synnefo, where anyone can get a free trial account, experiment with the the Web UI, and have fun scripting with the kamaki API client. So, you get to try the latest features of Synnefo, while we get valuable feedback. Sounds like a fair deal.

Unfortunately, being the only one in our team that actually uses Gentoo Linux, up until recently Gentoo VMs were not available. So, a couple of days ago I decided it was about time to get a serious distro running on ~okeanos (the load of our servers had been ridiculously low after all ). For future reference, and in case anyone wants to upload their own image on okeanos.io or ~okeanos, I’ll briefly describe the steps I followed.

4) Chroot and install Gentoo in /mnt/gentoo. Just follow the handbook. At a minimum you’ll need to extract the base system and portage, and set up some basic configs, like networking. It’s up to you how much you want to customize the image. For the Linux Kernel, I just copied directly the Debian /boot/[vmlinuz|initrd|System.map] and /lib/modules/ of the VM (and it worked! ).

and make sure you have a sane grub.cfg (I’d suggest replacing all references to UUIDs in grub.cfg and /etc/fstab to /dev/vda[1]).
Now, outside the chroot, run:
grub-install --root-directory=/mnt --grub-mkdevicemap=/mnt/boot/grub/device.map /dev/loop0

snf-image-creator takes a diskdump as input, launches a helper VM, cleans up the diskdump / image (cleanup of sensitive data etc), and optionally uploads and registers our image with ~okeanos.

For more information on how snf-image-creator and Synnefo image registry works, visit the relevant docs [1][2][3].

0) Since snf-image-creator will use qemu/kvm to spawn a helper VM, and we’re inside a VM, let’s make sure that nested virtualization (OSDI ’10 Best Paper award btw ) works.

First, we need to make sure that kvm_[amd|intel] is modprobe’d on the host machine / hypervisor with the nested = 1 parameter, and that the vcpu, that qemu/kvm creates, thinks that it has ‘virtual’ virtualization extensions (that’s actually our responsibility, and it’s enabled on the okeanos.io servers).

If everything goes as planned, after snf-image-creator terminates, you should be able to see your newly uploaded image in https://pithos.okeanos.io, inside the Images container. You should also be able to choose your image to create a new VM (either via the Web UI, or using the kamaki client).

That’s all for now. Hopefully, I’ll return soon with another more detailed post on scripting with kamaki (vkoukis has a nice script using kamaki python lib to create from scratch a small MPI cluster on ~okeanos ).

If you’re running ~arch, you probably noticed by now that the latest OpenRC release no longer allows services to “need net” in their init scripts. This change has caused quite a bit of grief because some services no longer started after a reboot, or no longer start after a restart, including Apache. Edit: this only happens if you have corner case configurations such as an LXC guest. As William points out, the real change is simply that net.lo no longer provides the net virtual, but the other network interfaces do.

While it’s impossible to say that this is not annoying as hell, it could be much worse. Among other reasons, because it’s really trivial to work it around until the init scripts themselves are properly fixed. How? You just need to append to /etc/conf.d/$SERVICENAME the line rc_need="!net" — if the configuration file does not exist, simply create it.

Interestingly enough, knowing this workaround also allows you to do something even more useful, that is making sure that services requiring a given interface being up depend on that interface. Okay it’s a bit complex, let me backtrack a little.

Most of the server daemons that you have out there don’t really care of how many, which, and what name your interfaces are. They open either to the “catch-all” address (0.0.0.0 or :: depending on the version of the IP protocol — the latter can also be used as a catch-both IPv4 and IPv6, but that’s a different story altogether), to a particular IP address, or they can bind to the particular interface but that’s quite rare, and usually only has to do with the actual physical address, such as RADVD or DHCP.

Now to bind to a particular IP address, you really need to have the address assigned to the local computer or the binding will fail. So in these cases you have to stagger the service start until the network interface with that address is started. Unfortunately, it’s extremely hard to do so automatically: you’d have to parse the configuration file of the service (which is sometimes easy and most of the times not), and then you’d have to figure out which interface will come up with that address … which is not really possible for networks that get their addresses automatically.

So how do you solve this conundrum? There are two ways and both involve manual configuration, but so do defined-address listening sockets for daemons.

The first option is to keep the daemon listening on the catch-all addresses, then use iptables to set up filtering per-interface or per-address. This is quite easy to deal with, and quite safe as well. It also has the nice side effect that you only have one place to handle all the IP address specifications. If you ever had to restructure a network because the sysadmin before you used the wrong subnet mask, you know how big a difference that makes. I’ve found before that some people think that iptables also needs the interfaces to be up to work. This is not the case, fortunately, it’ll accept any interface names as long as they could possibly be valid, and then will only match them when the interface is actually coming up (that’s why it’s usually a better idea to whitelist rather than blacklist there).

The other option requires changing the configuration on the OpenRC side. As I shown above you can easily manipulate the dependencies of the init scripts without having to change those scripts at all. So if you’re running a DHCP server on the lan served by the interface named lan0 (named this way because a certain udev no longer allows you to swap the interface names with the permanent rules that were first introduced by it), and you want to make sure that one network interface is up before dhcp, you can simply add rc_need="net.lan0" to your /etc/conf.d/dhcpd. This way you can actually make sure that the services’ dependencies match what you expect — I use this to make sure that if I restart things like mysql, php-fpm is also restarted.

So after I gave you two ways to work around the current not-really-working-well status, but why did I not complain about the current situation? Well, the reason for which so many init scripts have that “need net” line is simply cargo-culting. And the big problem is that there is no real good definition of what “net” is supposed to be. I’ve seen used (and used it myself!) for at least the following notions:

there are enough modules loaded that you can open sockets; this is not really a situation that I’d like to find myself to have to work around; while it’s possible to build both ipv4 and ipv6 as modules, I doubt that most things would work at all that way;

there is at least one network interface present on the system; this usually is better achieved by making sure that net.lo is started instead; especially since in most cases for situations like this what you’re looking for is really whether 127.0.0.1 is usable;

there is an external interface connected; okay sure, so what are you doing with that interface? because I can assure you that you’ll find eth0 up … but no cable is connected, what about it now?

there is Internet connectivity available; this would make sense if it wasn’t for the not-insignificant detail that you can’t really know that from the init system; this would be like having a “need userpresence” that makes sure that the init script is started only after the webcam is turned on and the user face is identified.

While some of these particular notions have use cases, the fact that there is no clear identification of what that “need net” is supposed to be makes it extremely unreliable, and at this point, especially considering all the various options (oldnet, newnet, NetworkManager, connman, flimflam, LXC, vserver, …) it’s definitely a better idea to get rid of it and not consider it anymore. Unfortunately, this is leading us into a relative world of pain, but sometimes you have to get through it.

If you’re a Munin user in Gentoo and you look at ChangeLogs you probably noticed that yesterday I did commit quite a few changes to the latest ~arch ebuild of it. The main topic for these changes was async support, which unfortunately I think is still not ready yet, but let’s take a step back. Munin 2.0 brought one feature that was clamored for, and one that was simply extremely interesting: the former is the native SSH transport, the others is what is called “Asynchronous Nodes”.

On a classic node whenever you’re running the update, you actually have to connect to each monitored node (real or virtual), get the list of plugins, get the config of each plugin (which is not cached by the node), and then get the data for said plugin. For things that are easy to get because they only require you to get data out of a file, this is okay, but when you have to actually contact services that take time to respond, it’s a huge pain in the neck. This gets even worse when SNMP is involved, because then you have to actually make multiple requests (for multiple values) both to get the configuration, and to get the values.

To the mix you have to add that the default timeout on the node, for various reason, is 10 seconds which, as I wrote before makes it impossible to use the original IPMI plugin for most of the servers available out there (my plugin instead seem to work just fine, thanks to FreeIPMI). You can increase the timeout, even though this is not really documented to begin with (unfortunately like most of the things about Munin) but that does not help in many cases.

So here’s how the Asynchronous node should solve this issue: on a standard node, the requests to the single node are serialized so you’re actually waiting for each to complete before the next one is fetched, as I said, and since this can make the connection to the node take, all in all, a few minutes, and if the connection is severed then, you lose your data. The Asynchronous node, instead, has a different service polling the actual node on the same host, and saves the data in its spool file. The master in this case connects via SSH (it could theoretically work using xinetd but neither me nor Steve care about that), launches the asynchronous client, and then requests all the data that was fetched since the last request.

This has two side-effects: the first is that your foreign network connection is much faster (there is no waiting for the plugins to config and fetch the data), which in turn means that the overall munin-update transaction is faster, but also, if for whatever reason the connection fails at one point (a VPN connection crashes, a network cable is unplugged, …), the spooled data will cover the time that the network was unreachable as well, removing the “holes” in the monitoring that I’ve been seeing way too often lately. The second side effect is that you can actually spool data every five minutes, but only request it every, let’s say, 15, for hosts which does not require constant monitoring, even though you want to keep granularity.

Unfortunately, the async support is not as tested as it should be and there are quite a few things that are not ironed out yet, which is why the support for it in the ebuild has been this much in flux up to this point. Some things have been changed upstream as well: before, you had only one user, and that was used for both the SSH connections and for the plugins to fetch data — unfortunately one of the side effect of this is that you might have given your munin user more access (usually read-only, but often times there’s no way to ensure that’s the case!) to devices, configurations or things like that… and you definitely don’t want to allow direct access to said user. Now we have two users, munin and munin-async, and the latter needs to have an actual shell.

I tried toying with the idea of using the munin-async client as a shell, but the problem is that there are no ways to pass options to it that way so you can’t use --spoolfetch which makes it vastly useless. On the other hand, I was able to get the SSH support a bit more reliable without having to handle configuration files on the Gentoo side (so that it works for other distributions as well, I need that because I have a few CentOS servers at this point), including the ability to use this without requiring netcat on the other side of the SSH connection (using one old trick with OpenSSH). But this is not yet ready, it’ll have to wait for a little longer.

Anyway as usual you can expect updates to the Munin page on the Gentoo Wiki when the new code is fully deployed. The big problem I’m having right now is making sure I don’t screw up with the work’s monitors while I’m playing with improving and fixing Munin itself.

These XO-1.75 is based on the Marvell Armada 610 SoC (armv7l, non-NEON), which promises countless hours of fun enumerating and obtaining obscure pieces of software which are needed to make the laptop work.

One of these is the xf86-video-dove DDX for the Vivante(?) GPU: The most recent version 0.3.5 seems to be available only as SRPM in the OLPC rpmdropbox. Extracting it reveals a "source" tarball containing this:

At the Gentoo Miniconf 2012 in Prague we
will install Gentoo on the OLPC
XO-1.75, an ARM based laptop designed as an educational tool for
children. If you are interested in joining us, come to the Gentoo booth and
start hacking with us!

So the day started looking into getting a new version of ModSecurity into shape for a new stable ebuild in Gentoo for bug #438724 (a security issue in ModSecurity 2.6.8 and earlier). Unfortunately this also meant that I had to get a new CRS in, and that requires more testing than I was expecting.

The problem is that the ModSecurity 2.7 release is now stricter on what it accepts for rules. In particular now rules are mandated to have an unique ID. And that ID has to be only numeric. And that also means that if you publish your ruleset like I do you have to register for a reserved ID range with the ModSecurity developers. I did, and I have my proper range. I already developed a tool some time ago to validate my rules’ compliance with the new policy, but it turned out to requiring some tweaking anyway, as a few conditions weren’t reported properly.

Unfortunately the Core Rule Set (which is actually developed as a separate project by Ryan Barnett, whereas ModSecurity is maintained by Breno Silva), was not ready for this yet. Oh yes, the base rules, which are the only ones usually enabled by Gentoo, are fine, but the optional, experimental and the, newly introduced SpiderLabs Research rules are not ready. Some rules lack an ID, some IDs are duplicate, and some rules go well out of the designed ranges for them.

I pointed the guys at SpiderLabs/TrustWave at my script already — hopefully we’ll soon get a 2.2.7 release that covers those issues. Until then we’ll have to do with what we have. My rules are all fixed to work properly with the new ModSecurity though, this blog is using them already.

On a different note, I’ve considered making my validation of browsers’ user agents stronger than before, as spammers and exploit tools are becoming more advanced and more capable. In particular, I’ve found Mozilla’s docs as well as Microsoft’s which include a description for IE 8 and one for IE9 (I haven’t looked up one for IE 10 yet, I’m sure they have it). This should be enough to actually validate that there aren’t extraneous addons installed that could be signal for a spambot.

In particular, it seems like many of the posters in the recent wave of spam I’ve been hit with lately, which is looking exactly like a standard browser, reports coming from Firefox with WebMoney Advisor installed. Turns out that WebMoney is one of the many anonymous, electronic currencies that are so often used by spammers, carders, and the rest of the low-life scum that causes us so much grief as email users and bloggers. I wouldn’t be surprised if these were actually mechanical turks used to post spam bypassing various filters, who are then paid through that service.

Anyway, as usual please let me know if you can’t post the blog just send me an email, it shouldn’t happen but sometimes I have been overly excited with the rules themselves. On the other hand, I’ve tested most of the browsers as we have them lined up here at the office and they are fine — we don’t use or support Opera, but that should be fine as well. The infamous Opera Turbo issues should be fixed now, it would have been nice if Opera actually sent the proper HTTP parameters as required by the RFC when using that feature, but it’s okay.

As the quantity and quality of this year's entries will attest, Gentoo is
alive, well, and taking no prisoners!

We had 70 entries for the 2012 Gentoo screenshot contest, representing 11 different window managers / desktop environments. Thanks to all that participated, the judges and likewhoa for the screenshot site.

In the last few weeks, the conference team has worked hard to prepare the conference. The main news items you should be awere of are the FAQ which has been published, the party locations and times, the call to organize BoF sessions and of course the sponsors who help make the event possible. And we’re happy to tell you that we will provide live video streams from the main rooms during the event (!!!) and we announced the Round Table sessions during the Future Media track. Last but not least, there have been some interviews with intresting speakers in the schedule!

Talking about video interviews, there will be more videos in those channels: the openSUSE Video team is gearing up to tape the talks at the event. They will even provide a live stream of the event, which you can watch via flash and on a smartphone at bambuser and via these three links via ogv feeds: Room KirkRoom McCoy and Room Scotty. Keep an eye on the wiki page as the team will add feeds to more rooms if we can get some more volunteers to help us out.

Despite all our work, this event would be nothing without YOUR help. We’re still looking for volunteers to sign up but there’s another thing we need you for: be pro-active and get the most out of this event! That means not only sitting in the talks but also stepping up and participating in the BoF Sessions. And organize a BoF if you think there’s something to discuss!

Party time!

Of course, we’re also thinking about the social side of the event. Yes, there will surely be an extensive “hallway track” as we feature a nice area with booths and the university has lots of hallways… But sometimes it’s just nice to sit down with someone over a good beer, and this is where our parties come in. As this article explains, there will be two parties: one on Friday, as warming-up (and pre-registration) and one on Saturday, rockin’ in the city center of Prague. Note that you will need your badge to enter this party, which means you have to be registered!

Sponsors

As we wrote a few days ago, all this would not be possible without our sponsors, and we’d like to thank them A LOT for their support!

Big hugs to Platinum Sponsor SUSE, Gold Sponsor Aeroaccess, Silver Sponsor Google, Bronze Sponsor B1Systems, supporters ownCloud and Univention and of course our media partners LinuxMagazine and Root.cz. Last but not least, a big shout-out to the university which is providing this location to us!

FaQ

On a practical level, we also published our Conference FAQ answering a bunch of questions you might have about the event. If you weren’t sure about someting, check it out!

More

There will be more news in the coming days, be sure to keep an eye on news.opensuse.org for articles leading up and of course during the event. As one teaser, we’ve got the Speedy Geeko and Lightning talks schedule coming soon!

Be there!

Gentoo Miniconf, oSC12 and LinuxDays will take place at the Czech Technical University in Prague. The campus is located in the Dejvice district and is next to an underground station that gets you directly to the historic city center – an opportunity you can’t miss!

We expect to welcome about 700 Open Source developers, testers, usability experts, artists and professional attendees to the co-hosted conferences! We work together making one big, smashing event! Admission to the conference is completely free. However for oSC a professional attendee ticket is available that offers some additional benefits.

All the co-hosted conferences will start on October 20th. Gentoo Miniconf and Linuxdays end on October 21st, while the openSUSE Conference ends on October 23rd. See you there!

It’s been a while since I’ve done anything visible to anyone but myself. So, what the heck have I been doing?

Well, for starts, in the past year I’ve done a serious amount of work in Python. This work was one of the reasons for my lack of motivation for Gentoo. I went from doing little programming / maintenance at work to doing it 40+ hours a week. It meant I didn’t really feel up to doing more of it in my limited spare time. So I took up a few new hobbies. I got into Photography (feel free to look under links for the photo website). I feel weird with the self promotion for that type of thing, but, c’est la vie.

As the programming at work died down some, I started to find odd projects. I spent some serious time learning Go [1] and did a few small projects of my own in that. One of those projects will be open sourced soon. I know a fair few different languages, and I know C, Python, and Java pretty decently. While I like all of the ones on that list, I can’t say that I truly buy into the philosophies. Python is great. It’s simple, it’s clean, and it “just works.” However, I find that like OpenSSL, it gives you enough room to hang yourself and everyone else in the room. The lack of strict typing coupled with the fact that it’s a scripting language are downsides (in my eyes). C, for all that it is awesome at low level work, requires so much verbosity to accomplish the simplest tasks that I tend to shy away from it for anything other than what must be done at that level. Java… is well Java. It’s a decent enough language I suppose, but being run in a VM is silly in my eyes. It, like C, suffers from being too verbose as well (again, merely my humble opinion).

Enter Go. Go has duck typed interfaces, unlike Java’s explicit ones. It’s compiled and strictly typed. It has other modern niceties (like proper strings), along with a strong tie to web development (another area C struggles with). It has numerous interesting concepts (check out defer), along with what I find to be a MUCH better approach to error handling than what exists in any of C, Java, or Python. Add in that it is concurrent by design and you have one serious language. I must say that I am thoroughly impressed. Serious Kudos to those Google guys for one awesome language.

I also picked up a Nexus 7 and started looking into how Android is built and works. I got my own custom ROM and Kernel working along with a nice Gentoo image on the SD Card. Can anyone say “Go compiler on my Nexus 7?” This work also led me to do some work as far as getting Gentoo booting on Amazon’s Elastic Compute Cloud. Building Android takes for-freaking-ever, so I figured.. why not do it in the cloud!? It works splendidly, and it is fast.

So that covers new tricks. You mentioned goals and ideas?!

First, time to get myself off the slacker wagon and back to doing something useful. I no longer repulse at the idea of developing when I get home. That helps =p. One of the first things I want to spend some time addressing is disk encryption in Gentoo. I wrote here pertaining to the state of loop-aes. Both Loop-AES and Truecrypt need to spend a little time under the microscope as to how they should be handled within Gentoo. I’ll write more on his later when I have all my ducks in a row. I have no doubt that this will be a fun topic.

I also want to look into how a language like Go fits into Gentoo. Go has it’s own build system (no Makefiles, configure scripts, or anything else) that DOES have a notion of things like CFLAGS. It also has the ability to “go get” a package and install it. To those curious check out their website. All of these lead to interesting questions from a package management point of view. I am inclined to think that Go is around to stay. I hope it is. So we may as well start looking into this now rather than later. As my father used to tell me all the time, “Proper Prior Planning Prevents Piss Poor Performance.” Time to plan =).

That is, right after I sort out the fiasco that is my bug queue. *facepalm*

Everybody heard about the KISS principle I guess — the idea is the less complex a moving part is, the better. This is true in software as much as mechanics. Unix in particular, and all the Unix-like projects including GNU, also tended to follow that principle as it can be shown by the huge amount of small utilities that only do one particular text or file editing functions — that is until you introduce sed, awk and find.

Now we all know that the main sophistication that is afoot in the Linux world nowadays is Lennart’s systemd. I have no intention to discuss it now, or at any later time I’d say. I really don’t care as long as I have a choice not to use it, and judging from a given thread I think we’ll always have an alternative, no matter what some people said before and keep saying.

No, my problem today is not with udev deciding it’s time to stop using the same persistent rules that people had to fight with for years and that now are no longer usable, and instead it’s a problem with util-linux, and in particular with the losetup utility that manages the loop devices. See, the loop devices have been quite a big deal in the past, mostly because they started as a fixed amount, then the kernel let you decide how many, and then finally code was enabled that would let you change dynamically the amount of loop devices you want to have available. Great, but it required a newer version of util-linux, and at the time when it was introduced, there wasn’t one that actually worked as intended.

Anyway, in the past week I’ve been working on building a new firmware image for the device I’m working on, and when it comes down to run the script that generates the image to burn on the SSD, it locked up with 100% CPU usage (luckily the system is multicore so I could get in to kill it). The problem was to be found in losetup, so today with enough time on my hands, I went to check it out. Turns out that the reason why it failed was a joint issue between my setup, OpenRC updates, and util-linux updates, but let’s proceed with order.

The build happen on a container for which I was not mounting /sys — or at least so I intended, although it is possible that OpenRC mounted it on its own; this has changed recently, but I don’t think those changes hit stable yet, so I’m not sure that’s the case. I had created static nodes for the loop devices and for /dev/loop-control — but this latter was not to be found at first today. Maybe I deleted it by mistake or something along those lines. But the point is it worked before, and nothing changed beside an emerge -avuDN.

So, what happens is that the script is running something along the lines of losetup --find --show file which is intended to find the first available loop device, set up the file, and then print the loop device that was found. It’s a bit more complex than this as I’m explicitly setting up only the partition on the loop device (getting partitioned loop devices to play cool with LXC is a pain), but the point stands. Unfortunately, when both /dev/loop-control and /sys are unreachable, the looping around that should give us the first available device is looping over the same device over and over and over again, never trying the next. This causes the problem noted above, of losetup locking at 100% CPU usage.

And it’s definitely not the only problem! If you just execute losetup --find, which should give you the first available device, it provides you /dev/loop0even if that device is already in use. Not content enough with these problems? losetup -a lists no device, even when they are present, and still returns with a valid, zero exit status. Which is definitely not the case!

Okay you can say that losetup is already trying its best by using not one but three different sources (the third one is /proc/partitions) to find the data to use, but when the primary two are not usable, you shouldn’t expect it to give you proper information, should you? Well, that’s not the point. The big problem is that it should tell me “man, I can’t get you the data you requested because I need more sources, give me the sources!” instead of trying its best, failing, and locking up.

The next question is obviously “why are you ranting, instead of fixing it?” — the answer is that I tried, but the code I was reading made me cry. The problem is that nowadays, losetup is just a shallow interface to some shared code in util-linux .. and the design of said code makes it very difficult to make it clear whether a non-zero return value from a function is a “we reached the end of the list” or “I couldn’t see anything because I lack my sources”. And it really didn’t feel like a good idea for me to start throwing away that code to replace it with something more KISS-compliant.

So at the end of the day, I fixed my container to mount /sys and everything works, but util-linux is still broken upstream.

my main gentoo workstation is down. no more documentation updates from me for awhile.

it seems the desktop computer’s video card has finally bitten the dust. the monitor comes up as “no input detected” despite repeated reboots. so now i’m faced with a decision: throw in a cheap, low-end GFX card as a stopgap measure, or wash my hands of 3 to 6 years of progressive hardware failure, and do a complete rebuild. last time i put anything new in the box was probably back in 2009…said (dead) GFX card, and a side/downgraded AMD CPU. might be worth building an entirely new machine from scratch at this point.

i haven’t bothered to pay attention to the AMD-vs-Intel race for the last few years, so i’m a bit at a loss. i’ll check TechReport, SPCR, NewEgg, and all those sites, but…not being at all caught up on the bang-for-buck parts…is a bit disconcerting. i used to follow the latest trends and reviews like a true technoweenie.

and now, of course, i’m thinking in terms of what hardware lends itself to music production — USB/Firewire ports, bus latency, linux driver status for crucial bits; things like that. all very challenging to juggle after being out of it for so long.

Not that long ago we had our monthly Gentoo Hardened project meeting (on October 3rd to be exact). On these meetings, we discuss the progress of the project since the last meeting.

For our toolchain domain, Zorry reported that the PIE patchset is updated for GCC, fixing bug #436924. Blueness also mentioned that he will most likely create a separate subproject for the alternative hardened systems (such as mips and arm). This is mostly for management reasons (as the information is currently scattered throughout the Gentoo project at large).

For the kernel domain, since version 3.5.4-r2 (and higher), the kernexec and uderef settings (for grSecurity) should no longer impact performance on virtualized platforms (when hardware acceleration is used of course), something that has been bothering Intel-based systems for quite some time already. Also, the problem with guest systems immediately reserving (committing) all memory on the host should be fixed with recent kernels as well. Of course, this is only true as long as you don’t sanitize your memory, otherwise all memory gets allocated regardless.

In the SELinux subproject, we now have live ebuilds allowing users to pull in the latest policy changes directly from the git repository where we keep our policy at. Also, we will see a high commit frequency in the next few weeks (or perhaps even months) as Fedora’s changes are being merged with upstream. Another change is that our patchbundles no longer contain all individual patches, but a merged patch. This increases the deployment time of a SELinux policy package considerably (up to 30% faster since patching is now only a second or less). And finally, the latest userspace utilities are in the hardened-dev overlay ready for broader testing.

grSecurity is still focusing on the XATTR-based PaX flags. The eclass (pax-utils) has been updated, and we will now be looking at supporting the PaX extended attributes for file systems such as tmpfs.

For profiles, people will notice that in the next few weeks, we will be dropping the (extremely) old SELinux profiles as the current ones have been marked stable long time ago.

In the system integrity domain, IMA is being worked on (packages and documentation) after which we’ll move to the EVM support to protect extended attributes.

And finally, klondike held a good talk about Gentoo Hardened at the Flossk conference in Kosovo.

All in all a good month of work, again with many thanks to the volunteers that are keeping Gentoo Hardened alive and kicking!

Why this is needed

In testing linux bridging I noticed a problem that took me much longer then I feel comfortable admitting.
You cannot break out the VLANs to from a physical device and also use that physical device (attached to a bridge) to forward forward the entire trunk to a set of VMs.
The reason this occurs is that once linux starts inspecting for vlans on an interface to split them out it discards all those you do not have defined, so you have to trick it.

Setup

I had my Trunk on eth1. What you need to do is directly attach eth1 to a bridge (vmbr1). This bridge now has the entire trunk associated with it.
Here's the fun part, you can break out vlans on the bridge, so you would have an interface for vlan 13 named vmbr1.13 and then attach that to a brige, allowing you to have a group of machines only exposed to vlan 13.

The networking goes like this.

/->vmbr1.13 ->vmbr13->VM2eth1->vmbr1--->VM1\->vmbr1.42 ->vmbr42->VM3

Example

Here is the script I used with proxmox (you can set up the bridge in proxmox, but not the source for the bridges data (the 'input').
This is for VLANs 1-13 and assumes you have vyatta set up the target bridges. I had this start at boot (via rc.local).

The Keynote speaker for the Bootstrapping Awesome co-hosted conferences is going to be Agustin Benito Bethencourt. Agustin is currently working in Nuremberg, Germany as the openSUSE Team Lead at SUSE, and in the Free Software community he’s mostly known for his contributions to KDE and especially in the KDE eV. He is a very interesting guy, with a lot of experience about FOSS both from the community and the enterprise POV, which is also the reason I asked him to do the Keynote. I enjoy a lot working with him on organizing this conference, his experience is valuable. In this interview he talks a bit about himself, and a lot about the subject of his Keynote, the conference, openSUSE and SUSE, and about Free Software. The interview was done inside the SUSE office in Prague, with me being the “journalist” and Michal being the “camera-man”. Post-processing was done by Jos. More interviews from other speakers are about to come, so stay tuned! Enjoy!

This is a very brief post, which is just going to tell you that all the old content which was originally written on the b2evolution install at Planet Gentoo, and which was then imported into the WordPress install at Gentoo Blogs, have been imported into this site instead.

This basically means that Google will start indexing the old posts as well, which will then appear on the custom search’s results, which is why I went through the trouble. In particular yesterday’s post was actually linked to a very old problem which I enocuntered almost exactly seven years ago (time flies!) but finding it was a bit of a mess.

So I saved a backup of the Wordpress data, and wrote a script to create articles in Typo starting from it. The result is that now there are 324 extra posts in this blog — unfortunately some of them are noise. Some of it because the original images they linked to are gone, others because at some point b2evolution went crazy and started dropping anything where UTF-8 was involved, as can be shown by the original and the truncated version from the Wayback Machine. I’ll probably import the two articles I found being truncated.

After the import completed, I also deleted the WordPress site on Gentoo’s infra. The reason is simple: I don’t want to have two copies around, with all the broken links it has (nobody set up a remapping of the old planet links to blogs links), the truncated content and so on. There are a number of things that need to cleared up with the content, and links that went from this blog to the old one need to be fixed as well, but that’ll happen over time, I don’t want to sweat it now.

The recently approved EAPI 5 adds a feature called "slot-operator dependencies" to the package manager specification. Once these dependencies are implemented in the portage tree, the package manager will be able to automatically trigger package rebuilds when library ABI changes occur. Long-term, this will greatly reduce the need for revdep-rebuild.

If you are a Chromium user on Gentoo and you don't use portage-2.2, you have probably noticed that we are using the "preserve_old_lib" kludge so that your web browser doesn't break every time you upgrade the V8 Javascript library. This leaves old versions of V8 installed on your system until you manually clean them up. With slot-operator deps, we can eliminate this kludge since portage will have enough information to know it needs to rebuild chromium automatically. It's pretty neat.

I have forked the dev-lang/v8 and www-client/chromium ebuilds into my overlay to test this new feature; we can't really apply it in the main portage tree until a new enough version of portage has been stabilized. I will be maintaining the latest chromium dev channel release, plus a couple of versions of v8 in my overlay.

If you would like to try it out, you can install my overlay with layman -a floppym. Once you've upgraded to the versions in my overlay, upgrading/downgrading dev-lang/v8 should automatically trigger a chromium rebuild.

I originally posted the question on gentoo-hardened ML, but Sven Vermeulen advised me to file a bug, so there it is: bug #436474.

The problem I hit is that my ~/.config/chromium/ directory should have unconfined_u:object_r:chromium_xdg_config_t context, but it has unconfined_u:object_r:xdg_config_home_t instead.I could manually force the "right" context, but it turned out even removing the directory in question and allowing the browser to re-create it still results in wrong context. Looks like something deeper is broken (maybe just on my system), and fixing the root cause is always better. After all, other people may hit this problem too.Here is what error messages appear on chromium launch:

All common questions regarding travelling, transportation, event details, sightseeing and much more, in this Frequently Asked Questions page. Feel free to ask more questions, so we can include them in the FAQ and make it more complete

I was updating one of my boxens and ran into Bug 434686. In the bug Martin describes the simple way we as users can apply patches to packages that fail from bug fixes. This post is more than anything a reminder for me on how to do it. epatch_user has been blogged about before, dilfridge talks about it and says "A neat trick for testing patches in Gentoo (source-based distros are great!)".

As Martin explained in the bug and with the patch supplied by Liongene, here is how it works!

I've just updated the text on the Gentoo Wiki page on Ruby 1.9 to indicate that we now support eselecting ruby19 as the default ruby interpreter. This has not been tested extensively, so there may still be some problems with it. Please open bugs if you run into problems.

Most packages are now ready for ruby 1.9. If your favorite packages are not ready yet, please file a bug as well. We expect to make ruby 1.9 the default ruby interpreter in a few months time at the most. Your bug reports can help speed that up.

On a related note, we will be masking Ruby Enterprise Edition (ree18) shortly. With Ruby 1.9 now stable and well-supported we no longer see the need to also provide Ruby Enterprise Edition. This is also upstream's advice. On top of this the last few releases of ree18 never worked properly on Gentoo due to threading issues, and these are currenty already hard-masked.

Since we realize people may depend on ree18 and migration to ruby19 may not be straightforward, we intend to move slowly here. Expect a package mask within a month or so, and instead of the customary month we probably won't remove ree18 until after three months or so. That should give everyone plenty of time to migrate.

In portage-2.1.11.22 and 2.2.0_alpha133 there’s support for expermental EAPI 5-hdepend which adds the HDEPEND variable which is used to represent build-time host dependencies. For build-time target dependencies, use DEPEND (if the host is the target then both HDEPEND and DEPEND will be installed on it). There’s a special “targetroot” USE flag that will be automatically enabled for packages that are built for installation into a target ROOT, and will otherwise be automatically disabled. This flag may be used to control conditional dependencies, and ebuilds that use this flag need to add it to IUSE unless it happens to be included in the profile’s IUSE_IMPLICIT variable.

Second, I’d like to introduce a few enhancements I’ve made on these (some being merged upstream already).

Third, I’d like to turn this into a bit of a tutorial into getting started with EC2 as well since these scripts make it brain-dead simple.

I’ve previously written on building a Gentoo EC2 image from scratch, but those instructions do not work on EBS instances without adjustment, and they’re fairly manual. Edowd extended this work by porting to EBS and writing scripts to build a gentoo install from a stage3 on EC2. I’ve further extended this by adding a rudimentary plugin framework so that this can be used to bootstrap servers for various purposes – I’ve been inspired by some of the things I’ve seen done with Chef and while that tool doesn’t fit perfectly with the Gentoo design this is a step in that direction.

What follows is a step-by-step howto that assumes you’re reading this on Gentoo and little else, and ends up with you at a shell on your own server on EC2. Those familiar with EC2 can safely skim over the early parts until you get to the git clone step.

To get started, go to aws.amazon.com, and go through the steps of creating an account if you don’t already have one. You’ll need to specify payment details/etc. If you buy stuff from amazon just use your existing account (if you want), and there isn’t much more than enabling AWS.

Log into aws.amazon.com, and from the top right corner drop-down under either your name or My Account/Console choose “Security Credentials”.

Browse down to access credentials, click on the X.509 certificate tab, generate a certificate, and then download both the certificate and private key files. The web services require these to do just about anything on AWS.

On your gentoo system run as root emerge ec2-ami-tools ec2-api-tools. This installs the tools needed to script actions on EC2.

Export into your environment (likely via .bashrc) EC2_CERT and EC2_PRIVATE_KEY. These should contain the paths to the files you created in the previous step. Congratulations – any of the ac2-api-tools should now work.

We’re now going to checkout the scripts to build your server. Go to an empty directory and run git clone git://github.com/rich0/rich0-gentoo-bootstrap.git -b rich0-changes.

chdir to the repository directory if necessary, and within it run ./setup_build_gentoo.sh. This creates security zones and ssh keys automatically for you, and at the end outputs command lines that will build a 32 or 64 bit server. The default security zone will accept inbound connections to anywhere, but unless you’re worried about an ssh zero-day that really isn’t a big deal.

Run either command line that was generated by the setup script. The parameters tell the script what region to build the server in, what security zone to use, what ssh public key to use, and where to find the private key file for that public key (it created it for you in the current directory).

Go grab a cup of coffee – here is what is happening:

A spot request is created for a half decent server to be used to build your gentoo image. This is done to save money – amazon can kill your bootstrap server if they need it, and you’ll get the prevailing spot rate. You can tweak the price you’re willing to pay in the script – lower prices mean more waiting. Right now I set it pretty high for testing purposes.

The script waits for an instance to be created and boot. The build server right now uses an amazon image – not Gentoo-based. That could be easily tweaked – you don’t need anything in particular to bootstrap gentoo as long as it can extract a stage3 tarball.

A few build scripts are scp’ed to the server and run. The server formats an EBS partition for gentoo and mounts it.

A stage3 and portage snapshot are downloaded and extracted. Portage config files (world, make.conf, etc) are populated. A script is created inside the EBS volume, and executed via chroot.

That script basically does the typical handbook install (emerge sync, update world (which has all the essentials in it like dhcpcd and so on), build a kernel, configure rc files, etc.

The bootstrap server terminates, leaving behind the EBS volume containing the new gentoo image. A snapshot is created of this image and registered as an AMI.

A micro instance of the AMI is launched to test it. After successful testing it is terminated.

After the script is finished check the output to see that the server worked. If you want it outputs a command line to make the server public – otherwise only you can see/run it.

To run your server go to aws.amazon.com, sign in if necessary, browse to the EC2 dashboard. Click on AMIs on the left side, select your new gentoo AMI, and launch it (micro instances are cheap for testing purposes). Go to instances on the left side and hit refresh until your instance is running. Click on it and look down in the details for the public DNS entry.

To connect to your instance run ssh -i <path to pem file in your bootstrap directory> ec2-user@<public DNS name of your server>. You can sudo to root (no password).

That’s it – you have a server in the cloud. When you’re done be sure to clean up to avoid excessive charges (a few cents an hour can add up). Check the instances section and TERMINATE (not stop) any instances that are there. You will be billed by the month for storage so de-register AMIs you don’t need and go to the snapshot section and delete their corresponding snapshots.

Now, all that is useful, but you probably want to tailor your instance. You can of course do that interactively, but if you want to script it check out the plugins in the plugin directory. Just add a path to a plugin file at the end of the command line to build the instance and it will tailor your image accordingly. I plan to clean up the scripts a bit more to move anything discretionary into the plugins (you don’t NEED fcron or atop on a server).

The plugins/desktop plugin is a work in progress, but I think it should work now (takes the better part of a day to build). It only works 32-bit right now due to the profile line. However, if you run it you should be able to connect with x2goclient and have a KDE virtual desktop. A word of warning – a micro instance is a bit underpowered for this.

And on a side note, if somebody could close bugs 427722 and 423855 that would eliminate two hacks in my plugin. The stable NX doesn’t work with x2go (I don’t know if it works for anything else), and the stable gst-plugins-xvideo is missing a dependency. The latter bug will bite anybody who tries to install a clean stage3 and emerge kde-meta.

All of this is very much a work in progress. Patches or pull requests are welcome, and edowd is maintaining a nice set of up-to-date gentoo images for public use based on his scripts.

EAPI 5 includes support for automatic rebuilds via the slot-operator and sub-slots, which has potential to make @preserved-rebuild unnecessary (see Diego’s blog post regarding symbol collisions and bug #364425 for some examples of @preserved-rebuild shortcomings). Since this support for automatic rebuilds has potential to greatly improve the user-friendliness of preserve-libs, I have decided to make preserve-libs available in the 2.1 branch of portage (beginning with portage-2.1.11.20). It’s not enabled by default, so you’ll have to set FEATURES=”preserve-libs” in make.conf if you want to enable it. After EAPI 5 and automatic rebuilds have gained widespread adoption, I might consider enabling preserve-libs by default.

Sep 13th I stabilized net-analyzer/munin-2.0.5-r1 (security bug #412881). I use automated repoman checks and USE="-ipv6", and everything was fine at the time I committed the stabilization (also, see no mention of net-server in that security bug).

Sep 18th the repoman fix has been released in portage-2.1.11.18 and 2.2.0_alpha129.Now the only remaining thing to do is pushing the portage/repoman fix to stable. I especially like how quickly the fix for root cause (repoman check) has been produced and released.

There are thousands of guides out there on this subject, however I still struggled to set up an IPSEC VPN at first. This is a HOWTO for my own benefit – maybe someone else will use it too. I struggled because most of the guides involved setting up the VPN on a NAT’d host and connecting to the VPN inside the network. I didn’t do that on my linode, which has a static public IP.

My objectives were clear:

Create a connection point that was semi-secure while connecting to open wifi networks

Where 1.1.1.1 is your public eth0 address and 10.152.2.0 is the subnet that xl2tpd will assign IPs from (can be anything, I picked this at the advice of a guide because it is unlikely to be assigned from a router on a public network)

Remember that sysctl.conf is evaluated at boot so run sysctl -p to get the settings enabled now as well.

Step 5: Configure firewall (iptables):
This is the critical step that I wasn’t grokking from the existing guides in the wild. Even when bringing the firewall down to test, you need the NAT/forwarding rules:

Conclusion: The above examples should be enough to get the VPN working. There are some tweaking oppurtunities that I didn’t document or elaborate on. There is plenty of examples out there to look at or research, however. This was all setup without the firewall configuration and the client would connect but there would be no onward internet activity. It acted just like there was a invalid DNS server configured, at that point I looked into setting up a NAT, dnsmasq on the local interface, and other wierd things. In the end, just needed to forward the traffic properly.

As you probably have seen in the schedule, we have multiple room that have ugly names from university like 107, 155 or 349. We would like to rename them during the conference so people can remember them more easily. So try your creativity and send us some ideas!

Gentoo Miniconf: It will take place on Saturday and Sunday with a plethora of amazing talks by experienced Developers and Contributors, all around Gentoo, targeting both desktop and server environments!

On Saturday morning Fabian Groffen, Gentoo Council member, along with Robin H. Johnson, member of the Board of Trustees, will give us a quick view of how those two highest authorities manage the whole project. Afterwards there are going to be a few talks regarding various topics, like managing your home directory, the KDE team workflow, the important topic of Security and a benchmarking suite, all performed by important people for the project. A cool Catalyst workshop will be next, followed by a workshop regarding Gentoo Prefix, and at the end we’re going to participate on BoFs regarding the Infrastructure and the Gentoo PR, which will cover hot topics, like the Git migration and our website.

On Sunday we’ll see how a large company (IsoHunt) uses Gentoo, the tools it has developed and the problems it has encountered. Then, a cool talk about 3D games and graphic performance is going to take place, followed by a presentation on SHA1 and OpenPGP, which is the precursor of the Key Signing Party!! The second part of the Catalyst workshop is next, along with a Puppet workshop. At the end there are again two BoFs, the first about automated testing and the second about how we can grab more contributors and enlarge our cool project.

And a sneak peek on the other co-hosted conferences:

Future Media, which will be held on Saturday is a special feature track talking about the influence of developments in technology, social media and design on society. It will have talks like the future of Wikipedia and Open Data in general by Lydia Pintscher or using FOSS and open hardware for disaster relief by Shane Couglan.

The first day in the openSUSE Conference, Michael Meeks will tell you all aboutwhat’s new in LibreOffice, Klaas Freitag will give everyone a peek under the hood of ownCloud and for the more technical users, Stefan Seyfried will show you how to crash the Linux Kernel for fun and backtraces. Saturday night there’ll be a good party and the next day musician Sam Aaron will talk about Zen and how to Live Program music like he did during the party. Later, Libor Pecháček will explain the process of getting software from the community into commercial enterprises and at the end of the day Miguel Angel Barajas Watson will show us how a computer could win Jeopardy using SUSE, Power and Hadoop. The openSUSE event continues on Monday and Tuesday with many workshops and BoF sessions planned as well as a few large-room discussions about the future of the openSUSE development- and release process.

On Saturday the LinuxDays track features a number of Czech talks like an introduction to Gentoo by Tomáš Chvátal with his talk titled “if it moves, compile it!” (‘Pokud se to hýbe, zkompiluj to!’). Fedora is represented by Jiří Eischmann & Jaroslav Řezník later in the day. There also few real ninja-style talks about low-level programming like Petr Baudiš about low level programming and Thomas Renninger on modern CPU power usage monitoring (these both are in English). During the Saturday there will also be track of graphics workshops in Czech (Gimp, Inkscape, Scribus) followed by a 3D printing workshop (reprap!). Sunday is kicked of by Vojtěch Trefný explaining how to use Canonical’s Launchpad as a place to host your project (CZ). Those interested in networking will be taken care off by Pavel Šimerda (news from Linux Networking) and Radek Neužil who explains how to use networks securely (both CZ). You can also learn all about how to set up a Linux desktop/server solution for educational purposes (EN) and follow Vladimír Čunát talking about NixOS and the unique package manager this OS is build on. The LinuxDays track will be closed by Petr Krčmář (chief editor of root.cz) and Tomáš Matějíček (author of Slax) talking about future of Slax (CZ).

i just finished hacking on our XML for the month. several months ago, sven mentioned the changes needed to get the handbooks updated with initramfs/initrd instructions for separate /usr partitions. it took me a few hours, but i finally closed bug numbers 415175, 434550, 434554, and 434732. thanks to raúl for the patches.

i initially started putting in the patches as-is, but then i noticed that the initramfs descriptions were just copied from the x86+amd64 handbook. so, i stripped them out, and rewrote them as an included section common to all affected architecture handbooks. that <include> is then dynamically inserted by our XML processor, dropping the instructions into the appropriate place, so that there’s no extraneous text duplication.

that bit about include href="hb-install-initramfs.xml" fills in the next subsection with whatever we put in the hb-install-initramfs.xml include, which is never viewed by itself. little tricks like this make it much easier to maintain the documentation…we make one change to an include, and it’s propagated to all documents that use it. same goes for things like <keyval> — that variable is set elsewhere in our documentation, so that as kernel versions or ISO sizes change, we can update that value in one place (handbook-$ARCH.xml). every instance of the variable is automatically filled in when you view the handbook in your web browser.

not to say everything was smooth sailing while updating the handbooks…i ran into a few snags. i figured out why my initial commit attempts were blocked by our pre-commit hooks: it’s not that the xml interpreter was giving me spurious errors on each check. (“why you blocking me? i’m head of the project! DON’T YOU KNOW WHO I AM?!”) instead, i forgot a slash in a </body> element. THAT ruined the next 300 lines of code. solution: fix, re-run xmllint --valid --noout, add commit message, push to CVS.

the handbooks are now all set for the new initramfs/initrd mojo for those poor, poor souls mounting /usr on a separate partition/disk. my own partition layout is much simpler; i’ve never needed an initramfs.

I regularly use monit to monitor services and restart them if needed (and possible). An issue I’ve run into though with Gentoo is that openrc doesn’t act as I expect it to. openrc keeps it’s own record of the state of a service, and doesn’t look at the actual PID to see if it’s running or not. In this post, I’m talking about apache.

For context, it’s necessary to share what my monit configuration looks like for apache. It’s just a simple ‘start’ for startup and ‘stop’ command for shutdown:

When apache gets started, there are two things that happen on the system: openrc flags it as started, and apache creates a PID file.

The problem I run into is when apache dies for whatever reason, unexpectedly. Monit will notice that the PID doesn’t exist anymore, and try to restart it, using openrc. This is where things start to go wrong.

To illustrate what happens, I’ll duplicate the scenario by running the command myself. Here’s openrc starting it, me killing it manually, then openrc trying to start it back up using ‘start’.

You can see that ‘status’ properly returns that it has crashed, but when running ‘start’, it thinks otherwise. So, even though an openrc status check reports that it’s dead, when running ‘start’ it only checks it’s own internal status to determine it’s status.

This gets a little weirder in that if I run ‘stop’, the init script will recognize that the process is not running, and reset’s openrc’s status to stopped. That is actually a good thing, and so it makes running ‘stop’ a reliable command.

Resuming the same state as above, here’s what happens when I run ‘stop’:

# /etc/init.d/apache2 stop
* apache2 not running (no pid file)

Now if I run it again, it checks both the process and the openrc status, and gives a different message, the same one it would as if it was already stopped.

# /etc/init.d/apache2 stop
* WARNING: apache2 is already stopped

So, the problem this creates for me is that if a process has died, monit will not run the stop command, because it’s already dead, and there’s no reason to run it. It will run ‘start’, which will insist that it’s already running. Monit (depending on your configuration) will try a few more times, and then just give up completely, leaving your process completely dead.

The solution I’m using is that I will tell monit to run ‘restart’ as the start command, instead of ‘start’. The reason for this is because restart doesn’t care if it’s stopped or started, it will successfully get it started again.

I don’t know if my expecations of openrc are wrong or not, but it seems to me like it relies on it’s internal status in some cases instead of seeing if the actual process is running. Monit takes on that responsibility, of course, so it’s good to have multiple things working together, but I wish openrc was doing a bit more strict checking.

I don’t know how to fix it, either. openrc has arguments for displaying debug and verbose output. It will display messages on the first run, but not the second, so I don’t know where it’s calling stuff.

If you need to convert .mts files to .mov (so that e.g. iMovie can import them), I found ffmpeg to be the best tool for the task (I don't want to install and run "free format converters" that are usually Windows-only and come from untrusted sources). This post is inspired by iMovie and MTS blog post.

First I tried just changing the container:

for x in *.MTS; do ffmpeg -i ${x} -c copy ${x/.MTS/.mov}; done

But QuickTime could not play sound from those files because of AC-3 codec. Also, the quality of the video playback was very poor. The other command I tried was:

Now QuickTime was able to play the sound, but problems with video remained. iMovie was unable to import the resulting files anyway (silently: I got no error message, just nothing happened when trying to import).

I’d like to announce a new initiative within the mips arch team. We are now supporting an xfce4-based desktop system for the Lemote Yeeloong netbook. The images can be found on any gentoo mirrors, under gentoo/experimental/mips/desktop-loongson2f. The installation instructions can be found here. The yeeloong netbook is particularly interesting because it only uses “free” hardware, ie. hardware which doesn’t require any proprietary code. It is manufactured by Lemote in China, and distributed and promoted in the US by “Freedom Included“. It is how Richard Stallman does his computing.

I’m blogging because I thought it was important for Planet Gentoo to know that mips devices are currently being manufactured and used in netbooks as well as embedded systems. The gentoo mips team has risen to the challenge of targetting these systems and maintaining natively compiled stage4′s for them. Why stage4′s? And why a full desktop for the yeeloong? These processors are slow, so the time from a stage3 to a desktop is about three days for the yeeloong. Also, the yeeloong sports a little endian mips64 processor, the loongson2f, and we support three ABIs: o32, n32 and n64, with n32 being the preferred. This significantly increases the time to build glibc and other core packages. I provide two images, a vanilla one and a hardened one. The latter adds full hardening (pie, ssp, _FORTIFY_SOURCES=2, bind now, relro) to the toolchain and userland binaries as we do for amd64 and i686 in hardened gentoo. I have not ported over the hardened kernel, however.

I allude above to “other” targetted devices. I am also maintaining some mips uclibc systems (both hardened and vanilla) which are on the gentoo mirrors under experimental/mips/uclibc. But I will speak more of these later as part of an initiative to maintain hardened uclibc systems on “alternative” architectures such as arm, mips, ppc as well as amd64 and i686.

You can read the full installation instructions, but here’s a quick summary, since it doesn’t follow the usual Gentoo method of starting from a stage3:

Prepare either a pen drive or a tftp server with a rescue image: netboot-yeeloong.img

Turn on the yeeloong and hit the Del key multiple times until you get the firmware promt: PMON>

If netbooting, add an IP address and point to the netboot-yeeloong.img. If using a pen drive then point to thei image on the drive and boot into the rescue environment.

Partition and format the drive.

Download the desktop image from a mirror via http or ftp. Its about 350 MB in size.

Unpack the image. It contains not only the userland, but also a kernel.

Reboot to the PMON> prompt. Aim to the kernel on the drive. PMON will remember your choice and you will not have to repeat this step.

Once installed, you will log in as an ordinary user with sudo with username and password = “gentoo”. The root password is also set to “root”. It is an ordinary Gentoo system, so edit your make.conf, emerge –sync and add whatever packages you like! File bugs to: blueness@gentoo.org with a CC to mips@gentoo.org.

If you have a Yeeloong or go out and buy one, consider trying out this image.

This is another (second) post about updating a system I rarely updated. If you're interested, read the first post. I recommend more frequent updates, but I also want to show that it's possible to update without re-installing, and how to solve common problems.