Given that my first UNIX experience was on Linux, I’ve gotten used to the way certain commands work. When I switched from Linux to OpenIndiana (an Illumos-based distro), I had to get used to the fact that some commands worked slightly (or in some case extremely) differently. One such command is ping.

On Linux, invoking ping without any arguments, I would get the very familiar output:

I was very surprised when I first ran ping on an OpenIndiana box since it outputted something very different:

oi$ ping powerdns.com
powerdns.com is alive

No statistics! Just a boolean indicating “has the host responded to a single ping.” When I run ping, I want to see the statistics—that’s why I run ping to begin with. The manpage helpfully points out that I can get statistics by using the -s option:

For the past few years, I’ve just been getting used to adding -s. It was a little annoying, but it wasn’t the end of the world because I don’t use pingthat much and when I do, the two extra characters don’t matter.

Two weeks ago, I decided to do some hardware hacking. After a bit of reading up on embedded boards, I ended up buying a Raspberry Pi B+. It’s essentially a slightly smaller form factor version of the B, that has more GPIO pins and uses microSD cards instead of SD cards.

I hooked it up to the TV and played with Raspbian and RiscOS a little bit. As you may have guessed by now, that was not enough fun for me. I just had to boot a custom OS that talked over serial. :) This of course required some way to connect the Pi to something that can talk serial. But that’s a post for another day. :P This post is going to be about my impression of the Pi, as well as a cute little use I found for it over the past week.

Impressions

The Pi is a rather small board. The B+ is even smaller. A lot has been written about the technical side, so I won’t bother.

I was rather impressed with how much punch this little board packs. The hardest part about getting it going was putting it in the case (I got one of those kits because it was cheaper than buying everything separately). The built-in 4-port USB hub ended up quite useful. It allowed me to plug in both a keyboard and a mouse and have NOOBS installing Raspbian and RiscOS within minutes. A quick reboot later, I was at a shell prompt. That’s where the “new toy high” wore off a little. (I know I’ve talked about this with people before — it’s cool to be portable, but it’s also boring since the architecture becomes irrelevant.) I had a shell, and the most creative thing I could think of was to look at /proc/cpuinfo and /proc/meminfo.

I do have some thoughts about where the Pi B+ could have been better. The B version used an SD card. The B+ uses a microSD card. I consider this a bit of a regression. I have a bunch of older SD cards and an SD card reader that works well with SD cards. Sadly, this card reader (using a microSD adapter) fails to play nice with the SDXC modernization of SD that all microSD cards seem to use. I have the same issue with other microSD cards, so I’m pretty sure it’s the card reader. This makes updating a bit more of a pain.

The other thing I wish the Pi had is a DB9 RS232 connector. I have USB serial dongles that work well, but to talk serial to the Pi one needs to either get a level converter or a TTL serial to USB cable. I ended up getting a cheap USB cable with a fake Prolific chip inside. It works, but I hear Windows users are having a terrible time with evil drivers from Prolific.

Storm Timelapse

A little over a week after getting the Pi in the mail, we got a large storm heading our way. I got the brilliant idea to set up a webcam in an upstairs window. Previously, this would involve digging up an old computer, setting it up by the window, etc. This time, I reached for the Pi. I connected a webcam to one of the USB ports and a cheap WiFi USB adapter to another. A short config later, Raspbian was on the network even though there’s no network drop in sight.

I didn’t want to abuse the microSD card for storage of images, so I mounted an NFS share from the storage server in the basement. I had to use the nolock option to make the mount happen. I probably could have figured out why the lock manager was not running, but it was a temporary setup so a “quick hack” was all I did.

To capture images from the webcam, I ended up installing fswebcam, a small program that does one thing and does it well. I started up screen, and ran fswebcam with the following config.

Then, downstairs on my laptop, I mounted the same share and watched the files appear every five seconds. I ended up running the webcam for two days.

Here’s a couple of stills from the 27th:

And here’s a couple from the 28th:

I did make a quick timelapse, but I haven’t tried to figure out a reasonable set of codec options to not end up with 300 MB of video. Maybe one day I’ll find a good set of options and upload the video here. Here’s what I used:

In my previous post, I introduced Performance Co-Pilot (PCP). I know, I promised the next post to be about logging, but I thought I’d make a short detour and show how to install more PMDAs.

After installing PCP on a Linux system, you will have access to somewhere around 850 various metrics from the three basic PMDAs (pmcd, linux, and mmv). There are many more metrics that you can get at if you enable some of the non-default PMDAs.

I pondered what the best way to present a simple howto would be, and then I realized that simply copying & pasting a session where I install a PMDA will do.

In this post, I will use the PowerDNS PMDA as an example, but the steps are the same for the other PMDAs.

# cd pdns/
# ls
Install Remove pmdapdns.pl

As you can see, there are three files in this directory. We are interested in the Install script. Simply run it as root, and when it asks whether you want a collector, a monitor, or both answer appropriately — if you are running the daemon on the same host, answering both is your best bet. (I never had the need to answer anything else.)

At this point, the PMDA has been installed (take a look at /etc/pmcd/pmcd.conf to see the new config line there enabling the new PMDA). Now, we can see the new metrics using pminfo (there are many more, I just pruned the list for brevity):

During an experiment, I needed to install Fedora 12. I made a few mistakes:

I went with the netinstall. Unlike Debian’s netinstall, Fedora’s is very slow.

The installer was a bit sluggish under KVM, and so I accidentally clicked though the window that let me unselect Gnome. So it’s installing the whole shebang.

For whatever reason, it is installing CJK fonts. I do not speak either of those languages, and therefore they are useless to me. Furthermore, I’ve been told that something in the neighborhood of 20% of Fedora users make use of CJK. That just sounds wrong. Why install a package by default that only 20% of your userbase will benefit from? Aren’t there more useful packages?

As the name implies, they package up Hercules (an IBM mainframe emulator), and provide support for it. They are targetting the platform as a disaster recovery solution.

It shouldn’t directly affect the open source project in a negative way (just like Red Hat cannot prevent people from continuing their work on the Linux Kernel). At the same time, it’ll change the way people look at Hercules.

Extract the sources, and run the update-kernel script. I really mean this, if you try to be clever and apply the patch by hand, you’ll have a broken source tree. (The script runs patch to fixup some existing kernel files, and then it copies a whole bunch of other files into kernel tree.)

Configure, build, install, and reboot into the new kernel

You can modprobe perfctr and see spew in dmesg

That’s it for perfctr. Now PAPI itself…

Get & extract the source

./configure, make, make fulltest, make install-all

That’s it for PAPI. The make fulltest will run the tests. Chances are that they will all either pass or all fail. If they fail, then something is wrong (probably with perfctr). If they pass, then you are all set.

There are some examples in the src/examples directory. Those should get you started with using PAPI. It takes about 100 lines of C to get an arbitrary counter going.

Some other time, I’ll talk more about PAPI, and how I used it in my experiments.

While I am far from being a fan of ext4, I feel an obligation to set the record straight. But first, let me give you some references with an approximate timeline. I’m sure I managed to leave out a ton of details.

In mid-January, a bug titled Ext4 data loss showed up in the Ubuntu bug tracker. The complaining users apparently were using data on system crashes when using ext4. (The fact that Ubuntu likes to include every unstable & crappy driver into their kernels doesn’t help at all.) As part of the discussion, Ted Ts’o explained that the problem wasn’t with ext4 but with applications that did not ensure that the data they wrote was actually safe. The people did not like hearing that.

Things went pretty quiet until mid-March. That’s when a slashdot article made it painfully obvious that many of today’s apps are buggy. Some applications (KDE being a whole suite of applications) gotten used to the fact that ext3 was a very common filesystem used by Linux installations. More specifically, they got used to the behavior that ext3’s default mount option (data=ordered) provided. This is really the issue. The application developers assumed that the POSIX interface gave them more guarantees that it did! To make matters worse, the one way to ensure that the contents of a file get to the disk (the fsync system call) is very expensive on ext3. So over the past (almost) decade that ext3 has been around, application developers have been “trained” (think Pavlov reflexes) to not use fsync — on ext3, it’s expensive and the likelyhood of you losing data is much lower due to the default mount options. ext4’s fsync implementation, much like other filesystems’ implementations (e.g., XFS) does not suffer from this. (You may have heard about fsync on ext3 being expensive almost a year ago when Firefox was hit by this: Fsyncers and curveballs (the Firefox 3 fsync() problem). Note that in this case, as Ted Ts’o points out, the problem is that Firefox uses the same thread to draw the UI and do IO. That’s plain stupid.)

About the same time, Eric Sandeen wrote a blurb about the state of affairs: fsync, sigh. He points out that XFS has faced the same issue years ago. When the application developers were confronted about their application being broken, they just put fingers in their ears, hummed loudly, yelled “I can’t hear you!” There is a word for that, and here’s the OED definition for it:

denial,

The asserting (of anything) to be untrue or untenable; contradiction of a statement or allegation as untrue or invalid; also, the denying of the existence or reality of a thing.

The problem is application developers not wanting to believe that it’s an application problem. Well, it really is! Not only are those apps broken, but they are not portable. AIX, IRIX, or Solaris will not give you the same guarantees as ext3!

When April 1st came about, the linux-fsdevel mailing list got a patch from yours truly: [PATCH] fs: point out any processes using O_PONIES. (The pony thing…it’s a bit of an inside joke among the Linux filesystem developers.) The idea of having O_PONIES first came up in #linuxfs on OFTC. While I don’t remember who first thought of it (my guess would be Eric), I know for sure that it wasn’t me. At the same time, I couldn’t help it, and considering that the patch took only a minute to make (and compile test), it was well worth it.

Few days later, during the Linux Storage and Filesystem workshop, the whole fsync issue got some discussion time. (See “Rename, fsync, and ponies” at Linux Storage and Filesystem workshop, day 1.) The part that really amused me:

Prior to Ted Ts’o’s session on fsync() and rename(), some joker filled the room with coloring-book pages depicting ponies. These pages reflected the sentiment that Ted has often expressed: application developers are asking too much of the filesystem, so they might as well request a pony while they’re at it.

In the comments for that article you can find Ted Ts’o saying:

Actually, it was Josef ’Jeff’ Sipek who deserves the first mention of application programmers asking for pones, when he posted an April Fools patch submission for the new open flag, O_PONIES — unreasonable file system assumptions desired.

Another file system developer who had worked on two major filesystems (ext4 and XFS) had a t-shirt on that had O_PONIES written on the front. And the joker who distributed the colouring book pages with pictures of ponies was another file system developer working yet another next generation file system.

Application programmers, while they were questioning my competence, judgement, and even my paternity, didn’t quite believe me when I told them that I was the moderate on these issues, but it’s safe to say that most of the file system developers in the room were utterly unsympathetic to the idea that it was a good idea to encourage application programmers to avoid the use of fsync(). About the only one who was also a moderate in the room was Val Aurora (formerly Henson). Both of us recognize that ext3’s data=ordered mode was responsible for people deciding that fsync() was harmful, and I’ve said already that if we had known how badly it would encourage application writers to Do The Wrong Thing, I would have pushed hard not to make data=ordered the default. Unfortunately, memory wasn’t as plentiful in those days, and so the associated page writeback latencies wasn’t nearly as bad ten years ago.

Hrm, I’m not sure how to take it…he makes it sound like I’m an extremist. Jeff — a freedom fighter for sanity of filesystem interfaces! :) As I said, I can’t take credit for the idea of O_PONIES. As I was writing this entry, I mentioned it to Eric and he promptly wrote an entry of his own: Coming clean on O_PONIES. It looks like he isn’t sure that he was the one to invent it! I’ll give him credit for it anyway.

The next day, a group photo of the attendees was taken… You can clearly see Val Aurora wearing an O_PONIES shirt. The idea was Eric’s, and as far as I know, he had his shirt the first day.

Well, there you have it. That’s the summary of events with some of my thoughts interleaved. If you are writing a userspace application that does file IO, do the right thing, fsync the data you care about (or at least fdatasync).

There once was a lad from Braidwood With a wife and a hatred for FUD He hacked kernels for fun, couldn’t get them to run. But he always felt that he should.

See?

So when you say "there’s not enough poetry", next time you’ll know why. You *really* don’t want want poetry.

Then Alan Cox replied with modified lyrics to Eleanor Rigby:

Ah look at all the laundered pages Ah look at all the laundered pages

Handling Pages Pick up the list and the link where kswap has been A paging scheme Runs down the I/O Watching the queues that now keep me a list of the store Who is it for

All the laundered pages Where do they all come from All the laundered pages Where do they all belong

Meeting bdflush Writing the pages of a disk file that no one will clear No task comes near Look at it working Sleeping a lot in the night when there’s no pressure there What does it care

All the laundered pages Where do they all come from All the laundered pages Where do they all belong

Ah look at all the laundered pages Ah look at all the laundered pages

Oracle DB Died under load and was freed along with its name No admin came Good old bdflush Wiping the dirt from the pages as it walks down the chain Nothing was aged

All the laundered pages (Ah look at all the laundered pages) Where do they all come from All the laundered pages (Ah look at all the laundered pages) Where do they all belong

Then, there was an exchange of limerics between Rusty and Alan…

Rusty:

There once was a virtualization coder, Whose patches kept getting older, Each time upstream would drop, His documentation would slightly rot, SO APPLY MY FUCKING PATCHES OR I’LL KEEP WRITING LIMERICKS.

Alan:

There once was a man they called rusty Who patches were terribly crusty Though his patches were right And Linus was bright They sat on the list getting dusty.

Rusty:

There was a poetic infection Which distorted the kernel’s direction, The code got no time As they all tried to rhyme And it shipped needing lots of correction.

And finally, Alan:

Dear Rusty I think that we know Your code has good things to show But an unreliable guide To the poetic aside Would probably steal the show

As most of you know, virtuallization doesn’t really interest me, so me writing about lguest is rather unusual. For those who don’t know, lguest is Rusty Russell’s way of saying virtualization sucks and I can make it better (don’t quote me on that).

Yesterday, Rusty sent out 7 patch series ( 1, 2, 3, 4, 5, 6, 7) that contains most of the documentation for lguest. This is not the normal style of documentation you’ll find in the kernel. Here’s Rusty’s description…

Lguest is an adventure, with you, the reader, as Hero. I can’t think of many 5000-line projects which offer both such capability and glimpses of future potential; it is an exciting time to be delving into the source!

But be warned; this is an arduous journey of several hours or more! And as we know, all true Heroes are driven by a Noble Goal. Thus I offer a Beer (or equivalent) to anyone I meet who has completed this documentation.

So get comfortable and keep your wits about you (both quick and humorous). Along your way to the Noble Goal, you will also gain masterly insight into lguest, and hypervisors and x86 virtualization in general.

There is a very large number of totally hillarious comments. It looks like one doesn’t have to be an x86 expert to get a laugh out of them, but knowing a thing or two about the architecture makes it all the more enjoyable.

I can’t help but include few excerpts here…

Intel provided a special instruction to clear the TS bit for people too cool to use write_cr0() to do it. This "clts" instruction is faster, because all the vowels have been optimized out.

I’m told there are only two stories in the world worth telling: love and hate. So there used to be a love scene here like this:

Launcher: We could make beautiful I/O together, you and I. Guest: My, that’s a big disk!