Oracle Blog

Useful stuff for your blog-reading pleasure.

Thursday Jul 03, 2008

A lot of interesting things are going to happen in the next 30 years. One of them will be a big push to fix the so-called "Year 2038 Problem" on Solaris and other Unix and C-based OSes (assuming there'll be any left), which will be similar to the "Year 2000 Problem".

The Year 2038 Problem

To understand the Year 2038 Problem, check out the definition of time_t in sys/time.h:

typedef long time_t; /* time of day in seconds */

To represent a date/time combination, most Unix OSes store the number of seconds since January 1st, 1970, 00:00:00 (UTC) in such a time_t variable. On 32-Bit systems, "long" is a signed integer between -2147483648 and 2147483647 (see types.h). This covers the range between December 13th, 1901, 20:45:52 (UTC) and January 19th, 2038, 03:14:07, which the fathers of C and Unix thought to be sufficient back then in the seventies.

On 64-Bit systems, time_t can be much bigger (or smaller), covering a range of roughly 292 billion years in either direction, but if you're 32-Bit in 2038 you'll be in trouble: A second after January 19th, 2038, 03:14:07 you'll travel back in time and immediately find yourself at December 13th, 1901, 20:45:52 with a major headache called "overflow".
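The two boundary dates and the wraparound can be checked with a few lines of Python. This is only a minimal sketch, not Solaris code: it emulates a signed 32-Bit time_t with ctypes.c_int32, and the helper names to_utc and wrap_time32 are made up for this illustration.

```python
import ctypes
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def to_utc(seconds):
    """Convert seconds since the epoch to a UTC datetime (negative values work too)."""
    return EPOCH + timedelta(seconds=seconds)


def wrap_time32(seconds):
    """Emulate storing a second count in a signed 32-bit time_t (silently wraps)."""
    return ctypes.c_int32(seconds).value


last_good = to_utc(INT32_MAX)                    # last second that still fits
after_wrap = to_utc(wrap_time32(INT32_MAX + 1))  # one second later, after the overflow

print("last representable:", last_good.isoformat())
print("one second later:  ", after_wrap.isoformat())
```

The second printed date is the December 1901 timestamp mentioned above, which is where a 32-Bit program ends up once the counter wraps.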

2038 could be today...

Well, you might say, I'll most probably be retired in 2038 anyway and of course, there won't be any 32-Bit systems that far in the future, so who cares?

A customer of mine cared. They run a very big file server infrastructure, based on Solaris, ZFS and a number of Sun Fire X4500 machines. A big infrastructure like this also has a large number of clients in many variations. And some of their clients have a huge problem with time: They create files with a date after 2040.

Now, the NFS standard will happily accept dates outside the 32-Bit time_t range and so will ZFS. But any program compiled in 32-Bit mode (and there are many) will run into an overflow error as soon as it wants to handle such a file. Incidentally, most of the Solaris file utilities (you know, rm, cp, find, etc.) are still shipped in 32-Bit, so having files 30+ years in the future is a big problem if you can't administer them.

The 64-Bit solution

One simple solution is to recompile your favourite file utilities, say, from GNU coreutils in 64-Bit mode, then put them into your path and hello future! You can do this by saying something like:

Now, while trying to reproduce the problem and sending some of my own files into the future, I found out, thanks to Chris and his short "what happens if I try" DTrace script, that OpenSolaris already has a way to deal with these problems: UFS and ZFS just won't accept any dates outside the 32-Bit range any more (check out lines 2416-2428 in zfs_vnops.c). Tmpfs will, so at least I could test there on my OpenSolaris 2008.05 laptop.

That's one way to deal with it, but shutting the doors doesn't help our poor disoriented client of the future. And it's also only available in OpenSolaris, not Solaris 10 (yet).

The DTrace solution

So, I followed Ulrich's helpful suggestions and Chris' example and started to hack together a DTrace script of my own that would print out who is trying to assign a date outside of 32-Bit-time_t to what file, and another one that would fix those dates so files can still be accepted and dealt with the way sysadmins expect.

Of course, I ran "/opt/local/bin/touch -t 207106231200 /tmp/blah" in another terminal to trigger the probe in the script above (that was a 64-Bit touch compiled from GNU coreutils).

A couple of non-obvious hoops needed to be dealt with:

To make the script thread-safe, all variables need to be prefixed with self->.

There are two system calls that can change a file's date: utimes(2) and futimesat(2) (let me know if you know of more). The former is very handy, because we can steal the filename from its first argument, but the latter also allows you to pass just a file descriptor. If we want to see the file names for futimesat(2) calls, then we may need to figure them out from the file descriptor. I decided to create my own fd-to-filename table by tapping into open(2) and close(2) calls because chasing down the thread data structures or calling pfiles from within a D script would have been more tedious/ugly.

Depending on whether we are fired from utimes(2) or futimesat(2), the arguments we're interested in change place, i.e. the filename will come from arg0 in the case of utimes(2) or arg1 if futimesat(2) was used. To always get the right argument, we can use something like probefunc=="utimes"?arg0:arg1.

We can't directly access or manipulate the arguments to system calls; we have to use copyin().

I hope the comments inside the script are helpful. Be sure to check out the DTrace Documentation, which was very useful to me.

The second script is called correctbigtimes.d and it not only alerts us of files being sent into the future, it automatically corrects the dates to the current date/time in order to prevent any time-travel outside the bounds of 32-Bit time_t at all:

As you can see, we enabled DTrace's destructive mode (of course only for constructive purposes) which allows us to change the time values on the fly and ensure a stable time continuum.

This time, I left out the code that created the file descriptor-to-filename table, because this script may potentially be running for a long time and I didn't want to consume precious memory for just a convenience feature (otherwise we'd keep an extra table of all open files for all running threads in the system!). If we get a filename string, we print it; otherwise a file descriptor needs to suffice, and we can always look it up through pfiles(1).

The actual time modification takes place inside our local variables, which then get copied back into the system call through copyout().
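The correction rule itself is simple enough to sketch outside of DTrace. The following Python snippet is not the D script, just a hedged illustration of the decision it makes: any tv_sec outside the signed 32-Bit range is replaced by the current time. The function names, and the choice to reset tv_usec to 0 for corrected entries, are assumptions made for this example.

```python
import time

LONG_MIN_32, LONG_MAX_32 = -2**31, 2**31 - 1


def fits_time32(seconds):
    """True if the value can be stored in a signed 32-bit time_t."""
    return LONG_MIN_32 <= seconds <= LONG_MAX_32


def correct_times(times, now=None):
    """Return a corrected copy of a utimes(2)-style pair of timestamps.

    times is a list of (tv_sec, tv_usec) tuples: access time, then
    modification time. Out-of-range entries are replaced by (now, 0).
    """
    if now is None:
        now = int(time.time())
    return [(sec, usec) if fits_time32(sec) else (now, 0) for sec, usec in times]


# Example: the access time is fine, the modification time lies beyond 2038.
print(correct_times([(1_000_000_000, 5), (3_200_000_000, 7)], now=1_215_000_000))
```

In the real script the corrected values would then be written back to the caller's memory with copyout(), which is why destructive mode is needed.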

I hope you liked this little excursion into the year 2038, which can happen sooner than we think for some. To me, this was a great opportunity to dig a little deeper into DTrace, a powerful tool that shows us what's going on while enabling us to fix stuff on the fly.

Update: Ulrich had some suggestions and found a bug, so I updated both scripts to version 1.2:

It's better to use two individual probes for utimes(2) and futimesat(2) than placing if-then structures based on probefunc. Improves readability, less error-prone, more efficient.

The predicates are now much simpler due to using struct timeval arrays there already.

Introduced constants for LONG_MIN and LONG_MAX to improve readability of the predicates.

The filename table doesn't account for one process opening a file in one thread, then passing the fd to another thread. Therefore, it's better to have a 2-dimensional array with fd and pid as index that is local.

The +8 in the predicate to fetch was incorrect, +16 or sizeof(struct timeval) would have been correct. That's fixed by using the original structures right from the beginning at predicate time.
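To see why +8 was wrong, it helps to lay out the two-element struct timeval array byte by byte. The sketch below uses Python's struct module and assumes a 64-Bit (LP64) layout in which tv_sec and tv_usec are both 8-byte longs; the variable and function names are invented for the example.

```python
import struct

# Assumed layouts: two 8-byte fields on LP64, two 4-byte fields on ILP32.
TIMEVAL_LP64 = struct.Struct("<qq")
TIMEVAL_ILP32 = struct.Struct("<ii")

atime = (1_215_000_000, 111)   # times[0]: access time
mtime = (3_200_000_000, 222)   # times[1]: modification time (beyond 2038)
buf = TIMEVAL_LP64.pack(*atime) + TIMEVAL_LP64.pack(*mtime)


def tv_sec_at(buffer, offset):
    """Read an 8-byte signed integer at the given byte offset."""
    return struct.unpack_from("<q", buffer, offset)[0]


wrong = tv_sec_at(buf, 8)                   # lands on times[0].tv_usec
right = tv_sec_at(buf, TIMEVAL_LP64.size)   # lands on times[1].tv_sec

print("sizeof(struct timeval), 64-bit:", TIMEVAL_LP64.size)
print("sizeof(struct timeval), 32-bit:", TIMEVAL_ILP32.size)
print("value at +8: ", wrong)
print("value at +16:", right)
```

An offset of 8 would only be right for a 32-Bit layout, which is presumably where the original number came from; using sizeof(struct timeval) or the structures themselves avoids hard-coding either value.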

Tuesday May 27, 2008

A couple of weeks ago, OpenSolaris 2008.05, project Indiana, saw its first official release. I've been looking forward to this moment so I can upgrade my home server and work laptop and start benefiting from the many cool features. If you're running a server at home, why not use the best server OS on the planet for it?

This is the first in a small series of articles about using OpenSolaris for home server use. I did a similar series some time ago and got a lot of good and encouraging feedback, so this is an update, or a remake, or home server 2.0, if you will.

USB disk advantages

This is the moment where people start giving me those "Yeah, right" or "Are you serious?" looks. But USB disk storage has some cool advantages:

It's cheap. About 90 Euros for half a TB of disk from a major brand. Can't complain about that.

It's hot-pluggable. What happens if your server breaks and you want to access your data? With USB it's as easy as unplug from broken server, plug into laptop and you're back in business. And there's no need to shut down or open your server if you just want to add a new disk or change disk configuration.

It scales. I have 7 disks running in my basement. All I needed to do to make them work with my server was to buy a cheap 15 EUR 4-port USB card to expand my existing 5 USB ports. I still have 3 PCI slots left, so I could add 12 disks more at full USB 2.0 speed if I wanted.

It's fast enough. I measure about 10MB/s in write performance with a typical USB disk. That's about as fast as you can get over a 100 MBit/s LAN network which most people use at home. As long as the network remains the bottleneck, USB disk performance is not the problem.

ZFS and USB: A Great Team

But this is not enough. The beauty of USB disk storage lies in its combination with ZFS. When adding some ZFS magic to the above, you also get:

Reliability. USB disks can be mirrored or used in a RAID-Z/Z2 configuration. Each disk may be unreliable (because they're cheap) individually, but thanks to ZFS' data integrity and self-healing properties, the data will be safe and FMA will issue a warning early enough so disks can be replaced before any real harm can happen.

Flexibility. Thanks to pooled storage, there's no need to wonder what disks to use for what and how. Just build up a single pool with the disks you have, then assign filesystems to individual users, jobs, applications, etc. on an as-needed basis.

Performance. Suppose you upgrade your home network to Gigabit Ethernet. No need to worry: The more disks you add to the pool, the better your performance will be. Even if the disks are cheap.

Together, USB disks and ZFS make a great team. Not enterprise class, but certainly an interesting option for a home server.

ZFS & USB Tips & Tricks

So here's a list of tips, tricks and hints you may want to consider when daring to use USB disks with OpenSolaris as a home server:

Mirroring vs. RAID-Z/Z2: RAID-Z (or its more reliable cousin RAID-Z2) is tempting: You get more space for less money. In fact, my earlier versions of zpools at home were a combination of RAID-Z'ed leftover slices with the goal to squeeze as much space as possible at some reliability level out of my mixed disk collection.

But say you have a 3+1 RAID-Z and want to add some more space. Would you buy 4 disks at once? Isn't that a bit big, granularity-wise?

That's why I decided to keep it simple and just mirror. USB disks are cheap enough, no need to be even more cheap. My current zpool has a pair of 1 TB USB disks and a pair of 512 GB USB disks and works fine.

Another advantage of this approach is that you can organically modernize your pool: Wait until one of your disks starts showing some flakiness (FMA and ZFS will warn you as soon as the first broken data block has been repaired). Then replace the disk with a bigger one, then its mirror with the same, bigger size. That will give you more space without the complexity of too many disks and keep them young enough to not be a serious threat to your data. Use the replaced disks for scratch space or less important tasks.

Instant replacement disk: A few weeks ago, one of my mirrored disks showed its first write error. It was a pair of 320GB disks, so I ordered a 512GB replacement (with the plan to order the second one later). But now, my mirror may be vulnerable: What if the second disk starts breaking before the replacement has arrived?

That's why having a few old but functional disks around can be very valuable: In my case, I took a 200GB and a 160GB disk and combined them into their own zpool:

zpool create temppool c11t0d0 c12t0d0

Then, I created a new ZVOL sitting on the new pool:

zfs create -sV 320g temppool/tempvol

Here's our temporary replacement disk! I then attached it to my vulnerable mirror:

zpool attach santiago c10t0d0 /dev/zvol/dsk/temppool/tempvol

And voilà, my precious production pool started resilvering the new virtual disk. After the new disk has arrived and been resilvered, the temporary disk can be detached, destroyed and its space put to some other good use.

Storage virtualization has never been so easy!

Don't forget to scrub: Especially with cheap USB disks, regular scrubbing is important. Scrubbing will check each and every block of your data on disk and make sure it's still valid. If not, it will repair it (since we're mirroring or using RAID-Z/Z2) and tell you what disk had a broken block so you can decide whether it needs to be replaced or not just yet.

How often you want to or should scrub depends on how much you trust your hardware and how much your data is being read out anyway (any data that is read out is automatically checked, so that particular portion of the data is already "scrubbed" if you will). I find scrubbing once every two weeks a useful cycle, others may prefer once a month or once a week.

But scrubbing is a process that needs to be initiated by the administrator. It doesn't happen by itself, so it is important that you think of issuing the "zpool scrub" command regularly, or better, set up a cronjob for it to happen automatically.

As an example, the following line:

in your crontab will start a scrub for each of your zpools twice a month on the 1st and the 15th at 01:23 AM.
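The job that cron runs only has to do two things: list the pools and start a scrub on each. Here is a minimal Python sketch of that loop, assuming the standard zpool list -H -o name and zpool scrub commands; the function names and the injectable run/start parameters are made up for this example so the logic can be tried without touching a real pool. Schedule fields of 23 1 1,15 * * (minute 23, hour 1, days 1 and 15) would match the timing described above.

```python
import subprocess


def list_pools(run=subprocess.check_output):
    """Return the names of all zpools, one per line of `zpool list -H -o name`."""
    out = run(["zpool", "list", "-H", "-o", "name"])
    return out.decode().split()


def scrub_all(run=subprocess.check_output, start=subprocess.check_call):
    """Start a scrub on every pool and return the list of pool names."""
    pools = list_pools(run)
    for pool in pools:
        # `zpool scrub` returns right away; the scrub continues in the background.
        start(["zpool", "scrub", pool])
    return pools


# On a real system, calling scrub_all() with the defaults runs the zpool
# commands directly and therefore needs sufficient privileges.
```

Keeping the command runners as parameters is just a convenience for dry runs: passing in fake functions shows which commands would be issued.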

Snapshot often: Snapshots are cheap, but they can save the world if you accidentally deleted that important file. Same rule as with scrubbing: Do it. Often enough. Automatically. Tim Foster did a great job of implementing an automatic ZFS snapshot service, so why don't you just install it now and set up a few snapshot schemes for your favourite ZFS filesystems?

The home directories on my home server are snapshotted once a month (and all snapshots are kept), once a week (keeping 52 snapshots) and once a day (keeping 31 snapshots). This gives me a time-machine with daily, weekly and monthly granularities depending on how far back in time I want to travel through my snapshots.
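The retention scheme described above boils down to "keep the newest N snapshots per schedule, or all of them". The sketch below illustrates that rule in Python; it is not how Tim Foster's service is implemented, and the snapshot names and the prune helper are hypothetical.

```python
# Retention per schedule as described above; None means "keep everything".
RETENTION = {"monthly": None, "weekly": 52, "daily": 31}


def prune(snapshots, keep):
    """Split snapshots (sorted oldest first) into (kept, destroyed).

    keep is the number of newest snapshots to retain, or None to retain all.
    """
    if keep is None:
        return list(snapshots), []
    if keep < 1:
        raise ValueError("keep must be at least 1 or None")
    return list(snapshots[-keep:]), list(snapshots[:-keep])


# Example: 40 daily snapshots exist, only the newest 31 survive.
dailies = [f"home@daily-{i:02d}" for i in range(40)]
kept, destroyed = prune(dailies, RETENTION["daily"])
print(len(kept), "kept,", len(destroyed), "to destroy")
```

With these numbers the daily snapshots cover about a month, the weekly ones a year, and the monthly ones go back as far as the pool has existed.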

So, USB disks aren't bad. In fact, thanks to ZFS, USB disks can be very useful building blocks for your own little cost-effective but reliable and integrity-checked data center.

Let me know what experiences you have had with USB storage at home or with ZFS, and what tips and tricks you have found to work well for you. Just enter a comment below or send me email!

Friday May 16, 2008

If you understand German, are interested in virtualization and listen to podcasts, don't miss the current episode of the POFACS podcast.

POFACS, the podcast for alternative computer systems, is a German podcast that covers everything non-mainstream in computing. From people running their business on a Commodore 64 to the state of the art Amiga OS to office packages that fit on a floppy disk or one of the many Linux variants.

There have been a few episodes covering Solaris related technologies, such as ZFS and Project Indiana. Today adds an interview with my colleague Detlef from Berlin about virtualization.

Actually, whenever I listen to one of the POFACS episodes about some crazy new operating system that's being developed somewhere, I want to try it out and see how it is. The perfect way to do that of course is to use virtualization, so you don't have to re-install your machine again. Well, that's where Sun's VirtualBox comes in: It comes with a great range of supported operating systems so there's a good chance it will run even the strangest alternative OS just fine.

Sunday Mar 30, 2008

In my last post, I compiled and installed the MediaTomb UPnP server on Solaris in order to stream movies, photos and music to my PS3 and it worked well. But I wasn't quite satisfied with its features: No support for tags/covers in AAC encoded music (>95% of my music library is encoded in the superior AAC format) and a few other quirks here and there. So I decided to try the TwonkyVision TwonkyMedia server.

Unfortunately, the guys at TwonkyMedia (now PacketVideo) don't support their TwonkyVision server on Solaris (yet?). Only Linux, Windows and MacOS X are supported. The absence of answers to a Solaris request post in their forum isn't very encouraging. TwonkyMedia is closed source and only commercially available (EUR 29.95) which means you can't even compile it yourself on Solaris. At least there's a trial period of 30 days. Does this mean no ZFS and other Solaris goodness to TwonkyMedia?

Fear not, this is exactly what Branded Zones in Solaris 10/OpenSolaris are all about! They allow you to install a Linux distribution inside a Solaris 10 Container. The BrandZ framework then seamlessly translates Linux system calls into Solaris system calls. The result: All the goodness of Solaris, such as ZFS, FMA, DTrace and whatnot, even for closed source or otherwise problematic Linux applications. So, here's how to run the TwonkyMedia server on a Solaris x64/x86 machine (sorry, no SPARC, different CPU architecture):

Set up a standard lx branded Zone. Here's a short and sweet tutorial on how to do it. In my case, I used ZFS for the zone root path. This gives me compression and the ability to snapshot the Linux root filesystem whenever I like.

I used the CentOS tarball from the BrandZ download area to install a standard CentOS zone. Quick, easy, free, works well for most cases.

After having installed the CentOS Linux branded Zone and before the first boot, it is a good idea to make a ZFS snapshot of the root filesystem, just in case. You can later use the snapshot to revert the zone to its freshly installed state or to easily clone more zones like this in the future.

After the first boot of the Linux zone with zoneadm -z zonename boot, you can log in to its virtual console using zlogin -C zonename. Now, set up basic networking from inside the Linux zone by editing the /etc/sysconfig/network file. Then, you can log in through ssh -X into the Linux zone and run graphical configuration tools such as redhat-config-network to configure DNS, set up users, etc.

Now, download the TwonkyMedia server from the Linux zone by using wget http://www.twonkyvision.com/Download/4.4/twonkymedia-i386-glibc-2.2.5.zip and follow the TwonkyMedia installation guide.

You should now have the TwonkyMedia server up and running from within a Linux branded zone on Solaris! Connect to it through your web browser at http://your.servers.ip.address:9000/ and configure its various settings to your taste.

This is it, actually it's much easier than compiling MediaTomb, but it comes at the cost of having to pay after the trial period, if you like it. Above, you see a picture of TwonkyMedia, running in an lx branded zone on Solaris, streaming AAC music from my favorite Chilean band "La Ley" to a PS3. Notice the cover art and song info to the bottom left that is not available with MediaTomb today for AAC encoded music.

Thursday Mar 20, 2008

Before visiting CeBIT, I went to see my friend Ingo who works at the Clausthal University's computing center (where I grew up, IT-wise). This is a nice pre-CeBIT tradition we keep over the years when we get to watch movies in Ingo's home cinema and play computer games all day for a weekend or so :).

To my surprise, Ingo got himself a new PlayStation 3 (40GB). The new version is a lot cheaper (EUR 370 or so), less noisy (new chip process, no PS2 compatibility), and since HD-DVD is now officially dead, it's arguably the best value for money in Blu-Ray players right now (regular firmware upgrades, good picture quality, digital audio and enough horsepower for smooth Java BD content). All very rational and objective arguments to justify buying a new game console :).

The PS3 is not just a Blu-Ray player, it is also a game console (I recommend "Ratchet & Clank: Tools of Destruction" and the immensely cute "LocoRoco: Cocoreccho!", which is a steal at only EUR 3) and can act as a media renderer for DLNA compliant media servers: Watch videos, photos and listen to music in HD on the PS3 from your home server.

After checking out a number of DLNA server software packages, it seemed to me that MediaTomb is the most advanced open source one (TwonkyVision seems to be nicer, but sorry, it isn't open source...). So here is a step-by-step guide on how to compile and run it on a Solaris machine.

Basic assumptions

This guide assumes that you're using a recent version of Solaris. This should be at least Solaris 10 (it's free!), a current Solaris Express Developer Edition (it's free too, but more advanced) is recommended. My home server runs Solaris Express build 62, I'm waiting for a production-ready build of Project Indiana to upgrade to.

I'm also assuming that you are familiar with basic compilation and installation of open source products.

Whenever I compile and install a new software package from scratch, I use /opt/local as my base directory. Others may want to use /usr/local or some other directory (perhaps in their $HOME). Just make sure you use the right path in the --prefix=/your/favourite/install/path part of the ./configure command.

I'm also trying to be a good citizen and use the Sun Studio Compiler here where I can. It generally produces much faster code on both SPARC and x86 architectures vs. the ubiquitous gcc, so give it a try! Alas, sometimes the code was really stubborn and it wouldn't let me use Sun Studio so I had to use gcc. This was the path of least resistance, but with some tinkering, everything can be made to compile on Sun Studio. You can also try gcc4ss which combines a gcc frontend with the Sun Studio backend to get the best of both worlds.

Now, let's get started!

MediaTomb Prerequisites

Before compiling/installing the actual MediaTomb application, we need to install a few prerequisite packages. Don't worry, most of them are already present in Solaris, and the rest can be easily installed as pre-built binaries or easily compiled on your own. Check out the MediaTomb requirements documentation. Here is what MediaTomb wants:

sqlite3, libiconv and curl are available on BlastWave. BlastWave is a software repository for Solaris packages that has almost everything you need in terms of pre-built open source packages (but not MediaTomb...). Setting up BlastWave on your system is easy, just follow their guide. After that, installing the three packages above is as easy as:

MediaTomb uses a library called libmagic to identify file types. It took a little research until I found out that it is part of the file package that is shipped as part of many Linux distributions. Here I'm using file-4.23.tar.gz, which seems to be a reasonably new version. Fortunately, this is easy to compile and install:

MediaTomb also uses SpiderMonkey, which is the Mozilla JavaScript Engine. Initially, I had some fear about having to compile all that Mozilla code from scratch, but then it dawned on me that we can just use the JavaScript libraries that are part of the Solaris Firefox standard installation, even the headers are there as well!

That was it. Now we can start building the real thing...

Compiling and installing MediaTomb

Now that we have all prerequisites, we can move on to downloading, compiling and installing the MediaTomb package:

Somehow, the MediaTomb developers want to enforce some funny LD_PRELOAD games which are unnecessary (at least on recent Solaris versions...). So let's throw that part of the code out: Edit src/main.cc and comment lines 128-141 out by adding /* before line 128 and */ at the end of line 141.

Now we can configure the source to our needs. This is where all the prerequisite packages from above are configured in:

Check out the MediaTomb compile docs for details. One hurdle here was to use an extra iconv library because the MediaTomb source didn't work with the gcc built-in iconv library. Also, there were some issues with the Sun Studio compiler, so I admit I was lazy and just used gcc instead.

After these preparations, compiling and installing should work as expected:

gmake

PATH=$PATH:/usr/ccs/bin:/usr/sfw/bin; export PATH; gmake install

Configuring MediaTomb

Ok, now we have successfully compiled and installed MediaTomb, but we're not done yet. The next step is to create a configuration file that works well. An initial config will be created automatically during the very first startup of MediaTomb. Since we compiled in some libraries from different places, we either need to set LD_LIBRARY_PATH during startup (i.e. in a wrapper script) or update the linker's path using crle(1).

In my case, I went for the first option. So, starting MediaTomb works like this:

Of course you should substitute your own interface. The port number is completely arbitrary, it should just be above 49152. Read the command line option docs to learn how they work.

You can now connect to MediaTomb's web interface and try out some stuff, but the important thing here is that we now have a basic config file in $HOME/.mediatomb/config.xml to work with. The MediaTomb config file docs should help you with this.

Here is what I added to my own config and why:

Set up an account for the web user interface with your own user id and password. It's not the most secure server, but better than nothing. Use something like this in the <ui> section:

Uncomment the <protocolInfo> tag because according to the docs, this is needed for better PS3 compatibility.

I saw a number of iconv errors, so I added the following to the config file in the import section. Apparently, MediaTomb can better handle exotic characters in file names (very common with music files) with the following tag:

<filesystem-charset>ISO-8859-1</filesystem-charset>

The libmagic library won't find its magic information because it's now in a nonstandard place. But we can add it with the following tag, again in the import section:

Actually, it should "just work" through libmagic, but it didn't for me, so adding these mime types was the easiest option. It also improves performance by saving libmagic calls. Most digital cameras use the uppercase "JPG" extension and MediaTomb seems to be case-sensitive, so adding the uppercase variant was necessary. It's also apparent that MediaTomb doesn't have much support for AAC (.m4a) even though it is the official successor to MP3 (more than 95% of my music is in AAC format, so this is quite annoying).

You can now either add <directory> tags to the <autoscan> tags for your media data in the config file, or add them through the web interface.

This is it. The pictures show MediaTomb running in my basement and showing some photos through the PS3 on the TV set. I hope that you can now work from here and find a configuration that works well for you. Check out the MediaTomb scripting guide for some powerful ways to create virtual directory structures of your media files.

MediaTomb is ok to help you show movies and pictures and the occasional song on the PS3 but it's not perfect yet. It lacks support for AAC (tags, cover art, etc.) and it could use some extra scripts for more comfortable browsing structures. But that's the point of open source: Now we can start adding more features to MediaTomb ourselves and bring it a few steps closer to usefulness.

Monday Mar 17, 2008

CeBIT 2008, the largest IT trade show worldwide, is over. This must be my 9th CeBIT as a Sunnie, boy does time fly fast. Here are a few impressions from my point of view.

Thanks to Detlef, who set up an Ultra 40 M2 with a current Solaris Express and Sun xVM Server for us (here's a nice writeup (sorry, in German) on how he did it, in case you want to try out xVM yourself), buildup was done really quickly. We had two monitors attached to the machine and thanks to NVIDIA's "nvidia-settings" tool that they ship with the Solaris NVIDIA drivers, setting up Twinview was a piece of cake too.

Then we set up the Compiz window manager to run on our Solaris Ultra 40 M2. Few people know what it is (it adds some 3D eye candy to your desktop, similar to Apple's) and even fewer know that it runs on Solaris as well. Thanks to Erwann, installing Compiz is just a matter of running a script. Even if you have an ATI card, you're likely to be able to run Compiz, thanks to Minskey's preliminary driver. It runs just fine on my Acer Ferrari 4000 laptop!

But then we found out that running many virtual OSes on a machine requires quite some amount of memory. Our 8 GB inside the Ultra 40 M2 wasn't enough for the different versions of Solaris, Linux and Windows that we had installed. So we hunted down an unsuspecting little Sun Blade X6220 module and ripped it open for an extra 4 GB. To the right, you see Ulrich performing the upgrade, Systemhero-like (i.e. no anti-static mats or straps, those are for sissies...). Now there was enough air to breathe for our virtualized OSes, the booth was ready to go!

Day 1 wasn't the busiest day, as expected, but it kept us quite entertained. Mario Heide from the German POFACS podcast stopped by and we explored a few things we could do for future episodes.

High-End Visualization: There was also quite an interest from the automotive industry in trying the Sun Fire X4600 M2 8-socket Opteron Server with up to 256 GB of RAM with the NVIDIA Quadro Plex VCS external graphics cards as a really big workstation, or a network visualization server. The LRZ supercomputer center near Munich is already using such a setup to provide virtualized remote graphics power to their researchers and now the manufacturing industry is starting to like the idea. An ideal companion for this is Sun's suite of visualization software that provides both scalable and shared approaches to high-end visualization. Try it out, it's free and open source.

Optimizing AMP: Another popular question was: "How can I optimize the AMP stack on Solaris and Sun Hardware?" Each day, I pointed about a dozen customers to our Cool Stack homepage, which is part of the Cool Tools developed by Sun for the UltraSPARC T1/T2 processors. The Cool Stack is simply a set of popular web apps (you know, Apache, MySQL, Perl, PHP, Tomcat and friends) which have been precompiled by Sun for Solaris on both x86 and SPARC architectures. Since we compile with Sun Studio compilers using the right options and integrate them with selected Solaris technologies, such as the cryptographic framework, using the Cool Stack is both easy to do and it provides great out-of-the-box performance.

All the other days were very busy. Loads of people, loads of questions, lots of interest in Sun technologies, both in hardware and in software. The great thing about this particular CeBIT and the new Sun booth, now in Hall 2, was that the people who came by were all relevant to Sun. We hardly had any "bag-rats" at all, so I guess this is as good as it gets in terms of visitor quality. Visitors ranged from high-level IT executives through middle-management, system administrators, hackers, students and Sun/Solaris enthusiasts.

Sun Ray and Sun Secure Global Desktop: We also had schools looking at our Sun Ray and Sun Secure Global Desktop solutions as a flexible, secure, cost-effective and eco-friendly infrastructure for their schools. Actually, Sun Ray technologies were among the hottest topics discussed during this CeBIT at the Sun booth, not just for schools but also for any kind of environment that is sick and tired of having to upgrade Windows or Linux PCs every couple of years. Also call centers, branch offices and a couple of special applications such as kiosks are very good fits for Sun Rays.

Sun xVM was another hot topic. Having been at the Sun xVM pod with Ulrich and Detlef, we explained numerous times how the Sun xVM Server adds value to the work of the Xen community by providing Solaris technologies as the better foundation for virtual machines of all OSes. The Solaris Fault Manager can monitor your hardware and trigger virtual machine migration before the hardware starts failing for real, increasing uptime for your virtualized applications. This can work hand in hand with the Solaris Cluster, which adds high-availability features to virtualized OSes. ZFS is a great tool for providing fast, flexible, integrity-checked and powerful storage through iSCSI, NFS, CIFS or other protocols to virtualized environments. And there's much more, for example the Solaris Crossbow project which adds fully virtualized and bandwidth-managed network devices to the picture, enabling full network-in-a-box virtualization approaches. Oh, and when a virtual machine fails, you can debug it with DTrace, too. Levon has some nice examples about DTrace and Xen working together!

Sexy Hardware: No Sun booth at CeBIT without showing off some tin, and this year was no exception. For starters, we had a datacenter with Sun's newest UltraSPARC T2, AMD and Intel based servers, both in rack-mount and in blade form factors. Of course we also had some storage arrays and a big tape library to show off. But the big eye-catcher was the Sun Modular Datacenter S20 (formerly known as "Project Black Box"), which was so big and so eye-catching that we had to place it outside the halls, near the Intel pavilion. Our heroic product manager Ingo explained everything about Project Black Box to customers, including more than a handful of TV stations. Even at 4 o'clock in the morning, for the ARD TV station's breakfast TV show...

Back to Solaris: The nice thing about Solaris at CeBIT 2008 was that we hardly needed to explain to people that it is free and open source. Most visitors already knew this and came to visit us specifically to learn some more about a particular Solaris feature, grab a Solaris Express Developer Edition DVD or ask questions about how to best deploy Solaris in their environment. One system administrator actually thanked us for producing our CSI:Munich ZFS video because it helped him gain his boss' support for deploying ZFS in their company. The boss just said: "If this really works, then we need to roll it out now!" (Of course it "really worked"). Actually, ZFS was one of the most popular discussion topics, and I logged in to my home machine more than once to show some real life, production snapshots, pools and other ZFS features on a living, breathing system.

Getting Started with Solaris: We handed out a lot of Solaris Express: Developer Edition DVDs. To get people going and help first-time Solaris users over the initial humps, we pointed visitors to the same essential and useful links over and over again. This inspired me to post an entry in the German Solarium blog with the 7 Most Useful Solaris and OpenSolaris links. Now I only need to point customers to a single website for all their initial Solaris needs: The Solarium.

Helping and Learning: But we learned a lot of new stuff, too. Not only are Ulrich and Detlef great sources of endless Solaris knowledge (them being OS Ambassadors at Sun), I also had a number of very illuminating conversations with customers and visitors. Thorsten Ludewig of the Wolfenbüttel University of Applied Sciences updated me on the state of the art of digital picture frames. A guy from Konstanz University pointed me to a small company in Switzerland called "PC Engines" that manufactures good-quality small form factor systems. I'm looking for a small, low-power system as a backup server at home and this might be it. He's running NetBSD on these systems for small and home server tasks, but I wonder if they work with Solaris as well. With only 256 MB of memory it might be a stretch, but not impossible. Other options I'm considering are VIA's Artigo kit or maybe a standard VIA motherboard in an ITX case after all? Let me know if you have experience with Solaris on very small, very low-power machines.

Meeting Customers and Interests: CeBIT, like any major trade show, is a great way to connect with customers and interests. Sometimes it's a way of meeting people you only knew virtually. In this case, we had three fans of the Systemhelden.com podcast HELDENFunk visit us at the booth: Graefin, Chaosblog and Unruheherd. All three came in white Sun T-Shirts, which could only be rewarded with new black Systemhelden.com T-Shirts :). We had a great time during the Sun booth party that day and according to Chaosblog's latest entry, they seem to have had a fun time at CeBIT as well.

In closing, this was probably one of the best CeBITs I've ever had. Customers and partners like Sun. They are excited about our technology and they want more. Some know us because of our software and were surprised to learn that we have hardware, too (this is a good sign), some come to see our hardware and discover our software portfolio (this case is slightly more common) and all want us to win, which is a good feeling :).

Wednesday Feb 27, 2008

A couple of weeks ago, the Sun Partner University saw 250 technical people from Sun's German partner community gathering in Fulda, Germany. Besides showing videos, talking about Sun Visualization Software, the Sun Grid Engine and Sun Studio Compilers and evangelizing Web 2.0, I had the honor of recording an interview with Alec Muffett, one of our Principal Engineers, based in the UK.

Monday Feb 25, 2008

CeBIT is the world's largest IT trade show. Whenever we mention this to our colleagues in the US, they say "sure". Only when they actually come over to our booth and experience the CeBIT feeling do they realize how big it really is. Most US trade shows use a really big exhibition hall. CeBIT has 21 (twenty-one) of them. Plus the space in between. Bring some good shoes.

I'll be at the Solaris part of the booth, talking to customers about Niagara 2 and other CPU and System Technologies, Solaris, OpenSolaris and ZFS, HPC and Grid Computing, Web 2.0 and what not. If you read this blog, stop by and say hi. Let me know what you like and what you don't like about this blog, about Sun or whatever else goes through your mind. I'll bring my voice recorder and a camera and we can talk about your own cool projects in a podcast interview that we can then publish through the HELDENFunk podcast. Join the System Heroes (or the German Systemhelden) and get a T-Shirt, or I'll try to organize one of those champagne VIP passes for you. Just ask for me at the info counter.

Tuesday Feb 19, 2008

I've never installed Windows in my whole life. My computer history includes systems like the Dragon 32, the Commodore 128, then the Amiga, Apple PowerBook (68k and PPC) etc. plus the occasional Sun system at work. Even the laptop my company provided me with only runs Solaris Nevada, nothing else. Today, this has changed.

A while ago, Sun announced the acquisition of Innotek, the makers of the open-source virtualization software VirtualBox. After having played with it for a while, I'm convinced that this is one of the coolest innovations I've seen in a long time. And I'm proud to see that this is another innovative German company that joins the Sun family. Welcome, Innotek!

Here's why this is so cool.

After having upgraded my laptop to Nevada build 82, I had VirtualBox up and running in a matter of minutes. OpenSolaris Developer Preview 2 (Project Indiana) runs fine on VirtualBox, so does any recent Linux (I tried Ubuntu). But Windows just makes for a much cooler VirtualBox demo, so I did it:

After 36 years of Windows freedom, I ended up installing it on my laptop, albeit on top of VirtualBox. Safer XP if you will. Above, you see my VirtualBox running Windows XP in all its Tele-Tubby-ish glory.

As you can see, this is a plain vanilla install. I just took the liberty of installing a virus scanner on top. Well, you never know...

So far, so good. Now let's do something others can't. First of all, this virtual machine uses a .vdi disk image to provide hard disk space to Windows XP. On my system, the disk image sits on top of a ZFS filesystem:

Cool thing #1: You can do snapshots. In fact I have two snapshots here. The first is from this morning, right after the Windows XP installer went through, the second has been created just now, after installing the virus scanner. Yes, there has been some time between the two snapshots, with lots of testing, day job and the occasional rollback. But hey, that's why snapshots exist in the first place.

The clone has inherited the mountpoint from the upper level ZFS filesystem (the winxp one) and so we have everything set up for VirtualBox to create a second Win XP instance from. I just renamed the new container file for clarity. But hey, what's this?

Damn! VirtualBox didn't fall for my sneaky little clone trick. Hmm, where is this UUID stored in the first place?

Ahh, it seems to be stored at byte 392, with varying degrees of byte and word-swapping. Some further research reveals that you had better leave the first part of the UUID alone (I'll spare you the details...). Instead, the last 6 bytes, 845c3a0e1c8d, sitting at bytes 402-407, look like a great candidate for an arbitrary serial number. Let's try changing them (This is a hack for demo purposes only. Don't do this in production, please):

Who needs a hex editor if you have good old friends od and dd on board? The trick is in the "conv=notrunc" part. It tells dd to leave the rest of the file as is and not truncate it after doing its patching job. Let's see if it works:

Eureka, it works! Notice that the second instance is running with the freshly patched hard disk image as shown in the window above.

Windows XP booted without any problem from the ZFS-cloned disk image. There was just the occasional popup message from Windows saying that it found a new harddisk (well observed, buddy!).
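For readers without od and dd at hand, the in-place patch described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the exact command used in this post: the offset 402 and the original bytes 845c3a0e1c8d come from the text, while the replacement value, the helper name patch_bytes and the stand-in file are assumptions. Opening the file in "r+b" mode plays the role of conv=notrunc, because it allows writing without truncating the rest of the file.

```python
import os
import tempfile


def patch_bytes(path, offset, new_bytes):
    """Overwrite len(new_bytes) bytes at offset and return the old bytes.

    The file is opened in "r+b" mode, so everything before and after the
    patched range stays untouched (the equivalent of dd's conv=notrunc).
    """
    with open(path, "r+b") as f:
        f.seek(offset)
        old = f.read(len(new_bytes))
        if len(old) != len(new_bytes):
            raise ValueError("patch would extend past the end of the file")
        f.seek(offset)
        f.write(new_bytes)
    return old


def demo():
    """Patch a stand-in file (NOT a real .vdi image) and report the result."""
    fd, path = tempfile.mkstemp(suffix=".vdi")
    try:
        with os.fdopen(fd, "wb") as f:
            # 402 filler bytes, the 6 serial bytes from the post, 100 more filler bytes
            f.write(bytes(402) + bytes.fromhex("845c3a0e1c8d") + bytes(100))
        size_before = os.path.getsize(path)
        # The new serial number is arbitrary; any 6 bytes would do for the demo.
        old = patch_bytes(path, 402, bytes.fromhex("845c3a0e1c8e"))
        size_after = os.path.getsize(path)
        with open(path, "rb") as f:
            f.seek(402)
            new = f.read(6)
        return old.hex(), new.hex(), size_before, size_after
    finally:
        os.remove(path)
```

Calling demo() returns the old and new serial bytes together with the file size before and after the patch, which should be identical. The same warning applies as for the dd hack: try this on a copy (or a clone), not on an image you care about.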

Thanks to ZFS clones we can now create new virtual machine clones in just seconds without having to wait a long time for disk images to be copied. Great stuff. Now let's do what everybody should be doing to Windows once a virus scanner is installed: Install Firefox:

I must say that the performance of VirtualBox is stunning. It sure feels like the real thing. You just need to make sure to have enough memory in your real computer to support both OSes at once, otherwise you'll run into swapping hell...

BTW: You can also use ZFS volumes (called ZVOLs) to provide storage space to virtual machines. You can snapshot and clone them just like regular file systems, plus you can export them as iSCSI devices, giving you the flexibility of a SAN for all your virtualized storage needs. The reason I chose files over ZVOLs was just so I can swap pre-installed disk images with colleagues. On second thought, you can dump/restore ZVOL snapshots with zfs send/receive just as easily...

Watch the "USED" column for the winxp1 clone. That's right: Our second instance of Windows XP only cost us a meager 138 MB on top of the first instance's 1.22 GB! Both filesystems (and their .vdi containers with Windows XP installed) represent roughly a gigabyte of storage each (the REFER column), but the actual physical space our clone consumes is just 138 MB.

Cool thing #4: ZFS clones save even more space, big time!

How does this work? Well, when ZFS creates a snapshot, it only creates a new reference to the existing on-disk tree-like block structure, indicating where the entry point for the snapshot is. If the live filesystem changes, only the changed blocks need to be written to disk, the unchanged ones remain the same and are used for both the live filesystem and the snapshot.

A clone is a snapshot that has been marked writable. Again, only the changed (or new) blocks consume additional disk space (in this case Firefox and some WinXP temporary data), everything that is unchanged (in this case nearly all of the WinXP installation) is shared between the clone and the original filesystem. This is de-duplication done right: Don't create redundant data in the first place!
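The block sharing behind snapshots and clones can be illustrated with a toy model in Python. This is a sketch of the copy-on-write idea only and says nothing about ZFS's real on-disk format: the Block and Dataset classes and the block counts are made up for the example, and used() and refer() are merely loose analogies to the USED and REFER columns.

```python
class Block:
    """A data block. Identity matters here: shared blocks are the same object."""

    def __init__(self, data):
        self.data = data


class Dataset:
    """Toy copy-on-write dataset: a table mapping block numbers to blocks."""

    def __init__(self, table=None, origin=None):
        self.table = dict(table or {})  # copies references, never block data
        self.origin = origin            # the snapshot a clone was created from

    def write(self, blkno, data):
        # Copy-on-write: always point at a fresh block, never modify one in place.
        self.table[blkno] = Block(data)

    def snapshot(self):
        # A snapshot is just another reference to the existing blocks.
        return Dataset(self.table)

    def clone(self):
        # A clone is a writable dataset that starts out sharing every block.
        return Dataset(self.table, origin=self)

    def refer(self):
        """Number of blocks reachable from this dataset."""
        return len(self.table)

    def used(self):
        """Number of blocks not shared with the origin snapshot."""
        if self.origin is None:
            return len(self.table)
        shared = {id(b) for b in self.origin.table.values()}
        return sum(1 for b in self.table.values() if id(b) not in shared)


# A "Windows XP install" of 1000 blocks, snapshotted and cloned.
winxp = Dataset()
for n in range(1000):
    winxp.write(n, b"winxp")
snap = winxp.snapshot()
winxp1 = snap.clone()

# "Installing Firefox" in the clone: 50 blocks overwritten, 50 new ones.
for n in range(950, 1050):
    winxp1.write(n, b"firefox")
```

In this model winxp1.refer() is 1050 blocks, but winxp1.used() is only 100, because the other 950 blocks are still the very same objects the snapshot points to. Writing to winxp afterwards does not change what snap sees either, since writes never modify a block in place.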

That was only one example of the tremendous benefits Solaris can bring to the virtualization game. Imagine the power of ZFS, FMA, DTrace, Crossbow and whatnot for providing the best infrastructure possible to your virtualized guest operating systems, be they Windows, Linux, or Solaris. It works in the SPARC world (through LDOMs), and in the x86/x64 world through xVM Server (based on the work of the Xen community), now joined by VirtualBox. Oh, and it's free and open source, too.

So with all that: Happy virtualizing, everyone. Especially to everybody near Stuttgart.

Thursday Feb 14, 2008

If you read this blog regularly, you might have noticed that I like spending time participating in podcasts for the German website Systemhelden.com (For instance, see here, here and of course here). The podcast and the Systemhelden.com community are in German, so if that isn't your native tongue, the times of envy are over. Welcome to Systemheroes.co.uk!

What is it?

It's a community website for those that are the "up" in "uptime", the unsung heroes of data centers, the people that never get a "Thank you for delivering all of my 1526 emails today!" call: The system heroes. If you like tinkering with computer systems, it's probably something for you.

What's in it for me?

First of all: A lot of fun, including some comics. A place to plug your blog (and who doesn't want the occasional extra spike in hitrates...). A place to meet other system heroes and chat about those pesky little lusers and their latest PEBKAC incidents while exchanging LART maintenance tips. And they have the coolest system hero game around: Caffeine Crazy. As seen, er, heard on HELDENFunk #9 and #10. Try it out!

Yeah, there's some Sun marketing, too, I admit. Mainly references to cool technology from Sun and the ability to test it 60 days for free (if it's hardware) or just use it eternally for free (if it's software), but someone has to pay the hosting bills and I assure you: It's for the good of system herokind.

Oh, and you gotta love these great ads at the bottom of each page (my favourite is above).

Cool, what do I do?

Do as Yoda would say: "Hrrm, a system hero you want to be? Sign up you need!" Well, being a system hero has never been so much fun...