Strace is a tool that should be in a toolbox of every system administrator. Not only that it can help in troubleshooting simple problems (ie. missing libraries in newly created chroot, which ldd mysteriously misses to report) but it also helps in debugging very complex system problems and performance issues.

Recently I experienced a very strange problem with one of the RHEL 3 servers we’ve got. Problem manifested in a very strange way, SSH and su logins hanged, other daemons were also hanging during the startup, only way to reboot or shutdown the server was to physically press the restart/power off button, etc. All this could have been caused by problems on both software and hardware level. First suspicious was bad RAID controller, but after tests this proved to be a mislead. After more tests and brainstorms hardware problems were definitely excluded, so problem has to be on the software side. But what could be the problem?

After few more misleading steps I tried to trace system calls created by su command and found very interesting results.

And this is where the strace output ends and su command hangs. Audit device file is opened (file descriptor 3) and as soon as the first request is dispatched to this device (ioctl system call to file descriptor 3) command freezes. According to this I should just disable audit on the server and the problem will be gone.
As a test, audit daemon was temporarily stopped and I tried to switch to another user and the problem was indeed gone.

When the free space in the filesystem holding the audit logs is less than 20%, the above notify command will error out and auditd will enter suspend mode. This causes all system calls to block.

So this behavior is not a bug but actual feature of the software. :o) From security point of view this is expected behaviour – attacker could fill up filesystem where audit logs are stored before the attack and audit will be disabled, meaning no logs of his activity, so better not to allow ANY activity on the system if audit is not able to write to its logs. But still, this kind of behaviour renders the system completely useless to legitimate users.

The topic of this post is not audit, so I will stop here. Important thing is that strace led us directly to the main source of the problem. Resolution of issues like this would be much more complex and time consuming without this great little tool. :)

I was very bored today. Tired from working on Ratuus (don’t go there, site is under heavy construction :)) I needed something to help me take my mind off everything. And what better way to do it, than playing with Python, Pidgin and D-BUS. :D

To cut the long story short, I needed something that will update my Pidgin status message with the information about the current song I am listening. Till recently I was using Rhythmbox player and there is a perfect little Pidgin plugin called Current Track that worked with this player. Last week I discovered gmusicbrowser and fell in love immediately. It is fast, rich with functionalities but still simple to use. Exactly what I want from audio player. (Hm, I just noticed it is written in PERL. Now when Python is used for everything this comes as a big surprise.)

gmusicbrowser already has a plugin called NowPlaying. It will trigger some command whenever song is changed. I just needed to write this command that will inform Pidgin about the change. So, this seemed like a perfect exercise for slow Saturday. :)

Quick search on Pidgin and D-BUS showed extensive documentation about Pidgin API accessible through D-BUS. There is even a working example of how to change the status message! :)

But that was too simple, so I got another idea. Some time ago, I wrote a small daemon in C that will bind to a specific port and display random bofh-excuses fortune messages when someone would telnet to it. (Seems like I have a lot of spare time. I should really find some hobby!) Something similar to telnet bofh.jeffballard.us 666 (here for more information). So I was thinking about implementing the same for my Pidgin status. Random BOFH excuses in your status message! How cool geeky is that!

The result of all that is short (~60 lines of code) Python script that will set your Pidgin status message to:

a) you current song

pidgin_status.py -m The Real McKenzies – Outta Scotch

b) random line from a file

pidgin_status.py -f /usr/local/share/bofh-example

c) anything you give as the command line argument

pidgin_status.py Some very interesting and funny status message

Only difference between a) and c) is the type of the icon that will be shown. In example a) there will be a small musical note, while in example b) and c) nice arrow pointing to right side will be show.

In the middle of testing I noticed this strange message:

Being from Serbia myself, I find this extremely funny. Although, I didn’t know Serbian hackers are so notorious! :)

I hope someone will find it useful. In any case, I am accepting donations for some long and adventurous vacation. As you can see, I really need it! :D

I just read a very interesting presentation done by Aaron Bannert for ApacheCon 2005. Presentation is on “Building Scalable Web Architectures” and it is a very good reading for anyone interested in high scale web environments. Here is the link to the presentation.

10$ question. Sometime ago you have created backup of your systems Master Boot Record (MBR). Now, after some change, you noticed you did a fatal mistake and your partition table is corrupted and you need to recover it from the backup you created, but you are not sure if it is the correct version. The question is, what is the easiest way to read partition table from the backup of your MBR? No, Hex editor is not the easiest way to do it (and it is bad for your eyes :)). I wonder how many of you said ‘file’ command? Yes, magical file command is able to read the data from the mbr dump and prints you the actual partition table. Here is an example from my laptop.

As you can see I have only 3 partitions on the disk. First one has type 0×83, which is HEX id for ext3 type of partition and it is my / partition (you don’t see it here, but I know it :)). It is also active partition, it means that it is used for booting the system. You can also see the size of the partition in sectors. Knowing that one sector has length of 512 bytes you can easily find out the size of the partition.

Next partition is 0×82 which is swap partition. And last partition is 0x8e which is id for Linux LVM partition.

While I am here, I could also explain what is MBR and how it is used.

Main Boot Record resides in first 512 bytes of your bootable disk. Besides partition table it also holds bootloader and something called a magic number. As you can see on the picture, bootloader takes the biggest part of MBR, whole 446 bytes. During the boot process BIOS search for a bootable devices attached to your system and once it finds it it looks at the MBR and loads the bootloader, also called primary bootloader. Primary bootloader looks at the partition table inside MBR (next 64 bytes after the bootloader) and searches for an active partition. When it finds active partition it loads the secondary boot loader from that partitions boot record which, in turn, loads the kernel, and so on.

Magic number is used for sanity check of your MBR. It holds only 2 bytes and should be 0xAA55.

So, in short words, MBR is used to easily locate and load kernel from the correct device. (It is also used by your operating system to find the layout of the disk, but that is another story.)

PS: You can create a dump of your MBR by issuing next command:

# dd if=/dev/sda of=mbr.bin bs=512 count=1

Replace /dev/sda with the correct address to your disk.

PS 2: Sorry for bad quality of the MBR scheme, but I didn’t have much time to work on it and I am not a graphic designer. :D

Apache mod_rewrite is a very powerful tool used in hundreds of web applications for various purposes, most notably for SEO friendly (or user friendly :)) URLs. Lighttpd also has mod_rewrite but some of the most important functionalities can be achieved using mod_redirect and mod_magnet.

mod_redirect does exactly what its name says – it will redirect visitor from matched URL to a desired URL. One of the cases when this is very useful is redirecting visitors who were too lazy to type ‘www’ part of your web site address to a proper address. In Apache you would use mod_rewrite with a similar syntax.

Basically, it will check if the filename/folder requested exists on the filesystem and if not all requests will be transfered to a desired file (index.php in this case). It works pretty much as a 404 error page.

Lighttpd does this also in a similar way. It is only necessary to set a 404 error page and it will work. But for some reason this doesn’t work for me. It might be due to a bug in server.error-handler-404 (lighttpd 1.4.17 was released 5 days ago with this bug fixed). But there is a workaround for this. Lighttpd has mod_magnet which uses capabilities of LUA script language – so we can use a small LUA script to check if the file/folders exists and if not to transfer all requests to index.php file.

Everyone who has experience with hosting of popular web sites will agree with the next statement: “Apache is a memory eater!” Yes, Apache can be optimized but optimization goes more into direction of making it perform faster instead of making it resource friendly. One of the ways to overcome this is to use Apache to proxy requests for virtual hosts which generates high load to another “lightweight” web server, while leaving others on Apache.

My choice of lightweight web server is lighttpd. Developed by Jan Kneschke (he also stands behind MySQL-Proxy), lighttpd (pronounced lighty) was created with exactly this in mind – lightweight, high-performance web server with very small memory footprint. And all that he really is. Unlike Apache’s multi-threaded design, lighttpd has only one thread which has two bonuses

less consumption of memory – threads are using shared memory, but still every thread needs to reserve some amount of memory for itself

saving CPU time – creation of new threads can be very CPU costly

Apache provides modules for various server side technologies (PHP, Python, Ruby, etc). Lighttpd does not provide the same feature, but the same functionality is possible through FastCGI. And this, again, improves it’s performance and memory footprint.

In this small how-to I will describe how lighttpd and Apache are configured to work side by side.

Coming from old “Slackware school” of sys admins, I still sometimes prefer to compile software from source insted of using already built packages. Maybe it is a bad habit but it is hard to quit. :). So, we first need to install everything we need, and that is Apache with proxy support, PHP with FastCGI support and lighttpd with Lua support (needed for various rewrite support).

Building software from source is fairly simple and straight-froward process if you have all the libraries needed. We will start with Apache. Except all standard compile options (–enable-so, –enable-mods-shared=all, –enable-ssl=shared, etc) we will also use options which will include Apache proxy modules. After make install Apache is installed in /usr/local/httpd-2.0.59 folder.

And on the end the star of this how-to, lighttpd. Lua support is necessary for mod_magnet which is used for certain rewrite capabilities (I will cover that in some other how-to). PKG_CONFIG_PATH needs to be adjusted if lua.pc is not already in the right location. It can be found in etc folder in Lua source tree. Lighty is installed in /usr/local/lighttpd-1.4.16.

By now you are probably wondering why I use such insane paths for software installation. This method actually provides easy way to maintain and upgrade software. Once everything is installed you just go to /usr/local and create sym links from these folders to more generic ones and when you perform some upgrade all you need to change is the symlink and you are ready to use the new version. In case you want to revert back to the old version procedure is the same – change the destination of the sym link and you are back. Very simple.

Lighttpd comes with an example config file which you can customize to create configuration that will suit your needs. As a matter of fact there are only few things you should change to make lighttpd works. Here is my starting configuration

As you can see, I included some additional modules – lighttpd has only mod_access and mod_accesslog activated by default. I also changed paths to main document root, access and error log files and pid file.

Now let’s do some real changes. By default lighty listens on port 80, but that port is already taken by Apache so we will move lighttpd to port 8080 and make it listens only on a loopback interface, no need for outside world connections.

server.bind = “127.0.0.1″
server.port = 8080

Path to virtual hosts doc root is in /home/<domain>/public_html so we need to tell lighty where to find them

evhost.path-pattern = “/home/%0/public_html/”

Now when lighttpd receives connection it will look at “Host” part of the HTTP header and check if /home/<host>/public_html exists and serve the proper page from this folder. If the folder does not exists it will try to serve a page from a default document root (we defined this above with server.document-root option).

And now the final part – FastCGI for PHP support.

Lighttpd can use remote FastCGI servers to easen the server load even more, but in this case we don’t need this high-end feature as we will be running FastCGI on the same server. With next options we instruct lighttpd to execute all files with .php extension through the FastCGI server on localhost and we also define the PHP binary what will be used for parsing these files.

Note:Only shortcuming of this method is that your PHP scripts will not see the correct IP address of your visitors but rather the IP address of the Apache server. To overcome this you can use $_SERVER['HTTP_X_FORWARDED_FOR'] instead of $_SERVER['REMOTE_ADDR'] and gethostbyaddr($_SERVER['HTTP_X_FORWARDED_FOR']) instead of $_SERVER['REMOTE_HOST']. Also, lighttpd logs will be almost useless for any traffic analysis, you will need to use Apache logs for that.

One of the main rules when working on production servers is to keep trace of your actions. That is, so called, “cover your ass” policy. :) As most of my colleagues in IBM are using Windows on their workstations (strange isn’t it?!) they are using putty which provides logging options for their SSH sessions. But I am using Linux and OpenSSH client does not provide this luxury so I had to create this short script to save my SSH logs. It will start SSH client with all the parameters you pass over the command line but at the same time it will also start script command and log everything in right log file. Very neat. :)