vflare

Saturday, June 19, 2010

Often you have more than one system at your disposal but no clear way of distributing your compilation workload across all or some of them. They might be running different OSes, which makes it look even more difficult. In my case, I have one laptop (2 cores) and a desktop (4 cores) connected over a WiFi network. The laptop runs Linux (Fedora 13, 64-bit) while the desktop runs Windows 7 (64-bit). I wanted to somehow offload Linux kernel compilation to my powerful desktop and keep my laptop cool :)

distcc comes to the rescue! distcc is a program that can distribute builds of C and C++ code across several machines on a network. It's a fairly well-known program but slightly tricky to set up. It took me a few hours to get everything configured correctly, but now it all works like a charm. I hope this short tutorial will help you get running in minutes :)

So, here is what we need to do:

Install distcc on both client and server(s)

Configure distcc on both sides

Configure Firewall on server(s) to allow incoming distcc traffic

Build, build, build!

Monitoring

Before we can go ahead with the above, we need to install Linux VMs (I used VirtualBox) on the desktop, since it's running Windows. I created two Linux (Fedora 13, 64-bit) VMs where Linux kernel compilation can be offloaded. Each VM was assigned 1G of memory and 2 vCPUs (a single 4-vCPU VM was quite unstable). In general, the client and servers need to be the same platform (32/64-bit) and have the same compiler versions; otherwise you can run into weird compiler/linker errors or, even worse, undetectable errors!

Install distcc

First, you need to install distcc on both client and server -- distcc calls the machine(s) where compilation actually happens the server. So in my case, the VMs on the desktop will be servers. Almost all Linux distributions provide distcc in their standard repositories. On Fedora, all you need to do is:
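The original install and symlink commands did not survive here; a minimal sketch of what they might look like on Fedora follows. The $HOME/distcc location matches the PATH prefix used later in this post, but the package names and the exact compiler list are assumptions:

```shell
# Install distcc on the client and distcc-server on the servers
# (Fedora package names; run as root):
#   yum install distcc distcc-server

# On the client, create masquerade symlinks in a private directory.
# Keeping them out of the normal PATH means the real compilers are
# only wrapped when we explicitly ask for it (see the build step).
mkdir -p "$HOME/distcc"
for compiler in cc gcc c++ g++; do
    ln -sf /usr/bin/distcc "$HOME/distcc/$compiler"
done
```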

NOTE: do not create these symlinks such that they take precedence over your actual compilers; otherwise distcc will try to offload every compilation you do on the client. In general, it's not useful to offload very small compilations.

Configure distcc

Now we need to configure distcc on both client and server. On the client side, we list the servers where we want to offload compilation. On the server side, we specify the authorized client IP address(es) and the port where the distcc daemon will listen for client requests.

Client side configuration:
Available servers need to be listed in the ~/.distcc/hosts file. On my laptop, it looks like this:

192.168.1.10,lzo, 192.168.1.11,lzo

Here 192.168.1.{10,11} are the IPs of the Linux VMs running on my desktop. The 'lzo' option tells distcc to compress object files as they are transferred over the network. This slightly increases CPU usage on both client and server but is useful if you have a low-bandwidth network. This configuration completely offloads compilation to the servers. If you want the local machine to also participate in compilation, change the above to:

localhost, 192.168.1.10,lzo, 192.168.1.11,lzo

NOTE: do not use the local IP address instead of the term 'localhost' in this configuration file; otherwise distcc will incur network overhead even for the local part of the compilation. If 'localhost' is used, the local part of the compilation has negligible distcc overhead.

As another example, you may want to restrict usage of the local machine, so that it remains cool and most of the work is done by the other servers:

localhost/1, 192.168.1.10,lzo, 192.168.1.11,lzo

This restricts the number of compilation jobs on the local machine to 1. The remaining jobs (as specified by the make -j parameter) go to the other server(s).

Server side configuration:

Among other things, we need to provide a list of allowed client IP addresses (by default, all IPs are blocked) and the port where the distcc daemon will listen for client requests (default port: 3632). On Fedora, the configuration file is /etc/sysconfig/distccd (the exact location may differ depending on your distro). In my case, the two Fedora VMs on the desktop were the distcc servers, so I entered the following configuration on both of them (config file: /etc/sysconfig/distccd):
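The configuration itself is missing here; on Fedora the file holds a couple of shell variables, so it might look roughly like this. The job count, subnet, log path, and user below are illustrative values, not the original settings:

```shell
# /etc/sysconfig/distccd -- sketch only; values are examples
USER=distccuser
OPTIONS="--jobs 5 --allow 192.168.1.0/24 --port 3632 --log-file /var/log/distccd.log"
```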

This specifies an upper limit on the number of parallel jobs on the server, the range of allowed client IPs, the port to listen on, and the log file (by default it spams the system log: /var/log/messages). The USER option is useful if the distccd daemon is started as root, in which case it drops privileges to USER. See the distccd man page for more details.

Of course, you need to change the user. Start the daemon (on Fedora: 'service distccd start') and verify that it started successfully with:

ps awwx | grep distcc

Configure Firewall

We need to open TCP port 3632 (or whatever port you specified in the distccd configuration). For this, insert the following iptables rule in /etc/sysconfig/iptables:

-A INPUT -m state --state NEW -m tcp -p tcp --dport 3632 -j ACCEPT

This must be inserted before any REJECT rules. Alternatively, you can use a GUI like system-config-firewall to open TCP port 3632. In fact, this is what I used; the configuration line above was auto-generated by the GUI.

Build, Build, Build!

All set, it's time to build! For whatever compilation you want to distribute using distcc, issue the build like this:

PATH=$HOME/distcc:$PATH make -j8

The PATH prefix makes sure that the distcc symlinks get priority over the real compiler. This also makes it easy to use or avoid distcc: just drop the PATH prefix and you fall back to the local compiler.

The distcc man page suggests that the number of jobs (the make -j parameter) should normally be set to about twice the number of available CPUs, to cover threads blocked on network I/O.

In my case, I have 2 VMs with 2 vCPUs each, so a total of 4 CPUs. Sometimes I also add 'localhost' to the distcc server list, so I can use the 2 cores on my laptop too. With a total of 6 cores, my Linux kernel build time (with the default Fedora 13 config) came down from over an hour to just 20 minutes!

I used the Linux kernel just as an example, but you can distribute the build of any C/C++ code with distcc. Throw in the power of virtualization and you can even use a mix of Linux/Windows, 32/64-bit machines.

Monitoring

You can easily monitor how your build is being distributed among the servers with either 'distccmon-gnome' or 'distccmon-text'.

Sunday, May 30, 2010

Recently, I developed a Linux kernel driver which creates generic RAM-based compressed block devices (called zram). Being RAM disks, they do not provide persistent storage, but there are many use cases where persistence is not required: /tmp, various caches under /var, swap disks, etc. These cases can benefit greatly from high-speed RAM disks, along with the savings that compression brings!

However, all this seems completely Linux-centric. But with virtualization, zram can be used for Windows too! The trick is to expose zram as a 'raw disk' to Windows running inside a Virtual Machine (VM). I will use VirtualBox as the example, but exposing raw disks should be supported by other virtualization solutions like VMware and KVM too.

Of course, you need to have Linux as the host and have the zram driver loaded. Here are the steps we need to do:

Get zram sources, compile and load the driver

Set zram disksize

Create VMDK file with raw disk set as /dev/zram0

Add this disk to Windows VM

Once this is done, the disk will be detected in Windows as a 'VBox HardDisk' and, after disk initialization (as is needed for any new hard disk), you can format it with NTFS (or any other) filesystem.

Get zram sources, compile and load the driver

zram is not yet available as a downloadable tarball, so you need to check out the source directly from the repository:

hg clone https://compcache.googlecode.com/hg/ compcache

Now, to compile it against your running kernel, just run:

make

If you get lots of compilation errors, then you are probably missing the kernel-devel and kernel-headers packages. This driver has been well tested with the Linux kernel 2.6.33-xx which ships with Fedora 13 (x86_64). If compilation went fine, you should now have the zram.ko driver. Load it along with its dependencies:

modprobe lzo_compress
modprobe lzo_decompress
insmod ./zram.ko

Set zram disk size

The disk size is set using the zramconfig userspace utility, which is compiled along with the driver. You can find it in <sourcedir>/sub-projects/zramconfig/zramconfig. The following sets the disk size to 2GB and initializes the disk:

zramconfig /dev/zram0 --disksize_kb=2097152 --init

Create VMDK file with raw disk set as /dev/zram0

Now we will create a VMDK file which simply points to /dev/zram0 device as its data source. This VMDK will later be added to Windows VM in VirtualBox.
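The original command is missing here; with VirtualBox's VBoxManage it would be along these lines (a sketch -- check `VBoxManage internalcommands` help on your version, as the exact flags may differ):

```shell
# Create a VMDK backed by the raw /dev/zram0 device and register it
# with VirtualBox (~/temp matches the example location in the text):
mkdir -p ~/temp
VBoxManage internalcommands createrawvmdk \
    -filename ~/temp/zram.vmdk -rawdisk /dev/zram0 -register
```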

This command creates the VMDK file in ~/temp (you can replace it with any other location) and also registers it as one of the hard disks in VirtualBox.

Normal (non-root) users cannot do direct I/O to /dev/zram devices, but you certainly don't want to run VirtualBox as root! So, as a workaround, you can 'chown' the device:

chown username:username /dev/zram0

Of course, you need to run this as root. It gives you ownership of the device, so you will not have to run VirtualBox as root.

Add this disk to Windows VM

This VMDK disk (~/temp/zram.vmdk in our example) can be added to any VM, be it Linux or Windows. But for now, we will stick with Windows. Go to the VM's storage configuration and add this disk. You will then get a storage configuration like this:

Now power on the VM. Windows will detect this disk as a 'VBox HardDisk' and you need to 'initialize' the disk within Windows before you can start using it (as is needed for any new hard disk). To initialize the disk, go to: Control Panel -> Administrative Tools -> Computer Management -> Disk Management. A wizard will automatically pop up to help you initialize the disk and assign it a drive letter. Make sure you set the block size to 4096 and keep NTFS filesystem compression disabled (the default); otherwise you will get suboptimal performance!

After the disk initialization wizard finishes, the zram disk should show up in My Computer:

The screenshot above shows the zram drive highlighted, formatted with the NTFS filesystem and with a size of about 2GB. You can now use it like any other disk.

Apart from use as a generic disk, an interesting use case is to put the Windows swap file on this disk. This way, whatever is swapped out by Windows goes to the host (Linux), where it is compressed and stored in memory itself! In a way, this is like dynamically giving more memory to a VM. Reading/writing to this disk is way faster than to rotating disks, so it should also improve your VM performance.

Saturday, April 10, 2010

For the last few months, I have been playing around with KDevelop, a KDE-based IDE for C/C++ and many other languages. It's a large C++ codebase, and navigating through all the files and classes is quite difficult with the usual VI + cscope combination. The most lacking part is readily accessible KDE API documentation, which is almost essential no matter what component you are working on. There is a KDE API reference site, but searching there for every reference is very cumbersome. So, I decided to set up a local API reference. Steps:

Checkout sources

Generate API documentation from sources

Import the documentation into Qt Assistant: it provides a nice interface to navigate through all the documentation

Checkout Sources

We will check out sources for kdelibs, kdesdk, kdevplatform and kdevelop. I just want to hack on KDevelop; you may need to check out additional repositories depending on what you want to do. Instructions for checking out the different repositories and branches are available here.

We'll check out everything into $HOME/repo/ (of course, you can change it to anything you want).
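The checkout commands themselves were lost here; a sketch of the idea follows. Note that the repository URLs are assumptions -- use the checkout instructions linked above for the real module locations:

```shell
# Check out each tree into $HOME/repo; the anonsvn paths below are
# illustrative and may not match the actual module locations.
mkdir -p "$HOME/repo" && cd "$HOME/repo"
for tree in kdelibs kdesdk kdevplatform kdevelop; do
    svn checkout "svn://anonsvn.kde.org/home/kde/trunk/KDE/$tree" "$tree"
done
```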

Generate API Documentation

Now we'll generate documentation from the sources in .qch format -- a single compressed file containing all the documentation. These qch files will later be imported into Qt Assistant.

First, download the script that generates these qch files. It is a slightly modified version of the script included with kdesdk (kdesdk/scripts/kdedoxyqt.sh); the modified version works for all the source trees listed above and also generates class diagrams. Now, generate documentation for each source tree using this script:
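The invocation is missing here; assuming the downloaded script is saved as ~/bin/kdedoxyqt.sh (an assumed location), running it from the top of each tree could look like:

```shell
# Generate a .qch file for each source tree; the script path is an
# assumption -- adjust it to wherever you saved the modified script.
for tree in kdelibs kdesdk kdevplatform kdevelop; do
    ( cd "$HOME/repo/$tree" && "$HOME/bin/kdedoxyqt.sh" )
done
```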

This will generate a .qch file in each of the trees (<tree>/apidocs/qch/<tree>.4.x.qch). For example, kdesdk/apidocs/qch/kdesdk.4.x.qch

All these qch files total around 120MB.

Import In Qt Assistant

Now import all these qch files into Qt 4 Assistant. Start the program and go to: Edit -> Preferences -> Documentation tab -> Add... and select a qch file. Repeat this for all four .qch files and you are done.

Figure 1: Qt Assistant with all the KDE API documentation.

You should now have all the KDE documentation listed under 'Contents'. Any class reference is now quick and easy with Qt Assistant's lightning-fast Index.

Wednesday, February 24, 2010

As part of my job and for hobby projects, I often have to work on large C codebases (particularly the Linux kernel). Like a great majority of developers, I spend almost all my work time understanding these ever-evolving, huge codebases.

A few years back, the only tool I found usable for this purpose was the VI editor + cscope. However, this combination falls short of various features that modern IDEs provide and that make life so much easier: a graphical call graph/tree, integration with SCMs (Git, Perforce, etc.), an integrated disassembler, access to previous queries, and other conveniences typically provided by GUI environments, like better use of multi-monitor, widescreen setups.

Such large C-based projects typically have their own build systems, with compilation offloaded to separate cluster(s). Also, kernel debugging is not really feasible from within an IDE. Thus, what is needed for this kind of work is a simple GUI application that overcomes the various problems of the classical VI + cscope combination, some of which I listed above. The integrated build system, debugger, etc. provided by typical IDEs are simply not required.

For the last year or so, I (and various teammates) have been hooked on KScope, a KDE-based frontend for cscope+ctags. In particular, it can show a graphical call graph/tree, which is extremely useful. However, it lacks integration with any SCM (so it cannot view diffs against previous revisions), has no integrated disassembler, and so on. Initially, I decided to work on extending this great tool but soon realized that it's not feasible:

KScope 1.6.x is based on KDE3. Porting it to the KDE4 platform is far from trivial, and adding to the existing KDE3-based code is surely going to hurt, as this outdated platform is no longer the focus of KDE developers.

KScope 1.9.x is a highly stripped-down version of the 1.6.x series! It looks like a bare Qt application with a QScintilla editor part. This departure from the KDE framework makes it look ugly, with inferior font rendering (compare with Kate!), and makes it really hard to use new KDE-based frameworks like KDevPlatform, which make it easier to develop IDE-like applications.

KScope is a frontend for cscope+ctags, backends that are themselves ancient and almost unmaintained. They also lack incremental cross-reference database updates, which are essential for frequently updated projects like the Linux kernel.

Add to all this the fact that KScope's (only) developer has now left the project, and it's difficult to jump in and maintain this dying project.

So what now? The need for such a tool led me to start a new project: KXref, which is going to be a KDE4-based code-reference tool. It has lots of features planned that I missed the most in KScope (details on the project home page). Its backend for code cross-referencing will be GNU Global, which is under active development, has a good roadmap, and supports incremental updates even today.

Currently, only the bare bones have been developed, and progress so far has been slow, partly due to the time factor and due to my fear of C++ in general. I'm also a relative newcomer to KDE programming. However, I regularly jot down any new ideas on the project page -- at least this part I'm enjoying :) If you have some time and some KDE/Qt programming experience, or some ideas to share, you are surely welcome! The project badly needs contributors :)

As I said, the project is in very early stages and it's really too early for screenshots -- anyway, here is what it looks like today!

Tuesday, February 23, 2010

I recently installed the latest release of Ubuntu (9.10, aka Karmic Koala) and was disappointed to find an ancient version of compcache/ramzswap installed by default. My favourite distro (Fedora) does not even ship compcache. Currently, a major part of the compcache/ramzswap code is included in mainline (2.6.33), but full support is not expected before kernel 2.6.34. I think it will take quite a long time before the 2.6.34 kernel is adopted by the various distros. It also takes a considerable amount of time before additions/fixes to compcache are synced with the mainline kernel. I think Ubuntu, Fedora and their various 'spins' can benefit from memory compression, and there is no need to wait for these future kernel releases. Providing support for an up-to-date compcache version is also very easy and non-intrusive. Here is what any distro needs to do:

Apply the (small) "swap notify" patch to the kernel. This patch has been included in the '-mm' tree for a long time and is well tested.

Ship the ramzswap kernel module and the rzscontrol utility as a separate package. This package can be updated as soon as a new compcache version is available for download. It could be provided through the rpmfusion repository for Fedora and something similar for Ubuntu.

All of the above (notify patch, ramzswap module, rzscontrol utility) are available in the project's download area. I hope it will be adopted by more distros in the future, giving me motivation to keep the development alive :)

Sunday, December 27, 2009

For a long time, I was looking for a graphical git diff viewer which could show the original and modified files side by side and highlight the changes. There are a few solutions, but none of them is sufficient:

A tool included with git, called 'git-difftool', is partially helpful: it can show changes graphically, but the diff for each file is shown one by one. This is very irritating -- in fact, unusable with even just 10-15 files.

Another alternative is the meld diff viewer, which is "git aware". The problem here is that it can show diffs for uncommitted changes only, which is very limiting. What if you want to see what changed between, say, Linux kernel 2.6.33-rc1 and 2.6.33-rc2? Or between the last two commits? meld cannot do it, AFAIK.

Finally, with kompare, you can do something like 'git diff master | kompare -o -'. This method, however, does not show the original and new files side by side. It is simply prettier diff highlighting.

None of the above methods is sufficient. So, I wrote the following script, which solves our problem: show the complete contents of the original and new files and highlight the differences.

The script reconstructs a (sparse) tree containing the modified files. This is very useful with large projects like the Linux kernel -- with a flat list of files, there is no way to tell which of the many Kconfig files (one in nearly every directory) a change actually modified!

To use it, just call the script instead of 'git diff' directly. Arguments to the script are the same as for the git-diff command. See the script comments for further help.

Let's see some examples:

1) I have a 'qemu-kvm' git tree with some uncommitted changes. In this tree, 'git diff' without any arguments would generate the diff. So, we do the same with git-diffc:

git-diffc

2) I want to see the difference between Linux kernel 2.6.33-rc1 and 2.6.33-rc2. This can be done with 'git diff v2.6.33-rc1 v2.6.33-rc2', which generates a unified diff between the two releases. To get a graphical diff (note: we always use the same args as git-diff):

git-diffc v2.6.33-rc1 v2.6.33-rc2

Here is the output with the default (kdiff3) diff viewer:

Note that the entire directory hierarchy is reconstructed, which is almost essential for such large projects; a simple flat list of changed files would be almost useless. As another example, if you were simply interested in checking which files are modified across these two versions, you could do:

DIFF_BIN=/usr/bin/tree git-diffc v2.6.33-rc1 v2.6.33-rc2

This is the same command as above, but overriding the diff viewer (see the script comments for details).
I hope you will find this script useful. Happy Hacking and Happy New Year!

Wednesday, September 23, 2009

So you worked on some part of the Linux kernel, and it works great. Now, how do you generate the patch series and send it out for review? For this, I always used to generate diffs, create a set of draft mails (one for each patch) in KMail or Thunderbird, and send all these mails one by one. This workflow quickly became a big headache. Then I learned Git (and some related tools) to do all this from the command line and wow, what a relief!

This is just a quick dump of the notes I made while struggling to make it all work. There are probably many other (most probably better) workflows, but this is what works for me. I hope you will find it useful. Any comments/suggestions welcome!

Why the KMail/Thunderbird workflow was bad

First, for any Linux kernel patch series, the convention is to create an introductory email, a.k.a. PATCH [0/n], where 'n' is the number of patches in the series. All the following patches should then be sent as replies to this introductory patch (so if someone is not interested in your patches, they can simply collapse the thread). For example:

With KMail or Thunderbird, I could not find any clean way to compose draft mails with this threaded structure. Even when you compose drafts with this threading, it screws up the patch ordering while sending.

Second, almost all graphical email clients have a habit of tinkering with the text and corrupting the patch. This is actually not a problem with KMail, at least (the Linux kernel includes documentation on how to avoid this problem with other email clients too, including Thunderbird).

Third, whenever you have to send a patch series, you have to re-create the entire To: and Cc: list. This can become quite cumbersome -- as more and more people review your code, this list can get quite long.

Workflow with Git

Creating patches:

Method 1:

Suppose you have a Linux kernel tree with two branches: mywork and master. You make a series of changes (commits) to the mywork branch. If you have a clean work history -- i.e., the last 'n' git commits reflect the incremental development of your work -- then you can simply use the following to create a set of patches ready for submission:

git format-patch --cover-letter -o patch_dir master

Options:
--cover-letter: also generate a 'patch' that introduces your patch series.
-o: specifies the directory where the created patches are stored (patch_dir here).
master: replace this with the branch to diff against.
(As usual, see the man page for all other options and details.)

The filename of each patch is based on the first line of its git commit message.
Each patch file begins with the subject line:
[PATCH n/m] <first line of git commit message>
with the rest of the git commit message as the body, followed by the actual patch.
If you want to omit the automatic '[PATCH n/m]' prefix in the subject, use the '--keep-subject' option of git format-patch.

Now, review the subject and body of each of these patches -- especially the cover letter, which of course has no proper subject or body when initially created.

Method 2:

Sometimes you don't have a clean working history, i.e., the last 'n' commits do not represent the incremental development of your work. This is usually the case for me: all development happened outside mainline, so to prepare patches against mainline, I commit all files in one go and make lots of small commits later as cleanups and bug fixes are done.
In such a case, you can create another branch with a 'clean' history by carefully committing changes in the correct order to this new branch, and then use git format-patch (as in Method 1) to generate the patch series. However, I usually do the following -- no good reason to prefer it; it just better suits my laziness.

Since my git commit history is full of uninteresting small commits, I create the individual patches manually. For each patch, I hand-pick the set of files to include, like this:
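The command itself did not survive here; the idea is a plain 'git diff' restricted to the hand-picked files, redirected into a numbered patch file. The file names and patch subject below are hypothetical:

```shell
# One patch per logical change: diff only the files belonging to it.
# 'master' is the branch to diff against; file names are examples.
mkdir -p patch_dir
git diff master -- mm/page_io.c include/linux/swap.h \
    > patch_dir/0001-add-swap-free-notify.patch
```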

Also create a 'cover letter' that introduces your patch series. Give it any filename, but prefix it with '0000' (for example, 0000-cover-letter.patch) to make sure git send-email picks this patch for sending before any other (more on this later).

Sanity check for patches:

Verify that your patches follow the Linux kernel coding style (and pass some other basic sanity checks) using the checkpatch.pl script included with the Linux kernel:

linux_source/scripts/checkpatch.pl <patch filename>

Sending patches:

Now that the patches are prepared and the sanity checks are done, it's time to send them out for review. We will use git-send-email for this.

First, prepare an address book containing entries for all the recipients. I use abook, which I found very easy to use. Create the address book with abook and export it as a 'mutt alias' file (press 'e' for export, then 'c' for the mutt alias format). This mutt alias file will be used by git send-email.
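The actual send command is missing here; with the mutt alias file hooked up via git's sendemail configuration, an invocation matching the options described below might look like this (the alias-file path and mailing-list address are placeholders):

```shell
# Point git send-email at the exported mutt alias file:
git config sendemail.aliasesfile ~/.mutt-aliases
git config sendemail.aliasfiletype mutt

# Send the whole series; 0000-cover-letter.patch sorts first, so all
# other patches become replies to it.
git send-email --no-chain-reply-to \
    --suppress-cc=sob --suppress-cc=self \
    --to linux-kernel@vger.kernel.org patch_dir/*.patch
```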

Options:
--no-chain-reply-to: don't send every patch as a reply to the previous patch. Instead, what we want is to send all patches as replies to the _first_ mail (i.e., the cover letter).
--suppress-cc=sob: don't Cc the addresses mentioned in the 'Signed-off-by:' lines of each patch.
--suppress-cc=self: don't send a copy of the patches to yourself.
(If your requirements are different, then man git-send-email is always your friend!)

git send-email seems to pick patches in alphabetical order, so 0000-cover-letter becomes the first patch and all other patches are sent as replies to it. Finally, before actually sending the patches, add the '--dry-run' option to git send-email and make sure everything looks okay.