At $work, we have a lot of Java processes that are run via cron and other
wrappers to do some pretty critical tasks. The apps have been written so
that the whole thing is wrapped in a try/catch that calls System.exit(1)
should something not go right. Our wrapper scripts watch for a non-zero exit
code, and alert Nagios if something went wrong. This works great except
when a VM encounters an OutOfMemoryError (OOM). The Java VM attempts to
continue on, but if the main thread hits this error, the entire VM will
exit. However, the application code that exits with a status of 1 never gets
called, so the application ends up dying with a status of 0. Well, Sun (Oracle
now, I guess) gave us a new option in Java 6, backported to 1.4.2_12
and up, that allows us to tell Java to run a shell command when it encounters
an OOM. By adding the option

-XX:OnOutOfMemoryError="kill -9 %p"

to our Java command line, the VM will execute a shell that runs the kill
command, with %p replaced by the PID of the VM. The -9 option to kill will
cause the VM to die with a non-zero status (137, i.e. 128 plus the signal
number), so that our wrappers will pick up the error and alert the right
people. Note: this feature was never backported to Java 5 - sorry!
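The wrapper pattern described above can be sketched roughly like this (the
run_job helper and the alert step are hypothetical stand-ins, not our real
scripts):

```shell
#!/bin/sh
# Sketch of the wrapper pattern (run_job and the alert step are
# hypothetical stand-ins). With -XX:OnOutOfMemoryError="kill -9 %p",
# an OOM in any thread now surfaces as a non-zero exit status (137)
# instead of a silent 0.
run_job() {
  # stand-in for: java -XX:OnOutOfMemoryError="kill -9 %p" -jar job.jar
  "$@"
}
run_job false   # simulate a failing job
STATUS=$?
if [ "$STATUS" -ne 0 ]; then
  echo "job failed with status $STATUS"   # here we'd raise a Nagios alert
fi
```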

It took a fair amount of googling around to find the solution to this one.
With the Node Gallery 3.x branch, we
needed a way to quickly add an image to an existing gallery. We could have
displayed the whole node form, but there are a lot of things on that form
that we can just use the defaults for 99% of the time. We need just three
fields filled in: Title, Caption, and the imagefield itself. To reuse the
imagefield widget that handles all the hard work on the node add form in a
form of your own, first create a handler in hook_menu such as this:

This is pretty straightforward, up
until lines 28-30. Those three lines set up the form array and then append
the results from content_field_form() to our existing form. Still very easy,
but I wasn't able to find any documentation on how to do this. Just in case
you're curious, here's the submit handler for that form.

At $work, we use Hudson extensively, and it
rocks. For those who don’t know already, Hudson is an implementation of
Continuous Integration
that is remarkably easy to use. I wrote about my first impressions of Hudson
previously. Hudson’s original audience was Java developers using Ant or Maven,
but with plugins and some hacking, we can make it do some things for us as
module contributors to Drupal. I’ve been cutting my Drupal developer teeth by
working pretty intensively on a few modules for Drupal - Node Gallery and its derivatives. We are
hitting a crucial point in development where we are switching from the old way
of defining fields on a node to using CCK. While the module is still in
alpha, it's in use by quite a few sites - as of this writing it's number 465 on
the list of Drupal modules. Not exactly the spotlight, but we can’t go
breaking things without making people angry either. I figured this would be
the perfect place for Hudson - it will let you know when you break something.

Pieces of the Puzzle

Here are the pieces you’ll need:

A Linux server with Java, a working Drupal install (that may get broken at times), and the cvs command-line utility.

The shell script

This is the most important piece of the setup. By utilizing Hudson’s
environment variables, we can make this as portable as possible. By using the
same script for all Hudson jobs, changing the script automatically changes all
of our jobs at once. Let’s dive right in:

Lines 14 through 23 find all files in $WORKSPACE (which is
set by Hudson) and copy them into the Drupal install. Note that you **must**
name your Hudson project the same as the module name. Also note that your
Hudson user needs to have write access to the specific module directory that
it's installing into. Line 29 runs drush so that it invokes update.php, and
answers yes to all questions. Lines 32 through 37 run
the default code review from the coder module. You will need to have set this
up initially via the web interface. It then scans through that output looking
for any complaints about our $JOB_NAME, and if found, prints it to stdout and
increments our exit value by 1. Note we don’t exit here, as it’s a non-fatal
error. However, Hudson will treat it as a failed build and email everyone
about it. Lines 40 through 48 run the Translation Template Extractor command
line utility against our module. It then copies the general.pot to the files
directory. Again, the user running Hudson will need write access for this to
work properly. If the potx-cli.php script should exit uncleanly, we increment
our exit value by one. Last in my script, we simply exit with whatever value
we have ended up with at this point. Again, if Hudson sees anything other than
a zero, it will email everyone about it. Since the modules I’m working on
don’t have Simpletest tests ready yet, I don’t run them in this script.
However, it’s on the horizon, and the tests can be run easily using
run-tests.sh. Note that there is a patch that will cause run-tests.sh to
output its results in a JUnit-style format, which Hudson
understands fully. If you implement this, I strongly recommend applying that
patch.
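A minimal sketch of that exit-value bookkeeping (run_step is a hypothetical
helper standing in for the drush, coder, and potx steps; this is not the
original script):

```shell
#!/bin/sh
# Sketch of the script's error handling: non-fatal failures bump EXITVAL
# instead of aborting, and the script exits with the total at the end so
# Hudson flags the build if anything complained.
EXITVAL=0
run_step() {
  # Hypothetical helper: run a step; on failure, report and increment.
  "$@" || { echo "step failed: $*"; EXITVAL=$((EXITVAL + 1)); }
}
run_step true    # e.g. drush -y updatedb
run_step false   # e.g. the coder review complained about $JOB_NAME
echo "final exit value: $EXITVAL"
# The real script would end with: exit $EXITVAL
```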

Hudson Setup

Now that we have our script ready, we need to set up our Hudson job. Note that
installing Hudson itself is outside the scope of this article - it’s
refreshingly easy and doesn’t need repeating here. There are two things you
must do before creating the build task. First, setup your “E-mail
Notification” section according to your mail server at
http://myhudsonserver:8080/configure. Also, you will need to install the “URL
Change Trigger” plugin by navigating to
http://myhudsonserver:8080/pluginManager/available. Once you install that
plugin, create a new job. In my case, the job was named ‘node_gallery’, since
that’s the name of the Drupal module I was working with. Select ‘Build a free-
style software project’ when asked. Under the “Source Code Management”
section, select “CVS”, and then fill in the CVSROOT of the project you’re
working with. In my case, it was
‘:pserver:anonymous:anonymous@cvs.drupal.org:/cvs/drupal-contrib’. Next, fill
in the path to the module in the “Module(s)” form - for me it’s
‘contributions/modules/node_gallery/’. If you’re working with CVS HEAD, leave
the Branch field empty, otherwise type in the branch name there.
IMPORTANT: to avoid abusing Drupal.org’s already overloaded CVS server
with cvs logins once every 5 minutes, we will point Hudson instead to the RSS
feed for the CVS log messages. Make sure “Poll SCM” is unchecked, and check
“Build when a URL’s content changes”. To obtain the URL, you need the node id
of your project. There are many ways to get it, but one is to go to the
project’s Drupal.org page and click on “View CVS Messages”. From that
page, click the orange RSS icon at the bottom left. Copy and
paste that URL into the URL field in Hudson. Under the Build section, click
“Add build step”, and select “Execute Shell”. In the resulting “Command”
textarea, type the full path to the shell script we set up above. The final
section, “Post-build actions” is up to you, but you’ll likely want to enable
email notifications. Place a checkmark in the “Email Notification” box, and
type in the email addresses of the desired recipients. Click Save, and you’re
done! Hudson will start doing a CVS checkout of your project, and will start
running tests on it. It will email you once anything goes wrong, and will
notify you again when the problem is resolved. It will only run these tests
after someone commits code to CVS, so you will likely need to hit the “Build
Now” link in the left nav a few times. We’ve really only scratched the surface
of what Hudson can do. You can track performance using JMeter, add all kinds
of crazy plugins, require logins - the list goes on and on. While this helps,
it’s still nowhere near as helpful as an Ant/Maven job can be. Hopefully this
article is enough to spark some interest from the Drupal community so that we
can write some better continuous integration code in the future. Also, I’m far
from being an expert on either Drupal or Hudson. I wrote my first code for
Drupal in November of 2009, and I only tinker with Hudson on occasion at work.
Hudson works so well, it’s one of those “set it and forget it” apps. I would
love for readers to leave comments on any mistakes I might have made, or
possible improvements I may have missed!

Just got the chance to finally sit down and watch Ben
Rockwood’s presentation at LISA 09: ZFS in the
Trenches. If you are even thinking about ZFS
and how it works, it’s a very informative presentation. There is very little
marketing-speak, and he very specifically targets sysadmins as his audience.
Great stuff! One interesting note from his comparison of fsstat vs. iostat: our
Apache webservers routinely see about 5MB/sec reads being asked of ZFS, but
the actual iostat on the disk shows that almost all of that traffic is being
served up from ARC.

There’s a nasty upstream bug in GTK present in Ubuntu 9.10 that makes Eclipse
Galileo all but unusable – specifically it makes clicking many buttons with
the mouse just stop working. You can use tab and spacebar to make it work, but
that’s not much of a workaround. All you need to do is set an environment
variable before starting Eclipse:
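The variable widely reported to fix this bug is GDK_NATIVE_WINDOWS (an
assumption here, since the original snippet isn't shown above):

```shell
# Widely reported workaround for the GTK button bug (assumption: this is
# the variable the original snippet set). Export it, then launch Eclipse
# from the same shell as usual.
export GDK_NATIVE_WINDOWS=true
# eclipse &
```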

Over the years, I’ve come to know
and love Eclipse. Though it has roots in Java,
ironically, I use Eclipse for just about everything except for coding Java (if
I wrote Java code, I’m sure I’d use Eclipse). Eclipse is great for browsing
Subversion, coding
PHP, coding
Perl, and even coding shell
scripts. For die-hards like me,
there’s the viPlugin that allows you to use all the vi
commands you know and love within Eclipse. About the time you get your perfect
Eclipse setup established, you buy a new laptop on a new platform. Or, in my
case, I have three “primary” development workstations, each on a different OS.
The rest of this article will show you how to hook
Dropbox into your Eclipse installation, allowing you to
share your plugins and configurations across different versions of Eclipse, on
different machines, and even on different platforms.

Truth be told, doing this type of setup with Eclipse was actually easier to do
with older versions of Eclipse. Since they’ve moved to the p2 provisioning
system, it became a little harder to do, but still very possible. After much
googling, I finally came across this StackOverflow
question that gave me the
pieces I needed to set this all up.

A little prep work on the frontend will save us a huge amount of time in
maintenance. Note that I use Dropbox in this article, but any similar service
should do. We’ll set up our Linux install first, since we can script things a
little easier there. Go ahead and install Dropbox and Eclipse - they’re both
very straightforward installations.

Let’s assume that our Dropbox directory is directly under our home directory,
and our Eclipse installation is in ~/eclipse. Let’s set up some environment
variables and create our directory structure:
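A sketch of that setup, with assumed paths (the directory names here are
illustrative, not necessarily the originals):

```shell
# Assumed layout: Eclipse lives in ~/eclipse, and shared extensions go
# under the Dropbox directory so every machine sees them.
export ECLIPSE_HOME="$HOME/eclipse"
export DROPBOX_ECLIPSE="$HOME/Dropbox/eclipse-extensions"
# Each extension gets its own eclipse/configuration tree; Eclipse will be
# pointed at the configuration directory when installing the plugin.
mkdir -p "$DROPBOX_ECLIPSE/pdt/eclipse/configuration"
```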

With our directory structure in place, it’s time to pick a plugin to install.
Let’s do PDT. The key here is that we start
Eclipse by pointing it to a new configuration directory which lives on our
Dropbox account, and install the new extension. This will force Eclipse to
install the plugin to the Dropbox directory, instead of the local directory.
Start Eclipse like so:
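Roughly like this (paths are assumptions; the echo prints the command rather
than launching it, so drop the echo to actually start Eclipse):

```shell
# Point Eclipse at a configuration area inside Dropbox so the plugin
# installs to the shared location instead of the local one.
SHARED_CONF="$HOME/Dropbox/eclipse-extensions/pdt/eclipse/configuration"
# echo shows the command; remove it to really launch Eclipse.
echo "$HOME/eclipse/eclipse" -configuration "$SHARED_CONF"
```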

Note that you can change the ‘pdt’ portion of that path to whatever you
choose, but you must include the trailing eclipse/configuration portion. Once
in Eclipse, go ahead and install PDT just as you normally would, then exit
Eclipse.

Now that we’ve installed the PDT extension to a shared location, it’s time to
point our local Eclipse installation to it. I wrote a quick script to do just
that:
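A hedged reconstruction of such a script (directory names are assumptions;
the path= line format comes from Eclipse's old links-folder mechanism):

```shell
#!/bin/sh
# Create a 'links' directory in the local Eclipse install and drop one
# .link file per shared extension, each containing a single path= line
# pointing at the extension (which holds an eclipse/ subdirectory).
ECLIPSE_HOME="${ECLIPSE_HOME:-$HOME/eclipse}"
EXT_DIR="${EXT_DIR:-$HOME/Dropbox/eclipse-extensions}"
mkdir -p "$ECLIPSE_HOME/links" "$EXT_DIR/pdt/eclipse"  # pdt dir for demo
for ext in "$EXT_DIR"/*; do
  [ -d "$ext/eclipse" ] || continue
  echo "path=$ext" > "$ECLIPSE_HOME/links/$(basename "$ext").link"
done
```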

This script creates a directory named ‘links’ in your local Eclipse
installation, and creates one file per extension, each containing a single
line with the path to the shared extension. Now, start Eclipse. For some odd
reason, the extensions wouldn’t actually install until after I restarted
Eclipse a second time, so you may need to do the same. You should now see your
plugin in Eclipse.

Please note that if you’re doing cross-platform development, you’ll save
yourself some headache by not sharing the Subclipse plugin. Too much of that
plugin depends on the underlying OS for it to share effectively.

Just a quick posting of some simple benchmarks
today. Please note, these are not the be-all, end-all performance results
that allow everyone to scream from atop yonder hill that Solaris performs
better than Linux! This was just me doing a little due diligence. I like
Solaris 10,
and wanted to run it on our webservers. We’re looking at using NGINX to serve
up some static files, and I wanted to make sure it performed like it should on
Solaris 10 before deploying it - you know, right tool for the job and all. So,
disclaimers aside, here’s what I found.

The Hardware

The hardware I tested was a Dell PowerEdge R610, with 12GB RAM and two
quad-core Nehalem CPUs. SATA disks were used with the internal RAID
controller, but no RAID was configured.

The Benchmarks

I used ApacheBench, as shipped with Glassfish Webstack 1.5. Yes, I know,
there are all kinds of flaws with ApacheBench, but the key here isn’t the
benchmarking tool - it’s that the tool and its configuration remain the same.
Here’s the command line I used:
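The exact invocation wasn't preserved here, so the flags below are
assumptions; an illustrative ApacheBench run against a single static image
looks like:

```shell
# Illustrative only: the request count, concurrency, and URL are
# assumptions, not the original flags. echo prints the command; drop it
# (and run against a live server) to actually benchmark.
AB_CMD="ab -k -n 100000 -c 100 http://127.0.0.1/test.jpg"
echo "$AB_CMD"
```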

CentOS 5.4

I installed CentOS 5.4, ran yum to get all the updates possible. I then
installed NGINX 0.7.64 from source, and simply copied one image file into the
document root. I did a few sysctl tweaks for buffers and whatnot, but found
later that they didn’t impact the benchmark. Here’s what ApacheBench running
on the local host had to say:

No matter how you slice it, that’s pretty darn fast. I knew that Solaris 10
had a completely rewritten TCP/IP stack optimized for multithreading, and that
it should keep right up with Linux. However, NGINX uses different event models
for Linux and Solaris 10 (epoll vs eventport), so I wanted to make sure there
weren’t any major differences in performance.

Solaris 10

I installed Solaris 10 x86, ran pca to get all the updates possible. I then
installed NGINX 0.7.64 from source, and simply copied one image file into the
document root. Here’s what ApacheBench running on the local host had to say:

Again, very impressive results. Overall, it appeared as though Solaris+NGINX
was just a few millis faster than CentOS+NGINX in most cases, but certainly
not enough to change your mind about which platform to use. If you notice the 4.5
second request on the Solaris box, I’m pretty sure that’s a TCP retransmit
that I can work out with ndd tuning.

The Verdict

NGINX is freaking fast. My hunch is that it’s so fast, that I’m actually
running up against the limits of ApacheBench, not NGINX – but that’s just a
gut feeling. The verdict is that you won’t be making a mistake going with
either Linux or Solaris when setting up your NGINX server.

Our site at $work is generating Apache logs that, when combined sequentially
into one file, are larger than 50GB in size for one day’s worth of traffic.
AWStats' perl script pretty much chokes when working on this much data. Last I
checked, Webalizer wasn’t much different, and probably wouldn’t scale up to
that amount of data either. Does anyone out there have any advice on a
commercial solution for Apache log analysis that can scale up like that?

Our rule of thumb is to increase the number of parallel downloads by using
at least two, but no more than four hostnames. Once again, this underscores
the number one rule for improving response times: reduce the number of
components in the page.

I’m here to tell you, if you have AOL users surfing your site, do not use
four hostnames. When we pushed this feature up to production, we had one
hostname that served up the HTML, and we had four hostnames that served up
imagery (all these hostnames pointed back to the same IP, but doing this
allows a performance boost in the browser). For this example, let’s say that
www.mydomain.com is the HTML hostname; img0.mycontent.com, img1.mycontent.com,
img2.mycontent.com, and img3.mycontent.com were the imagery servers. This most
certainly improved performance on the client side, but we started receiving
some reports from a few users that they were no longer able to see any
imagery on the site since we dropped the new code. We immediately knew what
was causing the issue, but had no idea why, or how far spread out it was.
Well, after poking around some of the “big boys” websites such as Amazon, we
noticed that while all of them separated their components as suggested by
Yahoo!, all of them used only one hostname for the imagery. We quickly
configured our webapp to only use www.mydomain.com for the HTML, and
img0.mycontent.com for the imagery. Once we did that, our AOL users were again
able to see imagery. Now, I have no idea how widespread the issue was. I know
it was limited to users of the AOL browser, and I suspect it’s probably a bug
in a specific version of their browser. However, if your site needs to provide
compatibility to the most users possible, you may want to use just one
separate hostname for splitting components. I hope this helps someone else!