Interesting Things

MySQL only allows 64 indexes per table. This is a hard limit, set in a header file, so it can’t even be changed at compile time without editing that file first. It was suggested that anyone who wants more than 64 indexes shouldn’t be using a database for searches anyway; use something like Solr instead.
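If you want to see the limit for yourself, a throwaway migration along these lines (table and column names are made up) will fail on the 65th index with MySQL’s “Too many keys specified; max 64 keys allowed” error:

# Hypothetical stress test: with no primary key taking up a slot,
# the 65th add_index fails with MySQL's "Too many keys" error.
class StressIndexLimit < ActiveRecord::Migration
  def self.up
    create_table :kitchen_sinks, :id => false do |t|
      65.times { |i| t.integer "col_#{i}" }
    end
    65.times { |i| add_index :kitchen_sinks, "col_#{i}" }
  end

  def self.down
    drop_table :kitchen_sinks
  end
end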

The goal is a list of short, pithy, sticky aphorisms to help newbies get Agile and the rest of us stay that way. Think Agile Andy’s Almanack (or something).

I’ve thrown together everything I’ve got (much of it gathered in person and over email) and categorized it a bit. Please comment, add, delete, etc. As I said, I’m working on a presentation around this data and welcome your feedback.

Here goes:

Stick to Conventions

Follow the local ground rules (indenting, naming, structure, etc.)

Always take the next story (don’t let ‘fun’ or ‘hard’ get in the way of business priority)

Pivotal Labs is teaming up with Outside.In in New York City to co-host the monthly Ruby Happy Hour, on the first Wednesday of every month, starting Sep 3.

There will be pizza, beer, and Wii-based entertainment for everyone. Apparently these happy hours are getting quite popular, so please RSVP either in the comments here, or on Outside.In’s blog.

Where: Outside.in, 20 Jay St Suite 1019 (10th Fl), Brooklyn, NY
When: 7-9PM, Wednesday September 3rd
Who: If you’re a developer who uses Ruby and would like to meet some other Ruby folks, toss around ideas, or just have a few beers, we welcome you with open arms!

Outside.in (www.outside.in) is the web’s leading platform for neighborhood news and conversation. Outside.in’s technology dynamically maps web content to offline locations, which enables hyperlocal news discovery and sharing for consumers. The company also recently released GeoToolkit (www.outside.in/toolkit), which provides powerful tools for content creators of all sizes to optimize, promote and monetize their local content. Outside.in is supported by leading investors including Union Square Ventures, Milestone Venture Partners, Betaworks and the New York City Investment Fund. For more information, visit www.outside.in or the company’s blog at http://blog.outside.in.

Using multiple buckets for Amazon S3. One of our sites has a lot of images (perhaps 30+ photos per page, different for each page and user) and got significant benefits from using four buckets instead of one. Multiple buckets allow browsers to fetch several images in parallel, since browsers cap the number of parallel requests per host. Increasing it beyond four probably wouldn’t help, as browsers also cap the total number of parallel requests they will send.
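Here’s a minimal sketch of the idea, with hypothetical bucket names: hashing the image path picks one of the four buckets deterministically, so a given image always comes from the same host and browser caching still works.

IMAGE_BUCKETS = (0..3).map { |i| "myapp-images-#{i}.s3.amazonaws.com" }

# String#sum is a cheap, stable checksum, so a given path always
# maps to the same bucket (keeping the browser cache effective).
def image_url(path)
  "http://#{IMAGE_BUCKETS[path.sum % IMAGE_BUCKETS.size]}/#{path}"
end

# image_url("photos/42/thumb.jpg")
#   => "http://myapp-images-2.s3.amazonaws.com/photos/42/thumb.jpg" (say)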

Amazon S3 now has a copy command. This could be useful, for example, if you have a lot of data in a single bucket and want to move it to multiple buckets. Copying is faster than downloading and re-uploading all that data. The Ruby S3 gem, however, only lets you copy within a single bucket, so for cross-bucket copies you’ll need to bypass the gem.
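To illustrate, here’s a hedged sketch of a cross-bucket copy as a raw REST call (credentials, bucket names, and keys are placeholders, and keys with special characters would need URL-escaping). S3 implements copy as a PUT to the destination object with an x-amz-copy-source header:

require 'net/https'
require 'openssl'
require 'base64'
require 'time'

ACCESS_KEY = 'YOUR_ACCESS_KEY' # placeholder
SECRET_KEY = 'YOUR_SECRET_KEY' # placeholder

# Server-side copy: PUT the destination object with an
# x-amz-copy-source header pointing at the source object.
def s3_copy(src_bucket, src_key, dst_bucket, dst_key)
  date   = Time.now.httpdate
  source = "/#{src_bucket}/#{src_key}"

  # Sign the request with S3's HMAC-SHA1 scheme.
  string_to_sign =
    "PUT\n\n\n#{date}\nx-amz-copy-source:#{source}\n/#{dst_bucket}/#{dst_key}"
  signature = Base64.encode64(
    OpenSSL::HMAC.digest(OpenSSL::Digest::SHA1.new, SECRET_KEY, string_to_sign)
  ).strip

  http = Net::HTTP.new("#{dst_bucket}.s3.amazonaws.com", 443)
  http.use_ssl = true
  request = Net::HTTP::Put.new("/#{dst_key}")
  request['Date'] = date
  request['x-amz-copy-source'] = source
  request['Authorization'] = "AWS #{ACCESS_KEY}:#{signature}"
  http.request(request)
end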

We wrote a script to dump a local SQL database and copy it up to a remote server (for example, a demo or production server). This is in contrast with a script we wrote some months ago which copies from demo to a local workstation (for test data, reproducing data-driven bugs, etc). The push to remote feature was for a situation in which there was a bunch of data to be generated (based on some XML input files) and we could afford to bog down a workstation for half an hour, but not an overloaded (and perhaps underpowered) server.
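The core of such a push script can be as small as this sketch (hosts and database names are hypothetical, and the XML-driven data generation step is omitted):

#!/usr/bin/env ruby
# Dump the local database, copy it up, load it on the remote box.
LOCAL_DB  = 'myapp_development'          # hypothetical names
REMOTE    = 'deploy@demo.example.com'
REMOTE_DB = 'myapp_demo'
DUMP      = '/tmp/myapp_dump.sql'

def run(cmd)
  puts cmd
  system(cmd) or abort("failed: #{cmd}")
end

run "mysqldump -u root #{LOCAL_DB} > #{DUMP}"
run "scp #{DUMP} #{REMOTE}:#{DUMP}"
run "ssh #{REMOTE} 'mysql -u root #{REMOTE_DB} < #{DUMP}'"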

Deprec is a set of Capistrano recipes for setting up a remote server (in conjunction with deploying an application): creating accounts, installing ssh keys, setting up init scripts, logrotate, and so on.

As some of you may have heard, Rails 2.2 is going to be thread safe. This was pretty exciting for us as we’ve been struggling lately with memory issues running Mongrel clusters in virtualized environments, where memory is scarce and those Mongrel processes are pretty big.

While some have pointed out limitations of MRI due to green threads and certain commonly used libraries, and others are super-excited about what this means for JRuby, we were very curious about how Mongrel would perform serving Rails requests concurrently on MRI, which is very close to our/the standard Rails deployment.

As it turned out, Mongrel was pretty easy to work with, and initial tests showed a huge drop in memory usage with performance comparable to a Mongrel cluster. I haven’t included numbers, as this was really more of a proof-of-concept spike and others will no doubt run with it, but we’re happy enough that we’ll be moving an app or two to edge Rails shortly to see how well they perform.

A patch for Mongrel is pending feedback from the Mongrel team, but for now if you’re curious you can grab the source and run your own threaded Mongrel (on edge Rails) like so:

mongrel_rails -N X -n Y

where X is the number of concurrent requests Rails should process (active Rails threads) and Y is the number of concurrent requests Mongrel will accept before refusing connections (standard Mongrel num-processes). Note: run one process per core, or turn off a core, if you want a fair comparison against a Mongrel cluster.

Feel free to provide feedback in comments, or on the mongrel-users mailing list.

Bruno Miranda posted a review of various agile project management tools including Mingle, VersionOne, TargetProcess, and Tracker. Check it out here.

We’ve let Bruno know that Tracker is actually under active development. One of the first new features that we’ll be rolling out soon is a REST API, followed by various usability improvements that will make it easier to work with larger projects.

We’re also looking for more suggestions on how to make Tracker better, especially in the area of higher level planning. Send your ideas to tracker@pivotallabs.com, or post them on Satisfaction.

And if you haven’t tried it yet, the Tracker beta is fully open to the public; feel free to sign up.

Does anyone have any experience with one of the object mother libraries, like Object Daddy? (Answers at standup were “no, we always wrote our own object mothers in a domain-specific way”.) The appeal of a library is that it might help keep track of what needs to be done to make an object pass Rails validation.
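For reference, the hand-rolled, domain-specific approach mentioned at standup can be as simple as this sketch (the User model and its validated attributes are hypothetical):

# The mother supplies every attribute needed to pass Rails validation;
# callers override only what their test cares about.
module ObjectMother
  def self.valid_user(overrides = {})
    @counter = (@counter || 0) + 1
    User.create!({
      :name     => "user#{@counter}",
      :email    => "user#{@counter}@example.com",
      :password => 'secret'
    }.merge(overrides))
  end
end

# ObjectMother.valid_user(:name => 'alice')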

Clock.zone now exists. (Background: Pivotal has a Clock class whose now method is implemented either by a call to Time.now in production, or by a mock clock that lets tests specify the “time”.) This is so that Rails 2.1 features like Time.zone.now have an analog in Clock.
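A hedged sketch of what that might look like (Clock itself is Pivotal’s pattern as described above; the implementation details here are guesses):

# Sketch only -- the real Clock's internals may differ.
class Clock
  class << self
    attr_writer :fake_time # tests set this; production leaves it nil

    def now
      @fake_time || Time.now
    end

    # Analog of Rails 2.1's Time.zone.now, honoring any faked time.
    def zone
      Zone.new
    end
  end

  class Zone
    def now
      Time.zone.at(Clock.now.to_i) # in the configured Rails time zone
    end
  end
end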

First of all, fie on Apple for giving both their cloud storage service and their backup program names that are almost completely google-proof. They’ve recently corrected one of those by renaming “dot mac” to “MobileMe” but calling your backup program “Backup” is a great way to make it really hard to investigate. It’s like, imagine how hard it would be to do a background check on someone named John Doe.

So I use the Dot Mac Backup and it works pretty smoothly, which is the second most important feature in a backup program. (The most important feature is the ability to actually restore files.) But then one day it said that to incrementally back up my “Home Minus Media” set — the set containing my Home Folder, but excluding big-ticket items like Music, Movies, Backups, Downloads, and so on — would require 63 DVDs. WTF?

It turned out that the problem occurred after I trashed a few old DVD rips that I had finished watching, and the culprit was the directory /Users/chaffee/.Trash. Seems the UI was helpfully hiding it from the list of subdirectories of /Users/chaffee, it being a system file and all, so I couldn’t mark it for exclusion. That’s OK, I think, I’m a power user, so I’ll just check the box that says “Show invisible system files.”

Except there’s no such box. Try as I might, I can’t find a way to exclude the Trash folder from the UI. I had to dig into the file system and edit Backup’s own data file, as follows.

In Backup, create a backup set and exclude at least one item in it

Quit the Backup app

In Finder, open up ~/Library/Application Support/Backup/BackupSets

There will be a list of randomly named .backupset folders, each containing a folder named Contents. For each, use Quick Look (hit Space) on the file named InfoPlist.strings inside to find the one containing your set.

Find its sibling named User.quickpick.

The “quickpick” file is actually a folder, so open the file buried inside it, named DefinitionPlist.strings, in a text editor like TextMate.

Find the section “Prune Paths” and add an entry for ~/.Trash

Save the file and relaunch Backup.

You should see the “.Trash” entry excluded as if you had clicked on it — which you would have, if they had shown you the silly thing in the first place.

As you can see from the screenshot, I’ve still got some excess gigabytes to hunt down and exclude, but at least I won’t get burned the next time I erase a metric buttload of pr0n– uh, I mean, content I legally acquired and temporarily transferred onto my personal computer in compliance with the DMCA.

A possible solution is a wiki (MediaWiki?). Something google-code-like gets extra points for the issue tracker. Google Groups gets points for the mailing list. Google Sites seems like it might be a decent basic option (it’s easy to point a CNAME at it, too).

Experience reports/recommendations appreciated.

Rails hackfest is on through the end of August. Get points for getting patches accepted, and win prizes.

In the scripts, readability and simplicity are favored over clever abstractions and DRYness. Hopefully, even people who don’t know shell scripting or Ruby can read the scripts and easily understand the commands they execute.
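For instance, an install step reads like a transcript of what you’d type at a shell (a hypothetical excerpt, not the actual script):

# Plain, linear, and obvious beats clever and DRY here.
system 'sudo aptitude update'
system 'sudo aptitude -y install ruby1.8 rubygems git-core'
system 'sudo gem install rake rspec'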

A standardized environment is assumed: a dedicated Ubuntu 8.04 system, Ruby 1.8.6, and the latest dependencies via aptitude. PCs and virtual machines are cheap, and Linux and CCRB are free. There’s really no reason you shouldn’t be able to run a dedicated CI box. If this environment doesn’t work for you for some reason, the scripts should be self-explanatory enough that you can easily hack them up to work in your environment (and contribute your version back to the project!).

I use the magic fairy dust of GitHub to eliminate build scripts, release scripts, packaging, versions, and pretty much all the regular boring overhead of a project. The README.txt is my only documentation. The GitHub “Download Tarball” link automatically provides packaging and uniquely-named packages (by the git hex commit id) for each “release”.

I’m pretty pleased with how this turned out. I hope it will lower the barrier for people to start trying out Continuous Integration, as well as provoke some thought about simplicity and minimalism. I’ve tried it out on a few flavors of Ubuntu VMs and my personal box, and it works for me. Please let me know what you think, and feel free to offer any suggestions for improvement.

Ask for Help

“We are getting 504 Gateway errors, and we think it is because our mongrels are freezing up due to an inability to allocate memory. What should we do?”

Without more info on the problem, a few possibilities were suggested, such as the OS swap thrashing, or the OS having no more memory to allocate.

One suggestion was to cut your swap space down to zero in an attempt to verify that your mongrels are asking for too much memory, basically removing OS swapping-to-disk from the equation.

Another suggestion was to boost your swap up to some insane size, also to take it out of the equation. The theory: we know Mongrel can leak memory, we trust the OS to keep actively used memory in RAM, and we have plenty of disk space, so why put your OS in the position of not granting a mongrel what it asks for?

Neither solution above seems ideal, but we are pragmatists: if we combine either one with periodic monitoring of the system using top/ps/vmstat, at least your mongrels can keep running, and this may buy you time to figure out why Mongrel is so memory hungry.
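As a starting point for that monitoring, here’s a rough sketch that logs each mongrel’s resident memory once a minute (the ps flags assume Linux):

# Log each mongrel's PID and resident set size (in KB) every minute.
loop do
  `ps -eo pid,rss,command`.split("\n").each do |line|
    puts "#{Time.now.strftime('%H:%M:%S')} #{line}" if line.include?('mongrel_rails')
  end
  sleep 60
end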