January 2008 Archives

Oh no, hourly smoke test failures in my inbox today! Looks like I put some bad HTML on work's website's home page, and every hour one of the automated tests, via Test::HTML::Lint, told me that

# HTML::Lint errors for http://devserver.example.com/
# (72:53) <a> at (61:53) is never closed

Well, phooey, it's a PHP-driven website, so I can't open index.php and check lines 72 and 61. I can use GET, installed with LWP, to fetch the website and save the source:

$ GET http://devserver.example.com/ > foo.html
$ vim foo.html +61

but since I'm a Perl programmer, I want to be as lazy as possible by using the tools at my disposal. In this case, it's ack, and ack has the --lines option to display a range of lines instead of the matches for a regex. (Thanks to Torsten Blix for implementing this!)

$ GET http://devserver.example.com/ | ack --lines=61-72

So much nicer that way! I could save the source and look at it in an editor, but how much easier to not have to do that.
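
For the curious, the effect of that line-range filter is easy to sketch in plain Perl. This is only an illustration of the idea, not how ack implements it. The lines_in_range helper and the sample HTML are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return lines $first..$last (1-based, inclusive)
# of a string, each prefixed with its line number.
sub lines_in_range {
    my ( $text, $first, $last ) = @_;

    my @lines = split /\n/, $text;
    my @out;
    for my $n ( $first .. $last ) {
        last if $n > @lines;    # stop at the end of the text
        push @out, $n . ':' . $lines[ $n - 1 ];
    }
    return @out;
}

# Made-up sample standing in for the fetched page source.
my $html = join "\n", '<html>', '<body>', '<a href="/">Home', '</body>', '</html>';

print $_, "\n" for lines_in_range( $html, 2, 4 );
```

With the real page you would pass in whatever GET returned and ask for lines 61 through 72.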

There is no project so small, so trivial, that it is not worth putting into a Subversion repository. If it's worth your time to work on it, it's worth saving. Putting it in Subversion is a matter of a few commands, and you don't have to do any big fancy-shmancy server setup.

Let's assume you're working on Linux/Unix, and you have svn installed, which is pretty standard these days. Say you're working on a game called bongo, and you've just been keeping it in ~/bongo. Do this:

You probably wouldn't want to do this in production code, but like the best of Damian Conway's not-useful-but-thought-provoking modules, it may spark some ideas that you can apply to more useful situations. If nothing else, the source is a fine lesson in overloading and method importing.

Just read those lines of code and you can recreate the crime in your head. First there was a customer in Foo County. Then, we had to handle a different Foo County, but this Foo County was in Texas. He couldn't even be bothered to change the initial test to be more specific, or to modify the existing code. His solution was the simplest thing that could possibly work, and was also the worst: Reversing the effect of the first check for Foo County. There's also no checking for non-Texas, non-original Foo County, but when I checked I found that we have customers that are in Foo County in THREE different states.

The programmer no longer works for us, so I'm unable to ask him about his motivations. I'm fascinated by the mindset that is unable to do the barest rework necessary.

Ricardo Signes' marvelous module CPAN::Mini just got an update today, and it reminds me to tell you all how great it is to be able to have a small version of the CPAN on your local hard drive, especially on a laptop. The included minicpan program makes it trivial to update your local archive.

Max Kanat-Alexander has a new blog up called Code Simplicity, and I'd love it for the name alone. His latest post, "Designing Too Far Into The Future", talks about the perils of trying to predict the future and guess what your code will have to do down the road. In the XP world, the term that gets thrown around is YAGNI, for "Ya Ain't Gonna Need It." When you have to write a report and you start by writing a report generator, that's a big violation of the principle of YAGNI.

I see this missed so many times I have to bring it up here: "If you have a database column that contains only digits, but you will not perform calculations on it, make it a character column."

You CAN store a 10-digit phone number as an integer, but why would you want to? You CAN store a Social Security Number as a 9-digit number, but why would you want to? Surely you're not that concerned about saving a few bytes. Storing an SSN of "012345678" as a number means you lose the leading zero, too, so you lose fidelity of data. Any string of digits follows this rule. You don't perform calculations on part numbers, course numbers, Dewey Decimal numbers, or house numbers, either, so make 'em all character fields.
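
Here's a small Perl sketch of the leading-zero problem. It doesn't touch a database at all. It just treats the same SSN as a string and as a number, which is roughly what happens when the value goes through a numeric column. The variable names are mine, for illustration only.

```perl
use strict;
use warnings;

my $ssn_as_string = '012345678';           # what the customer actually gave us
my $ssn_as_number = $ssn_as_string + 0;    # what a numeric column would keep

print 'String: ', $ssn_as_string, "\n";    # still nine characters
print 'Number: ', $ssn_as_number, "\n";    # the leading zero is gone

# Getting the original back means knowing the intended width in advance.
my $restored = sprintf '%09d', $ssn_as_number;
print 'Padded: ', $restored, "\n";
```

You can pad the number back out, but only if every piece of code that reads the column remembers to do it.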

Same goes for years stored as date datatypes. If you're recording the year that a movie was released, then there's no advantage to having it as a date. Store it as an integer to make it simple to take differences ("How long after Citizen Kane did ET come out?") or comparisons.

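Most of all, keep things consistent. If you've got a 10-character column in one table, and an integer in another, then SQL joins between them can be very expensive, even if both columns are indexed, because the database has to convert one type to the other for every comparison and often can't use the index on the converted column.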

Working on my Big Dirty PHP Project at work, I've found this bit of code in many places.

$categories = "";
$categories = Array();

Why is $categories set to an empty string, and then an array? It's not necessary to pre-initialize a variable before setting it to another value. So why is the code there? It's not just one case. It's throughout the codebase, where I delete the first line whenever I find it.

The original programmer is (thankfully) no longer around to ask, but I'm guessing it's superstition. Perhaps he had some problem that went away for an unrelated reason when he added the first line of the code. The problem is that he never considered why.

Here's another coding horror to avoid in Perl. Ever seen a regular expression by someone who wasn't entirely familiar with regexes and quoted everything whether it needed it or not?

if ( $name =~ /Marcus Holland\-Moritz/ )

The hyphen in Marcus's name isn't a metacharacter, but the unsure, superstitious programmer will quote it anyway. "Eh, it doesn't hurt anything," he may reply, but it also demonstrates his non-mastery of regexes.
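
If you'd rather check than take my word for it, a quick sketch like this shows that the backslash buys nothing outside a character class. The only place a hyphen is special is between two characters inside square brackets. The variable names are just for this example.

```perl
use strict;
use warnings;

my $name = 'Marcus Holland-Moritz';

# Outside a character class, the hyphen is an ordinary character.
my $escaped   = $name =~ /Marcus Holland\-Moritz/ ? 1 : 0;
my $unescaped = $name =~ /Marcus Holland-Moritz/  ? 1 : 0;

# Inside a character class, a hyphen between two characters means a range.
my $as_range   = 'b' =~ /^[a-c]$/  ? 1 : 0;    # a through c
my $as_literal = 'b' =~ /^[a\-c]$/ ? 1 : 0;    # only a, -, or c

print 'escaped:    ', $escaped,    "\n";
print 'unescaped:  ', $unescaped,  "\n";
print 'as range:   ', $as_range,   "\n";
print 'as literal: ', $as_literal, "\n";
```

The first two results are the same, and the last two differ. That's the one spot where quoting the hyphen changes anything.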

If you ever find a piece of your code where you can't understand exactly why it works, why every single statement exists, stop and rework it until you do.

Why do I love say so much? It's just the same as print
with a "\n" at the end, right? Yup, but that "\n"
causes me heartache. See, I've been working on removing
interpolation from my life wherever possible. For instance,
we've probably all seen beginners do something like:

some_func( "$str" );

where the quotes around $str are unnecessary. (Yes, I know
there could be overloaded stringification, but I'm ignoring that
possibility here.) That function call should be done as:

some_func( $str );
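
Overloading aside, there's one common case where the extra quotes aren't just noise but actively break things: when the variable holds a reference. Here's a minimal sketch, with a made-up describe_arg function standing in for some_func.

```perl
use strict;
use warnings;

# Hypothetical function that reports what kind of thing it was handed.
sub describe_arg {
    my ($arg) = @_;
    return ref($arg) ? 'reference to ' . ref($arg) : 'plain string';
}

my $list = [ 'a', 'b', 'c' ];

print describe_arg($list),   "\n";    # reference to ARRAY
print describe_arg("$list"), "\n";    # plain string
```

The quoted version hands the function a string that merely describes the array, and the data is no longer reachable through it.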

By the same token, I don't use double quotes any more than necessary. Rather than creating a string as:

my $x = "Bloofga";

do it as

my $x = 'Bloofga';

It's not about speedups. It's about not making the code do anything more than it has to,
so that the next programmer does not have to ask
"why is this work getting done?" If the code doesn't need the double-quoting, then don't use the double-quoting.

I started down this road when I read this rule in Perl Best
Practices, but ignored it. "Eh, no biggie," I thought. Then
I started using Perl::Critic,
and it complained about everywhere I was using double quotes. As
I examined those complaints, I came to realize that if you're
having the computer do work, the next programmer has to wonder why.

So now we get the say command, and I get to eliminate at least 50% of my necessary string interpolation. Instead of:

print "Your results are:\n";

I can now use:

say 'Your results are:';

So much cleaner. In a color-coding editor like vim, the distinction is even clearer, and as
MJD likes to point out, "It's easier to see than to think."

Start using say. Even if you're not on 5.10 yet, you can use Perl6::Say for most of the places that say works in
5.10. Even better, stop using unnecessary interpolation altogether.
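
One thing to know before you try it: in 5.10, say is not on by default. You have to ask for it with use feature 'say' (or use 5.010). The sketch below turns it on and writes the same line both ways into in-memory filehandles, so you can confirm the output is identical. The variable names are only for this example.

```perl
use strict;
use warnings;
use feature 'say';    # say is off by default; "use 5.010;" also enables it

# Capture output in strings so the two forms can be compared.
my $with_print = '';
my $with_say   = '';

open my $print_fh, '>', \$with_print or die 'Cannot open in-memory file';
print {$print_fh} "Your results are:\n";
close $print_fh;

open my $say_fh, '>', \$with_say or die 'Cannot open in-memory file';
say {$say_fh} 'Your results are:';
close $say_fh;

my $verdict = $with_print eq $with_say ? 'identical' : 'different';
say $verdict;
```

Same bytes out, and the say version has nothing in it for the reader to wonder about.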