Effective Perl by Joseph N. Hall

Observations and Tips from the author of Effective Perl Programming

Thursday, October 11, 2007

Your Phone Screen Homework Question

Before or after the phone screen, depending on how things work out, you may be asked to complete a short programming exercise. It won't be something with a lengthy answer, but it will require you to demonstrate an understanding of a few key Perl skills. Here is the one that I have used recently. It is a kind of "beginning intermediate" question about a type of problem that comes up reasonably often in serious Perl programming and that has a distinctive Perl solution. Ordinarily you would be allowed access to (and encouraged to use) the Internet and/or other references if necessary.

The problem:

Create a hash that represents a set of items, and another hash that represents another set of items.

Compute a hash that represents the union of both sets.

Compute a hash that represents the intersection of both sets.

Print out the contents of the union and intersection sets sorted in alphabetic order.
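If you want to compare notes after trying the exercise yourself, here is one possible sketch. It is not necessarily the answer the author is looking for. It assumes set members are stored as hash keys with a true value, and the member names are made up for illustration.

```perl
use strict;
use warnings;

# Two example sets, represented as hashes whose keys are the members.
# (The member names are hypothetical, chosen only for illustration.)
my %set_a = map { $_ => 1 } qw(apple banana cherry);
my %set_b = map { $_ => 1 } qw(banana cherry date);

# Union: every key that appears in either hash. Duplicate keys collapse.
my %union = (%set_a, %set_b);

# Intersection: keys of %set_a that also exist in %set_b.
my %intersection = map { $_ => 1 } grep { exists $set_b{$_} } keys %set_a;

# sort with no block compares strings, which is alphabetic for these
# all-lowercase names (uppercase letters would sort before lowercase).
print "Union: ", join(", ", sort keys %union), "\n";
print "Intersection: ", join(", ", sort keys %intersection), "\n";
```

Using hash keys makes membership tests (exists) cheap and removes duplicates automatically, which is why hashes are the usual way to represent sets in Perl.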

Wednesday, October 10, 2007

Preparing for the Phone Screen, Part 1

It has been a while since I last blogged here! Let's put that aside for a bit, though, in favor of a general-interest topic.

Nowadays I am happily working at VMware. I will probably be doing interviews and phone screens from time to time, and if you are being considered for a position in my team, I will be the "Perl Person" you have to speak with. If you are doing a little research ahead of time, I have a treat for you, because here are some of the questions I am going to ask. If I have space and/or energy, I will eventually post all the questions I will ask, but that will have to wait for later.

These questions aren't too complicated, but I don't necessarily expect you to get them all correct. In fact, you can get a number of them incorrect as long as, during the process, you convince me that you know which ones you don't know, and you know how to find the answer quickly.

Introduction

When have you used Perl? What version(s)? What platform(s)?

What Perl book(s) have you read? What class(es) have you taken? Have you taught yourself? How much? What about?

Basics

What is strict? What is warnings?

Name a special variable. One other than $_. What is $_?

What is perldoc?

What is a Perl module?

What is the difference between print and printf?

What does local mean? Where could you use it?

How do you open a file? What is the return value from open? What is $!?

What is "null" in Perl? How do you tell if something is "null"?

How do you find the size of an array?

How do you write an infinite loop?

What is a regular expression that matches any character?

How do you remove a key-value pair from a hash?

What is a "pointer" in Perl? What kinds are there? How do you make one? What can you do with them?

a*      # repetition - the character a repeated zero or more times
b+      # repetition - the character b repeated one or more times
x{1,3}  # repetition - the character x repeated one to three times
abc     # sequence - the character a, then the character b, then the character c
a|b|c   # alternation - the character a, or the character b, or the character c
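A quick way to check what each of these patterns accepts is to try them against a handful of strings. This is a small sketch; the test strings are arbitrary, and each pattern is anchored with \A and \z (an addition for the example) so it has to match the whole string rather than just part of it.

```perl
use strict;
use warnings;

# Each pattern is anchored with \A and \z so it must match the whole string.
# The alternation is wrapped in (?:...) so the anchors apply to all of it.
my @patterns = (
    [ 'a*',     qr/\Aa*\z/ ],
    [ 'b+',     qr/\Ab+\z/ ],
    [ 'x{1,3}', qr/\Ax{1,3}\z/ ],
    [ 'abc',    qr/\Aabc\z/ ],
    [ 'a|b|c',  qr/\A(?:a|b|c)\z/ ],
);
my @strings = ('', 'a', 'bb', 'xxx', 'xxxx', 'abc', 'c');

my %hits;    # pattern text => list of the test strings it matched
for my $p (@patterns) {
    my ($name, $re) = @$p;
    $hits{$name} = [ grep { $_ =~ $re } @strings ];
    print "$name matches: ", join(" ", map { "'$_'" } @{ $hits{$name} }), "\n";
}
```

Note that a* matches the empty string, because zero repetitions count as a match.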

It's important to understand precedence in regular expressions. For example:

abc{3}

means the characters 'ab' followed by three instances of the character 'c'. When I see something like abc{3} I usually think that the author really meant "three instances of the characters 'abc'" - which is written differently:

(abc){3}
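The difference between the two forms is easy to see by running both against the same strings. A minimal sketch, with \A and \z anchors added so each pattern must match the entire string:

```perl
use strict;
use warnings;

# abc{3}   : 'ab' followed by exactly three 'c' characters
# (abc){3} : the sequence 'abc' repeated three times
# \A and \z anchor each pattern to the whole string.
for my $s ('abccc', 'abcabcabc') {
    my $plain   = $s =~ /\Aabc{3}\z/   ? 'matches' : 'does not match';
    my $grouped = $s =~ /\A(abc){3}\z/ ? 'matches' : 'does not match';
    print "$s: abc{3} $plain, (abc){3} $grouped\n";
}
```

If you only need the grouping and not the captured text, (?:abc){3} groups the same way without setting $1.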

As you can see, you can use parentheses to control the order in which the parts of a regular expression are interpreted. I like to make an analogy to mathematical (algebraic) expressions. Even though a regular expression isn't a mathematical expression, the syntax is at least somewhat similar, especially where precedence is concerned. From the standpoint of precedence, you can think of a{3} as being something like x^10: exponentiation, the highest-precedence operation in algebraic notation. abc is like xyz (the variables x, y, and z multiplied together), multiplication having intermediate precedence, and a|b|c is like x + y + z, addition having low precedence. This becomes useful when you try to figure out things like:

a|b|c       # the character a, or the character b, or the character c
a|b|c{2}    # the character a, the character b, or two c's in a row
            # like a + b + c^2
(a|b|c){2}  # one of a or b or c followed by one of a or b or c
            # like (a + b + c)^2
(a|b|c)+    # one or more a or b or c
(abc)+      # abc one or more times in a row (abc, abcabc, abcabcabc, etc.)
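Running a|b|c{2} and (a|b|c){2} side by side shows the "addition versus squared sum" difference concretely. This is a sketch with arbitrary test strings. Anchoring the first pattern is itself a precedence trap: written as /\Aa|b|c{2}\z/ it would mean "\Aa", or "b" anywhere, or "c{2}\z", so the alternation is wrapped in (?:...) first.

```perl
use strict;
use warnings;

# a|b|c{2}   groups as a | b | (c{2}), like a + b + c^2
# (a|b|c){2} groups as (a|b|c)(a|b|c), like (a + b + c)^2
# (?:...) keeps the \A and \z anchors applying to the whole alternation.
my %result;    # test string => "low/high" match results
for my $s ('a', 'c', 'cc', 'ab', 'ca') {
    my $low  = $s =~ /\A(?:a|b|c{2})\z/ ? 'yes' : 'no';
    my $high = $s =~ /\A(a|b|c){2}\z/   ? 'yes' : 'no';
    $result{$s} = "$low/$high";
    print "$s: a|b|c{2} $low, (a|b|c){2} $high\n";
}
```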

So, think:

Repetition: exponentiation (highest)

Sequence: multiplication (middle)

Alternation: addition (lowest)

Now, the usefulness of all this depends on arithmetic (or algebra) being easy, which may be something else altogether.

Monday, January 30, 2006

Tip: Creating a String of Random Hex Digits

Every once in a while a string of random hex digits comes in handy, perhaps when you need a random alphanumeric name for something. The unpack operator seems like a natural fit for this purpose, given that it can translate character values into hex strings, and it is. Here's a snippet that creates a string of 32 random hex digits:

my $rand_hex = join "", map { unpack "H*", chr(rand(256)) } 1..16;

rand(256) produces floating-point numbers 0.0 <= n < 256.0. chr truncates that value to an integer from 0 to 255 and turns it into a single-character string, which is what unpack needs as input. unpack "H*" then turns that character into a 2-character, high-nybble-first hex string (for a single character, "H2" gives the same result). The map { ... } 1..16 creates a list of 16 of these 2-character strings, and join gloms them all together.
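For comparison, here is a sketch of a different technique (sprintf rather than unpack) that picks one hex digit at a time, which makes it easy to ask for an odd number of digits. The helper name random_hex is hypothetical, not something from the original tip.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a string of $n random lowercase hex digits.
# Uses sprintf "%x" on int(rand(16)) for each digit instead of unpack.
sub random_hex {
    my ($n) = @_;
    return join "", map { sprintf "%x", int rand 16 } 1 .. $n;
}

# The unpack version from the tip: 16 random bytes -> 32 hex digits.
my $rand_hex = join "", map { unpack "H*", chr(rand(256)) } 1..16;

print random_hex(32), "\n";
print random_hex(7), "\n";    # an odd length works too
print "$rand_hex\n";
```

Either way, rand is not a cryptographically secure source, so neither version is suitable for security-sensitive tokens.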

Tip: Reading a Few Lines from a File

This reads in 10 lines, one at a time, and appends them in order to the array @lines. Another, perhaps cleverer, way might be:

my @lines = map <STDIN>, 1..10;

I say "might" because this doesn't actually work. It looks as if you are using the ten numbers 1 through 10 to read 10 lines from standard input, which seems reasonable at first, because the .. ("dot-dot") operator returns a list of numbers when used in list context. But the expression you pass as the first argument to map is also evaluated in list context, so on the very first iteration the <STDIN> operator returns a list of all the remaining lines in the file. In other words, you ask for 10 lines but you get them all. You need to force scalar context on the first argument to make <STDIN> return one line at a time:

my @lines = map scalar(<STDIN>), 1..10;

Should there be fewer than 10 lines in your input, the array @lines will be padded with enough undefs to make up the difference because your attempts to read past end of file will return undef.
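To see the undef padding without typing input by hand, this sketch reads from an in-memory filehandle holding three lines (the sample data is made up for the example). It also shows one possible alternative, a while loop that stops at end of file instead of padding.

```perl
use strict;
use warnings;

# Sample input with only three lines, read through an in-memory filehandle
# so the example doesn't depend on STDIN.
my $data = "one\ntwo\nthree\n";
open my $fh, '<', \$data or die "Can't open in-memory file: $!";

# scalar(<$fh>) reads one line per element of 1..10; once the input runs
# out, each further read returns undef, so @lines still has 10 elements.
my @lines = map scalar(<$fh>), 1..10;
close $fh;

my $defined = grep { defined } @lines;
print "elements: ", scalar(@lines), ", defined: $defined\n";

# An alternative that stops at end of file instead of padding with undef.
open $fh, '<', \$data or die "Can't open in-memory file: $!";
my @first;
while (@first < 10 and defined(my $line = <$fh>)) {
    push @first, $line;
}
close $fh;
print "lines read without padding: ", scalar(@first), "\n";
```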

Wednesday, January 18, 2006

Plans for a New Edition of Effective Perl Programming?

I've been really pleased with all the great reviews that Effective Perl Programming has received over the years. The good sales are nice too - I'm still getting reasonable royalty checks.

I've been thinking of writing another edition for some time. At present I'm negotiating to do just that. Hopefully my work can be done by the end of the year, or maybe a little sooner.

Once I get going, I'll be posting bits of work in progress here. I will also, at some point, post the entire PDF of the current version. I may also ask for ideas about what to include. I think the rewrite will be somewhat different in its focus than the original.

Ugly. This is going to take all day with all those calls to substr. You could split the string instead. The empty pattern // is a special case for split: it divides the input string into individual characters. So perhaps:

my $n = 0;
for (split //, $s) {
    $n++ if $_ eq 'e';
}

Maybe this doesn't seem quite right to you yet. There must be a shorter way to do this in Perl, right? Yes. The tr/// (transliterate characters) operator can count characters for you. The tr/// operator returns the number of characters it changed. For example:

The value returned from tr/// above is 9, which is the number of characters changed to uppercase. Armed with this knowledge, you might think of:

my $c = $example =~ tr/e/e/; # change 'e' to 'e' ...

which will work just fine. But there's a shortcut. If you leave the replacement list empty, as in tr/e//, tr/// reuses the search list as the replacement, so no characters are changed, but it still returns the number of characters in the string that appear in its search list.
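Here is a small sketch that puts the split loop and the tr/// shortcut side by side on a sample string (chosen only for the example) to confirm that they agree, and that tr/// with an empty replacement list leaves the string untouched.

```perl
use strict;
use warnings;

my $example = "Effective Perl Programming";

# Counting with split and a loop.
my $n = 0;
for (split //, $example) {
    $n++ if $_ eq 'e';
}

# Counting with tr///: with an empty replacement list the search list is
# reused, so $example is left unchanged and tr/// returns the match count.
my $c = ($example =~ tr/e//);

# A list of several characters works too: count all lowercase vowels.
my $vowels = ($example =~ tr/aeiou//);

print "split loop: $n, tr: $c, lowercase vowels: $vowels\n";
```

The counts are case-sensitive, so the capital E at the start is not included. Also note that tr/// does not interpolate variables, so it can't directly count a character that is only known at run time.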