One of my colleagues recently interviewed some candidates for a job and one said they had very good Perl experience.

Since my colleague didn't know Perl, he asked me for a critique of some code written (off-site) by that potential hire, so I had a look and told him my concerns (the main one was that it originally had no comments and it's not like we gave them enough time).

However, the code works, so I'm loath to say no-go without some more input. Another concern is that this code basically looks exactly how I'd code it in C. It's been a while since I did Perl (and I didn't do a lot, I'm more a Python bod for quick scripts) but I seem to recall that it was a much more expressive language than what this guy used.

I'm looking for input from real Perl coders, and suggestions for how it could be improved (and why a Perl coder should know that method of improvement).

You can also wax lyrical about whether people who write one language in a totally different language should (or shouldn't) be hired. I'm interested in your arguments but this question is primarily for a critique of the code.

The spec was to successfully process a CSV file as follows and output the individual fields:

@Brian The style of the "Perl" the programmer writes is C-like. Using substr()/while loops instead of a simple little regex, for example. Look at @JohnathanLeffler's example.
– Sicarius Noctis, Aug 20 '11 at 0:32

26 Answers

I advise people to never hire Perl programmers, or C programmers, or Java programmers, and so on. Just hire good people. The programmers who I've hired to write Perl were also skilled in various other languages. I hired them because they were good programmers, and good programmers can deal with multiple languages.

Now, that code does look a lot like C, but I think it's fine Perl too. If you're hiring a good programmer, with a little Perl practice under his belt he'll catch up just fine. People are complaining about the lack of regexes, which would make things simpler in ancillary areas, but I wouldn't wish on anyone a regex solution on parsing that dirty CSV data. I wouldn't want to read it or maintain it.

I often find that the reverse problem is more troublesome: hire a good programmer who writes good Perl code, but the rest of the team only knows the basics of Perl and can't keep up. This has nothing to do with poor formatting or bad structure, just a level of skill with advanced topics (e.g. closures).

Things are getting a bit heated in this debate, so I think I should explain more about how I deal with this sort of thing. I don't see this as a regex / no-regex problem. I wouldn't have written the code the way the candidate did, but that doesn't really matter.

I write quite a bit of crappy code too. On the first pass, I'm usually thinking more about structure and process than syntax. I go back later to tighten that up. That doesn't mean that the candidate's code is any good, but for a first pass done in an interview I don't judge it too harshly. I don't know how much time he had to write it and so on, so I don't judge it based on something I would have had a long time to work on. Interview questions are always weird because you can't do what you'd really do for real work. I'd probably fail a question about writing a CSV parser too if I had to start from scratch and do it in 15 minutes. Indeed, I wasted more than that today being a total bonehead with some code.

I went to look at the code for Text::CSV_PP, the Pure Perl cousin to Text::CSV_XS. It uses regular expressions, but a lot of regular expressions that handle special cases, and in structure isn't that different from the code presented here. It's a lot of code, and it's complicated code that I hope I never have to look at again.

What I tend to disfavor are interview answers that only address the given input. That's almost always the wrong thing to do in the real world where you have to handle cases that you may not have discovered yet and you need the flexibility to deal with future issues. I find that missing from a lot of answers on Stackoverflow too. The thought process of the solution is more telling to me. People become skilled at a language more easily than they change how they think about things. I can teach people how to write better Perl, but I can't change their wetware for the most part. That comes from scars and experience.

Since I wasn't there to see the candidate code the solution or ask him follow-up questions, I won't speculate on why he wrote it the way he did. For some of the other solutions I've seen here, I could be equally harsh in an interview.

A career is a journey. I don't expect everyone to be a guru or to have the same experiences. If I write off people because they don't know some trick or idiom, I'm not giving them the chance to continue their journey. The candidate's code won't win any prizes, but apparently it was enough to get him into the final three for consideration for an offer. The guy got up there and tried, did much better than a lot of code I've seen in my life, and that's good enough for me.

My last point contradicts nothing. A rock star doesn't mean they are effective communicators or mentors, or that they aren't assholes to people who they think are inferior. A good programmer isn't the same thing as a good team member.
– brian d foy, Jun 9 '09 at 7:37

Brian: there's a problem here: namely, your opinion that the mess above is easier to maintain than a clear regular expression – just because it's ostensibly clean code. Clean code isn't everything: the above is 100% pure visual clutter that hides the semantics in a lot of syntax noise.
– Konrad Rudolph, Jun 9 '09 at 7:44

True CSV almost certainly can't be parsed by a regex that anyone would ever be able to understand after writing it. Although it is tempting to believe that every problem solved with Perl needs a big regex, there really is more than one way to do it. brian d foy (note no caps) is right on point here.
– RBerteig, Jun 9 '09 at 9:23

Konrad, you're just wrong. Properly parsing CSV requires keeping track of the parse state as you go (am I in a quoted field, did I just see an escape). That sort of stuff makes for messy or (near-)impossible regexes. I say near-impossible because Perl does in fact have features for maintaining state inside regexes. Nonetheless, the proper way to parse CSV in Perl is to do it a character at a time. That said, I've asked similar questions in an interview, and by far the best answer is "I would use Text::CSV(_XS) from CPAN."
– Dave Rolsky, Jun 9 '09 at 15:02
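To make the "keep track of the parse state, one character at a time" idea concrete, here is a minimal sketch. It is not the candidate's code and not what Text::CSV does internally; the helper name `parse_csv_line` is made up, and it assumes backslash escapes inside double-quoted fields and no embedded newlines.

```perl
use strict;
use warnings;

# Hypothetical helper: split one CSV line into fields, one character at a time.
# Assumes backslash escapes inside double-quoted fields, no embedded newlines.
sub parse_csv_line {
    my ($line) = @_;
    my @fields;
    my $field     = '';
    my $in_quotes = 0;    # are we inside a quoted field?
    my $escaped   = 0;    # was the previous character a backslash?

    for my $char (split //, $line) {
        if ($escaped) {
            $field .= $char;             # take the escaped character literally
            $escaped = 0;
        }
        elsif ($in_quotes && $char eq '\\') {
            $escaped = 1;                # remember the escape, emit nothing yet
        }
        elsif ($char eq '"') {
            $in_quotes = !$in_quotes;    # opening or closing quote
        }
        elsif ($char eq ',' && !$in_quotes) {
            push @fields, $field;        # an unquoted comma ends the field
            $field = '';
        }
        else {
            $field .= $char;
        }
    }
    push @fields, $field;                # the last field has no trailing comma
    return @fields;
}

print "[$_]\n" for parse_csv_line('pax,"Diablo, \"Pax\"",admin,1');
```

The two flags are exactly the state mentioned above: "am I in a quoted field" and "did I just see an escape". Whitespace trimming, doubled-quote escapes and multi-line fields are deliberately left out, which is a good part of why the module answer keeps coming up.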

Konrad: since regexes are so easy to use to parse CSV, why not just post one?
– jrockway, Jun 11 '09 at 23:42

The only correct answer for parsing CSV in Perl is to use a module. CSV is nasty, and it's easy to make little mistakes. Let someone else deal with it (they already have).
– Dave Rolsky, Jun 9 '09 at 15:04
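For anyone who hasn't used the module, a rough sketch of what "use a module" looks like. Text::CSV is a CPAN module, not part of core Perl, so this loads it defensively; the sample line is made up, and the options shown (`binary`, `allow_whitespace`) are just one plausible configuration, not a recommendation for the candidate's data.

```perl
use strict;
use warnings;

my @fields;

# Text::CSV comes from CPAN, so check that it is installed before using it.
if (eval { require Text::CSV; 1 }) {
    my $csv = Text::CSV->new({ binary => 1, allow_whitespace => 1 })
        or die "Cannot create Text::CSV parser";

    my $line = 'pax, "Diablo, Pax", admin, 1';    # made-up sample line
    if ($csv->parse($line)) {
        @fields = $csv->fields;
        print "[$_]\n" for @fields;
    }
    else {
        warn "Could not parse line: $line\n";
    }
}
else {
    warn "Text::CSV is not installed\n";
}
```

All the quoting and escaping rules live in the module; the calling code only sees a list of fields.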

This is IMHO the only correct answer. It wastes less time, and will actually work. And strangers reading your code won't die from sheer confusion when they have to debug your CSV parser later (because they will).
– Kent Fredric, Jun 10 '09 at 2:05

Anybody who claims to know Perl well should know about Text::CSV and regexes; that would be my only concern. Did he ask if he could use a library?
– Chris Huang-Leaver, Oct 5 '09 at 14:39

Perhaps the exercise was esp. to implement it from scratch?
– Albert, Oct 26 '09 at 14:51

To be fair to the poor sucker who got this sprung on him at an interview: was the test set in such a way that it was clear using modules was OK? Was the environment set up well enough that he could find and use modules like Text::CSV? (This is a common Perl problem: you get told Perl is installed and then you find out that only the interpreter is installed, all the "standard" modules are missing, and cpan can't get past the corporate firewall.)
– James Anderson, Dec 1 '09 at 5:52

I would argue writing C in Perl is a much better situation than writing Perl in C. As is often brought up on the SO podcast, understanding C is a virtue that not all developers (even some good ones) have nowadays. Hire them and buy a copy of Perl Best Practices for them and you will be set. After Perl Best Practices, give them a copy of Intermediate Perl and they could work out.

This chops off all leading spaces using a regex, without making the code iterate around the loop. A good deal of the rest of the code would benefit from carefully written regular expressions too. Regexes are a characteristic Perl idiom; it is surprising to see that they are not being used.
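For readers without the snippet in front of them, a substitution of the kind described would look roughly like this. The variable name `$line` and the sample value are assumptions for illustration.

```perl
use strict;
use warnings;

my $line = "   pax, admin";    # hypothetical input with leading spaces
$line =~ s/^\s+//;             # delete all leading whitespace in one step
print "[$line]\n";
```

One substitution replaces a loop that tests the first character and calls substr() until it is no longer a space.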

If efficiency was the proclaimed concern (reason for not using regexes), then the questions should be "did you measure it" and "what sort of efficiency are you discussing - machine, or programmer"?

Working code counts. More or less idiomatic code is better.

Also, of course, there are modules Text::CSV and Text::CSV_XS that could be used to handle CSV parsing. It would be interesting to enquire whether they are aware of Perl modules.

There are also multiple notations for handling quotes within quoted fields. The code appears to assume that backslash-quote is appropriate; I believe Excel uses doubled up quotes:

"He said, ""Don't do it"", but they didn't listen"

This could be matched by:

$line =~ /^"([^"]|"")*"/;

With a bit of care, you could capture just the text between the enclosing quotes. You'd still have to post-process the captured text to remove the embedded doubled up quotes.
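A minimal sketch of that capture-then-post-process step, assuming the Excel-style doubled-quote convention and a made-up sample line. The only change to the pattern above is that the repeated group is made non-capturing and wrapped in a capturing one.

```perl
use strict;
use warnings;

my $line = '"He said, ""Don\'t do it"", but they didn\'t listen",next';

my $field;
if ($line =~ /^"((?:[^"]|"")*)"/) {
    $field = $1;             # the text between the enclosing quotes
    $field =~ s/""/"/g;      # collapse each doubled quote to a single quote
}
print "$field\n" if defined $field;
```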

A non-quoted field would be matched by:

$line =~ /^([^,]*)(?:,|$)/;

This is enormously shorter than the looping and substringing shown.

Here's a version of the code, using the backslash-double quote escape mechanism used in the code in the question, that does the same job.

It's under 30 non-blank, non-comment lines, compared with about 70 in the original. The original version is bigger than it needs to be by some margin. And I've not gone out of my way to reduce this code to the minimum possible.

Well, it's under 30 now, but when you have to go back to add /x after the team review it will balloon again. Then, after /x makes it look messy, you move the regexes out of the way by putting them into scalars with qr//, you add a bit more. But, maybe you get some of that back when you use \G so you don't have to modify $line, but then nobody remembers how \G works. :)
– brian d foy, Jun 9 '09 at 16:17
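For anyone who doesn't remember how \G works, here is a rough sketch of where that road leads: patterns moved into scalars with qr// and /x, and \G with /gc walking the string without modifying it. The names `$quoted`, `$bare` and `split_fields` are made up, and it assumes backslash escapes and that junk between a closing quote and the next comma should be discarded.

```perl
use strict;
use warnings;

# Patterns kept in scalars with qr//, spread out with /x so they can carry comments.
my $quoted = qr{
    "                         # opening quote
    ( (?: [^"\\] | \\. )* )   # body: ordinary characters or backslash escapes
    "                         # closing quote
}x;
my $bare = qr{ ( [^,]* ) }x;  # everything up to the next comma

sub split_fields {
    my ($line) = @_;
    my @fields;
    pos($line) = 0;
    while (1) {
        # \G anchors each match where the previous one ended; /c keeps that
        # position when a match fails, so the next alternative can be tried.
        if ($line =~ /\G \s* ${quoted} [^,]* /gcx) {
            (my $text = $1) =~ s/\\(.)/$1/g;    # undo backslash escapes
            push @fields, $text;                # junk after the quote is dropped
        }
        elsif ($line =~ /\G \s* $bare /gcx) {
            push @fields, $1;                   # trailing whitespace is kept
        }
        last unless $line =~ /\G,/gc;           # no comma left: end of line
    }
    return @fields;
}

print "[$_]\n" for split_fields('gt," Turner, George " rubbish,user,1');
```

It is longer than the 30-line version, which is rather the point being made. Note also that consecutive zero-length matches at the same position are restricted under /g, so the order of the matches in the loop matters more than it looks.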

I'd be worried if someone made a 30-character regex extend over 30 lines with /x -- I'm sure it could be done, but it wouldn't be more readable (not at that extreme). But I agree - the compactness is a variable quantity (or quality).
– Jonathan Leffler, Jun 9 '09 at 16:35

now make it work on CSV with line feeds in quoted fields, go on, dare you to. ;). And it has to work with 2G csv files.
– Kent Fredric, Jun 10 '09 at 2:03

Not using regexes - at that point, it becomes a task for one of the CSV modules.
– Jonathan Leffler, Jun 10 '09 at 4:34

No use strict/use warnings, systematic use of substr instead of regexp, no use of modules. This is definitely not someone who has "very good Perl experience". At least not for real-life Perl projects. Like you, I suspect that it's probably a C programmer with a basic knowledge of Perl.

That doesn't mean that they can't learn, especially as there are other Perl people around. It does seem to mean that they overstated their qualification for the job though. A few more questions about how exactly they acquired that very good Perl experience would be in order.

I think the answer to #6 is (or was perceived to be) "no." The spec is so simple and pointless that I think this is a more advanced version of the FizzBuzz question. Personally, I would have submitted two versions: One that showed that I had the necessary knowledge to solve the problem myself (hand-rolled CSV parsing) and one that showed how I would really do it in a production environment (leveraging CPAN).
– Michael Carman, Jun 9 '09 at 14:16

you conclude too much from the "open file" comment. I often outline what I want to write by putting those sorts of comments in place first, then filling in the code. I get the steps down, then I code.
– brian d foy, Jun 10 '09 at 0:04

Since the code in question could be expressed much more compactly and maintainably in idiomatic Perl, you really need to ask how much time the candidate spent developing this solution, and how much time someone halfway proficient in idiomatic Perl would have spent.

I think you'll find that this coding style may be a huge waste of time (and thus the company's money).

I don't argue that every Perl programmer needs to grok the language – that, unfortunately, would be far-fetched – but they should know enough to not spend ages re-implementing core language features in their code over and over again.

EDIT Looking at the code again, I've got to be more drastic: although the code looks very clean, it's actually horrible. Sorry. This isn't Perl. Do you know the saying “you can program Fortran in any language”? Yes, you can. But you shouldn't.

This is a case where you need to follow up with the programmer.
Ask him why he wrote it this way.

There may be a very good reason. Perhaps this needed to follow the same behavior as existing code, and so he did a line-by-line translation on purpose for full compatibility. If so, give him points for a decent explanation.

Or perhaps he doesn't know Perl, so he learned it that afternoon to answer the question. If so, give him points for fast and nimble learning skills.

The only disqualifying comment may be "I always program Perl this way. I don't understand that regexp stuff."

I'd say his code is an adequate solution. It works, doesn't it? And there's an advantage to maintainability by writing "longhand" instead of in as few characters of code as you can.

The motto of Perl is "There's More Than One Way To Do It." Perl doesn't really get on your case about coding style, as some languages do (I like Python too, but you've got to admit that people can get kind of snobbish when evaluating whether code is "pythonic" or not).

Re comment from @maartinus about not seeing the advantage.

Consider code that uses ?: (ternary expressions) for greater compactness, instead of the more verbose if/then/else. But then the app gets a new requirement to log changes made in one or both of the branches. It's easy to add a line to a code block when using if/then/else, but if you use ternary expressions you have to use tricks like comma-expressions with side effects, etc.

Every application has to anticipate that it will need to be modified. But if you write Perl code as compact as you can, this can actually make it harder to insert new logic into the code. You basically have to rip apart the delicate and compact Perl trickery, and reinvent new trickery to handle the old behavior and the new behavior, every time. By doing this you are likely to introduce new bugs.
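A small illustration of the ternary point, with made-up names (`@log` stands in for whatever logging the app really uses):

```perl
use strict;
use warnings;

my @log;              # stand-in for a real logger (hypothetical)
my $is_admin = 1;

# Compact form: fine while each branch only yields a value.
my $level = $is_admin ? 'admin' : 'user';

# Longhand form: the new "log the change" requirement is a one-line addition
# inside the branch that needs it.
my $role;
if ($is_admin) {
    $role = 'admin';
    push @log, 'granted admin role';
}
else {
    $role = 'user';
}

print "$level $role (", scalar(@log), " log entries)\n";
```

Getting the same logging into the ternary means something like a comma-expression or a do-block in one arm, which is where the compact version starts to cost more than it saves.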

I wouldn't accept the candidate. He or she isn't comfortable with Perl's idioms, which will result in suboptimal code, less work efficiency (all those unnecessary lines have to be written!) and an inability to read code written by an experienced Perl coder (who of course uses regexes etc. at large).

brian: If you hire someone, you must base your judgement on whatever information you have. I might be wrong, but it's better to dismiss a good programmer than to hire a bad one.
– user281377, Jun 9 '09 at 7:04

That should at least be written using a regular expression to remove leading white space. I like the answer from jrockway best; modules rock. Though I would have used regular expressions to do it, something like:

#!/usr/bin/perl -w
#
# $Id$
#
use strict;

# Three-argument open with a lexical filehandle; report why the open failed.
open(my $fh, '<', 'qq.in') or die "Failed to open qq.in: $!";

while (my $line = <$fh>) {
    # Don't like chomp: strip \r as well as \n by hand.
    $line =~ s/(\r|\n)//g;

    # ".*?[^\\\\]" = Match everything between quotations that doesn't end with
    # an escaped quotation, match lazy so we will match the shortest possible.
    # [^",]*? = Match strings that don't have any quotations.
    # If we combine the two above we can match strings that contain quotations
    # anywhere in the string (or don't contain quotations at all).
    # Put them together and match lazy again so we can match white-spaces
    # and don't include them in the result.
    my $match_field = '\s*((".*?[^\\\\]"|[^",]*?)*)\s*';
    if (not $line =~ /^$match_field,$match_field,$match_field,$match_field$/) {
        die "Invalid line: $line";
    }

    # Put values in nice variables so we don't have to deal with cryptic $N
    # (and can use $1 in replace). Each $match_field has two capture groups,
    # so the outer groups are $1, $3, $5 and $7.
    my ($user_id, $name, $level, $numeric_id) = ($1, $3, $5, $7);
    print "$line\n";

    for my $field ($user_id, $name, $level, $numeric_id) {
        # If the field starts with a quotation,
        # strip everything after the first unescaped quotation.
        $field =~ s/^"(.*?[^\\\\])".*/$1/g;
        # Now fix all escaped characters (not only quotations).
        $field =~ s/\\(.)/$1/g;
        print " [$field]\n";
    }
}
close $fh;

From a lot of the posts it seems many are against using regexp, but looking at the sample data and the result I think regexp are good. Since the sample isn't a valid csv-file Text::CSV will break on 'gt," Turner, George " rubbish,user,1'. It will display: [gt] [" Turner] [George " rubbish] [user] [1] It should however strip the ' rubbish' text from the string. Go with regexp, live a little!
– Johan Soderberg, Jun 9 '09 at 16:52

The fact that he hasn't used a single piece of regex in the code should make you ask him a lot of questions about why he did write like that.

Maybe he's Jamie Zawinski or a fan and he didn't want to have more problems?

I'm not necessarily saying that the whole parsing should be a huge amount of unreadable CSV parsing regex like ("([^"]*|"{2})*"(,|$))|"[^"]*"(,|$)|[^,]+(,|$)|(,) or one of the many similar regex around, but at least to traverse the lines or instead of using substring().

Not only does the code suggest that the candidate doesn't really know Perl, but all those lines that say $line = substr ($line,1) are dreadful in any language. Try parsing a long line (say a few thousand fields) using that type of approach and you will see why. It just goes to show the sort of problem that Joel Spolsky discussed in this post.

You have, literally, a guy who has written several books on programming Perl and is well known in the community saying that this is acceptable code. And then 90% of the rest of the posts say how awful this guy is, and that he should be a no-hire.

The candidate was given a task, completed the task, and submitted clean, working code... and now people are ripping it because, even though it works, they don't like the style and would've done it differently.

5'll get you 10 that these guys saying this also work for/in a company that complains that they can't find programmers....

The crucial point here is - naturally after assuring that it works as expected - whether the code is maintainable.

Did you understand it?

Would you feel comfortable fixing a bug in it?

Perl programs have a tendency to look like what a cat types by accident when walking on the keyboard. If this person knows how to write readable Perl code that fits the team, this is actually a good thing.

Then again, you may want to teach him about regular expressions, but only carefully :-)

That's a good point, except for the fact that we are using Perl. This position is a backfill for someone who had to, shall we say, leave in a hurry. I may have confused you. I haven't done Perl for a while, but my colleague's group does use it.
– paxdiablo, Jun 9 '09 at 7:05

Code looks clean and readable. For that size, it does not require that many comments (perhaps none at all). It's not just about good comments, but also good code, and the latter is more important than the former.

If we were looking at a more complex or larger piece of code, I would say that comments are needed. But for this (especially given how well it was written), I don't think so.

I think it is unfair and vain to put doubt on the applicant given the piece of code submitted by him/her is quite acceptable and did the job.

The most commonly used Perl modules have a compiled C component: see DBI, (YAML|*)::XS, Template Toolkit, and libmemcached. So, hell yes, if your developer can write Perl and C, and understands why, you'd be nuts not to hire them.

If you're concerned about the reliability, ask them to write comprehensive [unit] tests.

The issue is writing perl using C idioms rather than writing idiomatic perl: writing if (($pastquote == 0) && (substr ($line,0,1) eq "\"")) { rather than if (not $pastquote and $line =~ m/^"/) {. I've written C in Lisp, and it's not a good thing. Write perl. Write C. But don't write C in perl or perl in C.
– MichaelT, Jan 23 '14 at 1:13