ChOas has asked for the
wisdom of the Perl Monks concerning the following question:

A colleague of mine is working on parsing a file which
is part of a database... The person who wrote this
database has kinda screwed up, because he/she used 1 field
for 3 values. I'll explain:

In The Netherlands, a surname CAN be prepended by (amongst
others) one of the following:
'VAN DER','VAN DE','DEN','DE','VAN' ...
I have no clue what this is called in English, but
I would like to call it a 'prependition' :)

Anyways... The fields in every line are:

NAME<Mandatory> PREPENDITION<maybe> HAVENT_GOT_A_CLUE<maybe>

I wanted to help, and I've tried different ways to parse
this. I only have a little sample data, but I came up with this:

Can names have embedded spaces? Can the "haven't got a clue" part have embedded spaces?

Without knowing this, we can't know whether to break
MOTEL GOLDEN LEEUW <A225>
into
(MOTEL)()(GOLDEN LEEUW <A225>)
or
(MOTEL GOLDEN LEEUW)()(<A225>)
Assuming for a moment that the latter is the correct way to divide the field, you could do something like
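
A minimal sketch of that second split, not necessarily what was originally posted here. It assumes the third field, when present, is a single trailing token in angle brackets like <A225> (a guess from the one sample line) and that everything before it is the name. The sub name split_name_and_code is made up for illustration.

```perl
use strict;
use warnings;

# Assumption: the optional third field is one trailing <...> token,
# and everything before it is the name (which may contain spaces).
sub split_name_and_code {
    my ($line) = @_;
    my ($name, $code) = $line =~ /^\s*(.*?)\s*(<[^>]*>)?\s*$/;
    $code = '' unless defined $code;   # no <...> token on this line
    return ($name, '', $code);         # middle field left empty on purpose
}

print join('|', split_name_and_code('MOTEL GOLDEN LEEUW <A225>')), "\n";
```

This leaves the 'prependition' slot empty, so it only answers the "where does the name end" half of the problem.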

Is "MOTEL GOLDEN LEEUW" a valid name?
That's the name of a motel, surely. What's weird about that?

If all this data is just in the same field, I think my pseudocode would be:

while (<DATA>){
    $name = everything up to the first space.
    $rest = everything else
    if ($rest starts with one of the "van"-type prepositions){
        deal with it
    } else {
        print "can't parse this one into a name: $_"
    }
}
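
For what it's worth, the pseudocode above could be turned into real Perl roughly like this. It is only a sketch: the prefix list is the one from the question (which ends in '...', so it is incomplete), the sample lines other than the MOTEL one are invented, and parse_field is a made-up name. One deviation from the pseudocode: a line with only a name is accepted, since the other two fields are marked "maybe".

```perl
use strict;
use warnings;

# Longest alternatives first, so 'VAN DER' is tried before 'VAN'.
my $prefix_re = qr/VAN DER|VAN DE|DEN|DE|VAN/;

# Returns (name, prefix, rest), or an empty list if the line can't be parsed.
sub parse_field {
    my ($line) = @_;
    chomp $line;
    my ($name, $rest) = split ' ', $line, 2;   # name = up to the first space
    return unless defined $name;               # blank line
    return ($name, '', '') unless defined $rest && length $rest;
    if ($rest =~ /^($prefix_re)(?:\s+(.*))?$/) {
        return ($name, $1, defined $2 ? $2 : '');
    }
    return;                                    # rest does not start with a prefix
}

# Hypothetical sample lines, not real data from the database.
for my $line ('VLIET VAN DER <A225>', 'JANSEN', 'MOTEL GOLDEN LEEUW <A225>') {
    my @fields = parse_field($line);
    if (@fields) {
        print join('|', @fields), "\n";
    } else {
        print "can't parse this one into a name: $line\n";
    }
}
```

As expected, the MOTEL line falls into the "can't parse" branch, because the name is assumed to stop at the first space.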

If NAME and HAVENT_GOT_A_CLUE can both contain spaces, then the instance where there is no 'prependition' will cause you real problems IF the last field can start with a letter. But if it can't, here's what I'd do:
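
The code that followed is not shown here, so this is only a sketch of one way to do it under that assumption: the last field starts at the first token that does not begin with a letter, and a known prefix is then peeled off the end of what comes before. The prefix list is the (incomplete) one from the question, split_three is a made-up name, and the VLIET line is an invented sample.

```perl
use strict;
use warnings;

# Longest alternatives first, so 'VAN DER' is tried before 'VAN'.
my $prefix_re = qr/VAN DER|VAN DE|DEN|DE|VAN/;

# Assumption: the last field never starts with a letter.
sub split_three {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;                   # trim both ends
    # Head: leading tokens that start with a letter. Tail: everything after.
    my ($head, $tail) = $line =~ /^((?:[A-Za-z]\S*\s*)*)(.*)$/;
    return unless defined $head;
    $head =~ s/\s+$//;
    return if $head eq '';                     # NAME is mandatory
    # Peel a trailing prefix off the head, leaving at least one name word.
    my $prefix = '';
    if ($head =~ /^(.+?)\s+($prefix_re)$/) {
        ($head, $prefix) = ($1, $2);
    }
    return ($head, $prefix, $tail);
}

print join('|', split_three('MOTEL GOLDEN LEEUW <A225>')), "\n";
print join('|', split_three('VLIET VAN DER <A225>')), "\n";
```

A name whose own last word happens to be DE or VAN would still be split wrongly, so this does not remove the ambiguity, it only narrows it.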

You will not be able to parse it that easily.
Consider ALICE, born VAN DER VLIET, now married to
Mr. MARIE. Her name would be ALICE MARIE VAN DER VLIET.
But surely, you do not want to split that into:
(ALICE MARIE) (VAN DER) (VLIET)?
