7 Answers

The expression you give is too greedy and thus consumes more characters than you wanted. The following expression works on your sample data set:

perl -l -ne '/:([^|]*)\|.*:([^|]*)\|/ and print($1,"\t",$2)'

It anchors the search by explicitly matching the text between each ":" and the "|" that follows it. If a line does not contain two such pairs, the match fails and the line should simply be ignored, but I have not tested this. In other words, this regex assumes exactly two entries between ":" and "|" will exist per line; with more than two, the greedy .* would make it capture the first and the last.
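To check the "ignore the input line" behaviour, here is a minimal sketch that applies the same pattern to a few lines. The sample lines and the helper name extract_pair are made up for illustration; the real data set is not shown here.

```perl
use strict;
use warnings;

# Hypothetical sample lines (assumed format, not the asker's real data).
my @lines = (
    'aa:Q9VDK9|x bb:P12345|y',   # two ":"..."|" pairs
    'aa:Q9VDK9|x bb',            # only one pair
    'no delimiters at all',      # no pair
);

# Returns the two captured fields, or an empty list when the line does not match.
sub extract_pair {
    my ($line) = @_;
    return $line =~ /:([^|]*)\|.*:([^|]*)\|/;
}

for my $line (@lines) {
    my @pair = extract_pair($line) or next;   # skip lines that do not match
    print join("\t", @pair), "\n";
}
```

Only the first sample line has two pairs, so it should be the only one printed; the other two fail to match and are skipped.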

Note that Moritz's answer may over-constrain your search if $1 or $2 could legally contain a ":" character. If a possible result might be Q9VDK:19, use my answer. Otherwise either should work.
– UncleOp, Aug 28 '12 at 14:43

: Matches the start of the first "word".
([^|]*) Matches the desired part of the first "word".
\S* Matches the end of the first "word".
\s+ Matches the "word" separator.
[^:]*: Matches the start of the second "word".
([^|]*) Matches the desired part of the second "word".

This isn't the shortest answer (although it's close) because each part is quite independent of the others. This makes it more robust, less error-prone, and easier to maintain.
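The pieces listed above can be put together into one pattern. This is a sketch assembled from that description, using /x so each piece can carry its own comment; the sample line is hypothetical.

```perl
use strict;
use warnings;

# Pattern assembled piece by piece; the x flag allows whitespace and comments.
my $re = qr{
    :          # start of the first "word"
    ([^|]*)    # desired part of the first "word"
    \S*        # end of the first "word"
    \s+        # "word" separator
    [^:]*:     # start of the second "word"
    ([^|]*)    # desired part of the second "word"
}x;

# Hypothetical input line (assumed format).
my $line = 'aa:Q9VDK9|x bb:P12345|y';

if (my ($first, $second) = $line =~ $re) {
    print "$first\t$second\n";
}
```

Because each piece only deals with its own part of the line, a change to the separator or to the word prefix touches a single line of the pattern.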

Why do you not want to use the split function? On the face of it, this would be easily solved by writing

my @fields = map /:([^|]+)/, split
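As a runnable sketch of the split approach, the statement can be wrapped in a small helper. The name fields_from and the sample line are assumptions made for illustration.

```perl
use strict;
use warnings;

# Splits the line on whitespace, then takes the text between ":" and "|"
# from each word. Words without a ":" contribute nothing to the result.
sub fields_from {
    local $_ = shift;
    return map /:([^|]+)/, split;
}

# Hypothetical input line (assumed format).
my @fields = fields_from('aa:Q9VDK9|x bb:P12345|y');
print join("\t", @fields), "\n";
```

Here split with no arguments breaks $_ on whitespace, and the match in list context returns its capture, so map collects one field per matching word.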

I am not sure how your regex is supposed to work. Using the /x modifier to allow non-significant whitespace, it looks like this:

/ : ([^|]*)? [^:]* : ([^|]*) /x

This finds a colon and optionally captures as many non-pipe characters as possible. It then skips over as many non-colon characters as possible to the next colon, and finally captures zero or as many non-pipe characters as possible. Because all of your matches are greedy, any one of them is allowed to consume all of the rest of the string as long as the characters match the character class. Note that a ? indicating an optional sequence will first of all match all that it can, and the option to skip the sequence will be taken only if the rest of the pattern cannot then be made to match.
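Both effects can be seen in isolation. The patterns and strings below are toy examples chosen for illustration, not the asker's data.

```perl
use strict;
use warnings;

# Greedy: the first group takes every character it is allowed to,
# leaving nothing for the second group.
my ($first, $second) = 'x:abc' =~ /:([^|]*)([^|]*)/;
print "first='$first' second='$second'\n";

# Optional group: (a)? is used when the rest of the pattern still matches...
my ($taken) = 'aab' =~ /^(a)?ab/;

# ...and is skipped only when the rest of the pattern would otherwise fail.
my ($skipped) = 'ab' =~ /^(a)?ab/;

print defined $taken   ? "taken='$taken'\n" : "taken is undef\n";
print defined $skipped ? "skipped='$skipped'\n" : "skipped is undef\n";
```

In the second case the engine first tries to use the optional group, finds that the remaining pattern cannot match, and only then backtracks to skip it.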

It is hard to judge from your examples the precise criteria for a field, but this code should do the trick. It finds sequences of characters that are neither a colon nor a pipe, that are preceded by a colon and terminated by a pipe.
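The code this answer refers to is not shown here. A minimal sketch that fits the description, assuming a global match in list context, could look like the following; the helper name colon_pipe_fields and the sample line are hypothetical.

```perl
use strict;
use warnings;

# Returns every run of characters that are neither ":" nor "|",
# provided the run is preceded by a ":" and followed by a "|".
sub colon_pipe_fields {
    my ($line) = @_;
    return $line =~ /:([^:|]+)\|/g;
}

# Hypothetical input line (assumed format).
my @fields = colon_pipe_fields('aa:Q9VDK9|x bb:P12345|y');
print join("\t", @fields), "\n";
```

Unlike the two-capture patterns, this returns however many fields the line contains. Since the character class excludes ":", a field that itself contains a colon would be cut down to the part after its last colon.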