pissflaps has asked for the
wisdom of the Perl Monks concerning the following question:

Hello monks! I was placed in charge of a ticket-alert system written in Perl and cannot get past half of this code.
I have been trying to split a string of lines into variables representing each delimited word within the line. If that is unclear, maybe a visual representation will help:
A flat-file DB system is sitting on an HTML page. Tickets are formatted like this:
(All on one line)

which was generated from splitting the input values for creating the tickets. I assign the input into an array and regex off the markup and spaces.
I'm now left with something like this:

371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE|
and want to assign variables to each word in this format:
$ticket,$DateAdded,$STime,$ETime,$Pri,$SiteName,$Comments
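
A minimal sketch of that assignment for a single record, assuming the markup has already been stripped and the line looks exactly like the sample above. The name $record is only illustrative; the seven field variables are the ones listed in the post.

```perl
use strict;
use warnings;

# Sample record taken from the post. split drops trailing empty fields by
# default, so the final "|" does not produce an extra element.
my $record = '371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE|';

# The pipe has to be escaped: an unescaped | in a pattern means alternation.
my ($ticket, $DateAdded, $STime, $ETime, $Pri, $SiteName, $Comments)
    = split /\|/, $record;

print "Ticket $ticket ($SiteName): $STime to $ETime, priority $Pri\n";
```

Assigning the result of split to a list of scalars fills them left to right, so the order of the variables has to match the order of the fields in the record.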

This way, I can access the variables and email alerts based on the time variables to be compared to the current time.
Where I'm having trouble seems to be around the following segment of code. Any help or advice would be GREATLY appreciated, since I am very new to Perl and have been debugging this script line-by-line with warnings and just can't figure out some of the functions I'm applying.

You can see that I'm able to split a single line into the variables, but I want to iterate over every line in the $line string to place these variables onto the data.
Am I totally setting myself up for failure, or is there a better way to do this?
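
One way to iterate over every record, sketched under the assumption that the cleaned-up records sit in a single string with one record per line. The name $text and the second record are invented for illustration; only the first record comes from the post.

```perl
use strict;
use warnings;

# Assumption: one pipe-delimited record per line in a single string.
my $text = <<'END';
371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE|
371541|4/07/2011|09:00|10:30|1|Other Company (BRANCH)|REBOOT|
END

my $count = 0;
for my $record (split /\n/, $text) {
    next unless $record =~ /\S/;    # skip blank lines

    # These variables are declared inside the loop, so each pass gets
    # a fresh set holding the fields of the current record only.
    my ($ticket, $DateAdded, $STime, $ETime, $Pri, $SiteName, $Comments)
        = split /\|/, $record;

    $count++;
    print "Ticket $ticket at $SiteName runs from $STime to $ETime\n";
}
print "$count tickets processed\n";
```

Anything that needs the time fields, such as the comparison against the current time, would go inside the loop body where the variables are in scope.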

You print $line->[0] which is quite different from $line[0]. How this prints "371225..." is beyond me.
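
The difference can be seen in a small sketch. The values here are made up; the point is only that @line and $line are two unrelated variables that happen to share a name.

```perl
use strict;
use warnings;

my @line = ('from the array');        # the array @line
my $line = ['from the reference'];    # the scalar $line, holding an array reference

print $line[0], "\n";      # element 0 of the array @line
print $line->[0], "\n";    # element 0 of the array that $line refers to
```

Under "use strict", using $line->[0] without ever declaring a scalar $line is a compile-time error, which is usually the quickest way to catch this kind of mix-up.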

You also do lots of strange things to my eye, but that is to be expected ;-). For example, to copy @arr into @line (for which you used a loop), "@line = @arr;" would have been enough. The same goes for "my @lines= ($line[0]..."
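
A short sketch of that copy, with made-up element values. List assignment copies every element, and the copy is independent of the original.

```perl
use strict;
use warnings;

my @arr  = ('a', 'b', 'c');
my @line = @arr;           # copies every element; no loop needed

$line[0] = 'changed';      # modifying the copy leaves @arr untouched
print "@arr\n";            # prints: a b c
print "@line\n";           # prints: changed b c
```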

Also, using lots of variables called $line, $lines, @line, @lines turns any program into an entry in the obfuscation contest. Differentiate your variables better.

Note @ticker and $i are only necessary if you want to operate on the data after the loop. If you just want to print or store the stuff, do that in the foreach loop above. You can also just use the split line you have in your script to fill the variables with the data inside the loop instead of using @items, if it suits you better.

This is really close to where I was going. I can't seem to initialize {$names{$_}} due to the curly braces. Is this assigning everything into a hash table? If so, I'm receiving an error about either $names or $items that it isn't initialized, which is very similar to the errors my original code pulled. Is there maybe a different notation to assign everything into a hash while still initializing the variables?

{$names{$_}}, on the other hand, is not a really useful expression in Perl. %{$names{$_}} refers to a hash whose reference is stored as a value in another hash (i.e. %names): a multidimensional hash, or HashOfHashes in perlspeak.
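
To make the two notations concrete, here is a sketch that is not the code from the earlier reply. It assumes the seven field names from the original post; %row, @fields and %tickets are illustrative names. It shows a hash slice, which fills a flat hash in one statement, next to a hash of hashes.

```perl
use strict;
use warnings;

my @fields = qw(ticket DateAdded STime ETime Pri SiteName Comments);
my $record = '371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE|';

# Hash slice: @row{...} assigns all seven values to their field names at once.
my %row;
@row{@fields} = split /\|/, $record;
print "$row{SiteName}\n";

# Hash of hashes: store a copy of the row under its ticket number.
my %tickets;
$tickets{ $row{ticket} } = { %row };
print "$tickets{371540}{STime}\n";

# %{ ... } dereferences the inner hash so that keys can walk it.
for my $name (sort keys %{ $tickets{371540} }) {
    print "$name = $tickets{371540}{$name}\n";
}
```

Declaring the hashes with "my" before the first use is what avoids the complaints under "use strict"; the curly braces themselves are not the problem.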

Personally, I am a very big fan of using high-level tools such as HTML::Parser to do as much “heavy lifting” as possible.

My rationale is that: any HTML document does have a known structure (even if it is not obliged to adhere to it strictly, in actual practice), and that, “anytime you are dealing with a complex document having any known structure, the best way to deal with such a thing is to use a parser.”

There are many, many good parsing engines in Perl. One that I recently had the privilege of beating to a bloody pulp (wink... it proceeded to do everything I asked it to, and more!!) was Parse::RecDescent. (I am still in awe of its author!) But in this case, the source-language is “simply HTML,” and HTML-specific tools abound.

All parsing engines are, so to speak, “engines that are really, really good at character-twiddling and which know the lay of the land.” You rely upon them to go about their business and to call your code at strategic points, and to return data structures to you at those times.

This is, IMHO, a much stronger strategy than “regex hell,” which often yields solutions that work fine in initial test-cases but then require constant twiddling and head-banging. Let the CPAN-authors do as much banging on your behalf as possible. It will not, of course, eliminate the considerable amount of work that still remains to be done, but it might well make that work vastly easier.

I've got to second the vote for HTML::Parser or similar parsing engines.

A long time ago, before RSS feeds, I wrote a program to parse various newspaper websites and did the regexes by hand. I had 24 different rules for 90+ papers. When I rewrote it, I got it down to 9 rules, mainly based on web page design, since I used a parsing engine.

You're going to save yourself a ton of work since if the data changes you're going to have to rewrite your regexes each time.

Thank you for such an informative response! I'll be sure to look into more about Parse::RecDescent, but for now that may be too daunting to pick up for a novice. Is there an example using HTML::Parser you could describe for using in this situation? I'm unfamiliar with basically any module outside of CGI. :(

Thank you! This is very close to where I was originally intending to head with the script. I cannot surmise how to access each element to have the variable assigned to it, though.

When I run this segment, I fail initializing $field within the split. I'm not sure how I got this error because everything is localized in the loop, right?
Would I access these variables with $ticket[0], $ticket[1]? I could easily adapt the rest of the script if this is the case.
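
For reference, a sketch of how the access would look if each split line is stored as its own array inside an outer array. The names @records, @tickets and @fields are assumptions, and the second record is invented sample data. Each row is reached with two subscripts rather than one.

```perl
use strict;
use warnings;

my @records = (
    '371540|4/07/2011|08:03|11:03|2|Company Name (MAIN SITE)|DB PURGE|',
    '371541|4/07/2011|09:00|10:30|1|Other Company (BRANCH)|REBOOT|',
);

my @tickets;    # array of arrays: one inner array per record
for my $record (@records) {
    my @fields = split /\|/, $record;    # a fresh array on every pass
    push @tickets, \@fields;             # store a reference to it
}

print $tickets[0][0], "\n";    # ticket number of the first record
print $tickets[1][5], "\n";    # site name of the second record

# Unpack a row back into named variables when that reads better.
for my $row (@tickets) {
    my ($ticket, $DateAdded, $STime, $ETime, $Pri, $SiteName, $Comments) = @$row;
    print "$ticket: $STime-$ETime\n";
}
```

So $tickets[0] is the whole first record (an array reference), and $tickets[0][1] is its second field.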

I've since somewhat resolved the issues I was having and resorted to learning all about making an array of arrays while in a for loop. Thank you all for your magnificent help and resources! I couldn't have figured any of this out without you.