waxmop has asked for the
wisdom of the Perl Monks concerning the following question:

Greetings wise ones:

I just wrote a clunky application, and I want to get the
community's feedback, because I don't like how I wrote it.

Background: the people I work for need a way to search, add, and
edit their records of contacts at a bunch of different companies. For
example, when somebody gets a new phone number, we need to be able to
update our records.

Here's the overview of what I did:

Turned the Excel spreadsheet that held all the orders into an XML
file.

Wrote a search.html page that gets information from a form, and
then uses perl-mason and XML::Simple to find matching records and
print them.

For any printed record, a new form appears that allows the user to
fill out updated information. Then, the perl-mason code replaces the
old record with the new one. This is the part that seems clunky to me.

I wrote some perl-mason pages to allow users to search records after parsing the data with XML::Simple. They can edit a record by filling out a form on the post_edits.html page. Here's what happens on the post_edits.html page:

I pass the new edited record in the %ARGS hash. Then I create the $newnode structure which will replace the old record. Then I loop through my in-memory structure until I find the record that I want to replace, and then I replace it, with the $rec = $newnode line. I've used Data::Dumper to check that it works, and it looks good, but I can't help but wonder if there is a better way.
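For anyone following along, the replace-in-a-loop approach described above might look roughly like the sketch below. The record layout, the field names (uid, name, phone) and the sample values are assumptions for illustration, not the original code. The point is only that foreach aliases $rec to each array element, so assigning to $rec replaces the element in place.

```perl
use strict;
use warnings;

# Hypothetical in-memory structure, shaped like what XML::Simple might
# return for a list of <contact> elements.
my $data = {
    contact => [
        { uid => '0001', name => 'Alice Example', phone => '555-0100' },
        { uid => '0002', name => 'Bob Example',   phone => '555-0101' },
    ],
};

# Stand-in for the edited values that Mason would pass in %ARGS.
my %ARGS = ( uid => '0002', name => 'Bob Example', phone => '555-0199' );

# Build the replacement record from the submitted fields.
my $newnode = { map { $_ => $ARGS{$_} } qw(uid name phone) };

# foreach aliases $rec to each array element, so the assignment below
# overwrites the element itself rather than a copy of it.
foreach my $rec ( @{ $data->{contact} } ) {
    next unless $rec->{uid} eq $newnode->{uid};
    $rec = $newnode;
    last;
}

print "$_->{uid}: $_->{phone}\n" for @{ $data->{contact} };
```

This works, but every edit costs a walk over the whole list, which is what the replies below try to get rid of.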

Why not use a simple database? The schema is already defined within the Excel spreadsheet, and you'd get away from data integrity errors (your solution is not multi-user safe). You would still be able to create the spreadsheet whenever necessary using something like Spreadsheet::WriteExcel. You seem to be doing much of the work already; using a db would allow SQL to update just the row that needs changing instead of chugging through the entire XML file each and every time.
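A minimal sketch of the single-row update, assuming DBI with DBD::SQLite. The choice of SQLite, the table name and the columns are all assumptions here, since the thread does not name a database or show the spreadsheet's columns.

```perl
use strict;
use warnings;
use DBI;

# In-memory SQLite database so the sketch leaves no file behind;
# a real application would point dbname at a file on disk.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 } );

# Hypothetical schema standing in for the spreadsheet's columns.
$dbh->do(<<'SQL');
CREATE TABLE contacts (
    uid     INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    company TEXT,
    phone   TEXT
)
SQL

my $insert = $dbh->prepare(
    'INSERT INTO contacts (uid, name, company, phone) VALUES (?, ?, ?, ?)');
$insert->execute( 1, 'Alice Example', 'Acme',    '555-0100' );
$insert->execute( 2, 'Bob Example',   'Initech', '555-0101' );

# Update only the row that changed; placeholders keep form input
# out of the SQL text.
my $rows = $dbh->do( 'UPDATE contacts SET phone = ? WHERE uid = ?',
    undef, '555-0199', 2 );

my ($phone) = $dbh->selectrow_array(
    'SELECT phone FROM contacts WHERE uid = ?', undef, 2 );
print "rows updated: $rows, new phone: $phone\n";

$dbh->disconnect;
```

The database also takes care of locking between concurrent writers, which the read-modify-write cycle on an XML file does not.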

XML is of course very modern, hip and funky to boot, but I wonder if it is not a bit of an overkill here.

As your data seems to be very regular, I would go either for a simple CSV file (easy to extract from an Excel file) and DBD::CSV, or go directly to the Excel file itself (if it is in an acceptable format, i.e. worksheet = TABLE and the first row contains the column headings) and use DBD::Excel.

CountZero

"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

While I agree with some of your thoughts, I personally try to avoid CSV files because the definition of "CSV" is so elastic in practice. How do you quote or escape fields containing commas and quotes? How do you handle trailing empty fields? I've seen a lot of variation in stuff that is ostensibly "CSV", and that always makes me nervous from a long-term maintainability and interoperability standpoint.

That said, I do agree delimited files can make a lot of sense for lightweight data storage. For information that normally wouldn't contain internal whitespace other than "regular" (ASCII 32) spaces -- and this application might qualify -- I often choose tab-delimited.
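As a small illustration of the trailing-empty-field question with tab-delimited data, here is a sketch using only core Perl. The field layout (uid, name, phone, fax, notes) is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical record: the last two fields (fax, notes) are empty.
my $line = join( "\t", '0001', 'Alice Example', '555-0100', '', '' ) . "\n";

chomp $line;

# Without a limit, split silently drops trailing empty fields.
my @short = split /\t/, $line;

# A negative limit keeps them, so every record has the same field count.
my @full = split /\t/, $line, -1;

printf "default split: %d fields, split with -1: %d fields\n",
    scalar @short, scalar @full;

# Writing a record back out is just a join, provided no field contains a tab.
my $out = join( "\t", @full ) . "\n";
```

The join on the way out is only safe as long as the "no tabs inside a field" assumption holds, so it is worth checking that when the form data comes in.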

One potential benefit I do see to using XML in this application is that you can easily store data that isn't quite so regular. For example, if you wanted to support multiple contact phone numbers, it would be fairly easy to expand the data structure like this:
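A hypothetical sketch of such a structure, parsed with XML::Simple. The element names, the type attribute and the sample values are invented for illustration and are not taken from the original data file.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical record layout; element and attribute names are illustrative.
my $xml = <<'XML';
<contacts>
  <contact uid="0001">
    <name>Alice Example</name>
    <phone type="office">555-0100</phone>
    <phone type="mobile">555-0150</phone>
  </contact>
  <contact uid="0002">
    <name>Bob Example</name>
    <phone type="office">555-0101</phone>
  </contact>
</contacts>
XML

# ForceArray makes <contact> and <phone> arrays even when only one is
# present, so single-phone contacts need no special case.
# KeyAttr => [] turns off array folding so the structure stays predictable.
my $data = XMLin( $xml, ForceArray => [ 'contact', 'phone' ], KeyAttr => [] );

for my $contact ( @{ $data->{contact} } ) {
    # An element with both an attribute and text keeps the text under 'content'.
    my @numbers = map { "$_->{type}: $_->{content}" } @{ $contact->{phone} };
    print "$contact->{name} => ", join( ', ', @numbers ), "\n";
}
```

Doing the same in a flat delimited file would mean either extra columns (phone1, phone2, ...) or a second file.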

There's an anti-buzzword backlash against XML floating around, which is a natural reaction to something that's been hyped so much by marketers.

But after all the smoke clears, it's pretty hard to beat the simplicity of using XML::Simple's XMLin() and XMLout() functions.

CSV files combined with while (<IN>) {...} logic have old-school appeal, but in the end, they're irritating and tedious. There are so many exceptions to be handled, like data that splits over lines, or data that contains the delimiter as part of the data, etc., and CSV is not as flexible as XML when you have to add new variables.

As others have mentioned, XML might not be the most straightforward or efficient solution to this problem -- but since you've already got working code, here's a thought as to how you could evolve it. Check out some of the options in XML::Simple that control the way XMLin folds the generated data structure. Using keyattr, you ought to be able to set things up so that the data structure ends up with UID as a hash key. That would let you remove the loop that searches for the matching UID, something simple like this (untested, and dependent on your exact data structure):
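A sketch along those lines, assuming each record is a <contact> element carrying uid as an attribute. The element names and the %ARGS contents are assumptions, so adjust them to the real file.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical data file contents.
my $xml = <<'XML';
<contacts>
  <contact uid="0001"><name>Alice Example</name><phone>555-0100</phone></contact>
  <contact uid="0002"><name>Bob Example</name><phone>555-0101</phone></contact>
</contacts>
XML

# KeyAttr folds the list of <contact> elements into a hash keyed on uid.
# ForceArray keeps the folding consistent when the file holds one contact.
my $data = XMLin( $xml,
    KeyAttr    => { contact => 'uid' },
    ForceArray => ['contact'],
);

# Stand-in for the Mason %ARGS hash.
my %ARGS = ( uid => '0002', name => 'Bob Example', phone => '555-0199' );

# Direct hash access replaces the search loop.
if ( exists $data->{contact}{ $ARGS{uid} } ) {
    $data->{contact}{ $ARGS{uid} } =
        { name => $ARGS{name}, phone => $ARGS{phone} };
}

print $data->{contact}{'0002'}{phone}, "\n";
```

Note that folding removes uid from the inner hash and uses it as the key instead, which is why the replacement record does not carry a uid field.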

Also, I would tend to avoid the four-digit zero-padded UIDs in favor of simple integers, which are less work to generate and also don't cause potential problems when you add your 10,000th contact -- though I hope that if you do have 10,000 contacts, you'd have begun using a real database rather than an XML file at that point :-)
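To show the difference, here is a small sketch that generates the next UID both ways. The list of existing UIDs is made up.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical existing UIDs, including legacy zero-padded ones.
my @uids = ( '0001', '0002', '0017' );

# Numeric context strips the padding, so old and new styles compare correctly.
# The // 0 covers an empty list, where max returns undef.
my $next_int = ( max( map { $_ + 0 } @uids ) // 0 ) + 1;

# Zero-padded form, for comparison: the width has to be chosen up front,
# and %04d stops lining up once the value reaches 10000.
my $next_padded = sprintf '%04d', $next_int;

print "integer: $next_int, padded: $next_padded\n";
```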

If you play with the options on XMLin() and XMLout() you can avoid needing to search by forcing it to use hashes instead of arrays.

Using keeproot=>1, keyattr=>'uid' on both the XMLin() and XMLout() calls and adding noattr=>1 on the XMLout() you can persuade XML::Simple to write the data back in the same form as it received it.
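A sketch of that round trip, assuming uid is stored as a child element of each record (which is what noattr=>1 will write back out). The element names and values are hypothetical, and ForceArray is an extra option added here so a file with a single contact folds the same way.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical input in which uid is a child element, not an attribute.
my $xml = <<'XML';
<contacts>
  <contact><uid>0001</uid><name>Alice Example</name><phone>555-0100</phone></contact>
  <contact><uid>0002</uid><name>Bob Example</name><phone>555-0101</phone></contact>
</contacts>
XML

# KeepRoot preserves the <contacts> wrapper; KeyAttr folds on uid.
my $data = XMLin( $xml,
    KeepRoot   => 1,
    KeyAttr    => ['uid'],
    ForceArray => ['contact'],
);

# Contacts are now a hash keyed on uid, so an edit is a direct assignment.
$data->{contacts}{contact}{'0002'}{phone} = '555-0199';

# The same KeepRoot/KeyAttr pair unfolds the hash again on output, and
# NoAttr writes every value as a nested element instead of an attribute.
my $out = XMLout( $data, KeepRoot => 1, KeyAttr => ['uid'], NoAttr => 1 );
print $out;
```

Whitespace and element order in the output may differ from the input, so compare parsed structures rather than raw text if you want to check the round trip.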

If you can't use a DB of some form, and you expect to have multiple concurrent users, then you could move the access of the data into a separate process that maintains a copy in memory, serves details to the CGI process and receives updates from it via a socket or pipe. This would probably need to be multi-threaded/forked, but if the volumes of traffic are low you might get away with having the CGI try to connect, then back off for a short period and retry if it doesn't get access the first time.
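The back-off-and-retry part could be sketched like this, using core modules only. The helper name retry_with_backoff, the doubling delay and the fake connection attempt are all illustrative assumptions; in a real CGI the code reference would try to open the socket to the data-server process.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);    # allows fractional-second sleeps

# Call $attempt until it returns a true value, sleeping between tries.
# Returns that value, or undef once $max_tries attempts have failed.
sub retry_with_backoff {
    my ( $attempt, $max_tries, $delay ) = @_;
    for my $try ( 1 .. $max_tries ) {
        my $result = $attempt->($try);
        return $result if $result;
        last if $try == $max_tries;
        sleep $delay;
        $delay *= 2;    # wait longer after each failure
    }
    return;
}

# Stand-in for "try to connect to the data server": fails twice, then succeeds.
my $calls = 0;
my $conn  = retry_with_backoff(
    sub { ++$calls >= 3 ? 'connected' : undef }, 5, 0.01 );

print defined $conn ? "got '$conn' after $calls tries\n" : "gave up\n";
```

Capping the number of tries matters in a CGI, since the browser is waiting on the other end the whole time.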

..and remember there are a lot of things monks are supposed to be but lazy is not one of them

Examine what is said, not who speaks.

1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
3) Any sufficiently advanced technology is indistinguishable from magic.
Arthur C. Clarke.

The two control characters ^D and ^Z, and the tokens
__END__ and __DATA__ may be used to indicate the logical
end of the script before the actual end of file. Any
following text is ignored.

Text after __DATA__ may be read via the filehandle
"PACKNAME::DATA", where "PACKNAME" is the package that was
current when the __DATA__ token was encountered. The
filehandle is left open pointing to the contents after
__DATA__. It is the program's responsibility to "close
DATA" when it is done reading from it. For compatibility
with older scripts written before __DATA__ was introduced,
__END__ behaves like __DATA__ in the toplevel script (but
not in files loaded with "require" or "do") and leaves the
remaining contents of the file accessible via
"main::DATA".

See SelfLoader for more description of __DATA__, and an
example of its use. Note that you cannot read from the
DATA filehandle in a BEGIN block: the BEGIN block is
executed as soon as it is seen (during compilation), at
which point the corresponding __DATA__ (or __END__) token
has not yet been seen.

I can't find a reference to __DATA__ in the docs. I know it's there somewhere. Maybe someone else will post one.

Essentially, you can use <DATA> as a file handle that you don't need to open, to read anything after the __DATA__ marker at the end of your source file. __END__ works too, but I was informed recently that it has caveats when used with modules.
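A minimal sketch of reading embedded records through DATA. The "uid:phone" record format is made up for the example.

```perl
use strict;
use warnings;

my %phone_for;

# DATA is already open; no open() call is needed.
while ( my $line = <DATA> ) {
    chomp $line;

    # Skip anything that is not a "uid:phone" record.
    next unless $line =~ /^(\d+):(\S+)$/;
    $phone_for{$1} = $2;
}

print "$_ => $phone_for{$_}\n" for sort keys %phone_for;

__DATA__
0001:555-0100
0002:555-0101
```

Everything after the __DATA__ line is treated as data rather than code, so it has to be the last thing in the file.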

It's very useful for testing and demo purposes. You can even have multiple embedded and even writable files using Damian Conway's Inline::Files, though I don't think they would be useful for your purposes here.
