WHERE has changed to Numeric Comparison - SOLVED (or, at least a good work-around)


Old DBI/DBD::CSV program on a Linux host has been working for 15+ years. Now I need to port it to a newer AIX box.

The old (working) WHERE clause, if it saw "5151" for a key, ONLY matched/updated "5151" in the master. But with the latest CSV modules on the AIX host, "5151" matches "5151", but it ALSO (wrongly) matches "0005151".

It's as if the functionality of the WHERE comparison has changed from a string/textual comparison to a numeric one.

I can't find (and I've looked hard) a way to "declare" the column as a string, or to "quote" the value, to force a string comparison.

Has anyone found a way around this issue?

For an original/Master table like:

MasterTable
0005151,20150121085200,1000,0
5151,20150121085200,2000,0

An update entry like:

Update Entry
5151,20150121085200,22154,0

Should update ONLY the 5151 line in the master. (And on the old Linux app it does.) Like:

MasterTable
0005151,20150121085200,1000,0
5151,20150121085200,22154,0

But on the new AIX (newer DBI/DBD::CSV modules) it updates both lines, like:

MasterTable
0005151,20150121085200,22154,0
5151,20150121085200,22154,0
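A minimal script in the spirit of that description (a hedged sketch; the file, table, and column names here are hypothetical stand-ins for the real, much larger program):

Code

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# master.csv starts as:
#   SerialNumber,Stamp,Qty,Flag
#   0005151,20150121085200,1000,0
#   5151,20150121085200,2000,0
my $dbh = DBI->connect('dbi:CSV:', undef, undef,
    { f_dir => '.', f_ext => '.csv', RaiseError => 1 });

# On an engine that compares numeric-looking strings numerically,
# this updates BOTH rows, because "0005151" == "5151" as numbers.
$dbh->do(q{UPDATE master SET Qty = 22154 WHERE SerialNumber = '5151'});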

Which DBD module are you using? If it's not on CPAN, then post a link to where we can find it.

Please post a short but complete script that demonstrates your problem so that we can test it and suggest how to fix it. And, post it within code tags so that the formatting (indentation) will be retained.

I don't have time right now to do full testing, but I do spot several problems from the start.

It's very poor Perl coding practice to declare all of those vars at the beginning of the script. They should be declared in the smallest scope they require AND as close as possible to where they are first used.

Add this line to your list of use statements and fix each issue it points out.

Code

use warnings FATAL => qw(all);

The first issue will be:

Quote

"my" variable $StartDir masks earlier declaration in same scope at C:\test\jhumkey.pl line 9.

Then this one:

Quote

Use of uninitialized value $StartDir in concatenation (.) or string at C:\test\jhumkey.pl line 15.

You're right . . . but it doesn't apply here. In "the real", much larger program, it's all "encapsulated" variables down many more levels. There's a whole "recursion" layer to process multiple input files and apply them all to the Master, gutted from this simplified example. And there are MANY more fields on the real lines.

But this smaller example runs, and fails, and so demonstrates the issue.

And the big point to remember . . . it has WORKED for 15+ years (on the old Linux box). The question at hand is why the much larger version (which I didn't upload) won't port over and do WHERE comparisons the same way on a newer AIX host.

--- To the other reply about sprintf . . . I saw some q("static")-type quoting done in sprintf examples, but I'm not quoting fixed/static text. I'll be SELECTing values into variables, then comparing the MasterFile column (SerialNumber) to the variable text in $SerialNumber. So . . . I wasn't sure what to change to make the WHERE work right with sprintf.

I started with the same script, but had to change all the \"$SerialNumber\" type references to be '$SerialNumber'.

Something about the new host/modules . . . didn't like the double quotes any more. It would give errors like "Column 6313 not found". (Well, that's because 6313 is a VALUE, and you shouldn't be looking for it as a column . . .)

Anyway. I thought more about "sprintf".

Instead of having the SELECT place values into variables and performing the UPDATE directly from the contents of the variables (which worked fine on the old host/modules), I added an intermediate step: SELECT into variables, print a STATIC STRING with the variables expanded, then call the UPDATE passing that static string.
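Reconstructed as a hedged sketch (table and column names hypothetical), that intermediate step looks something like this; the next paragraph explains why it still didn't help:

Code

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:CSV:', undef, undef,
    { f_dir => '.', f_ext => '.csv', RaiseError => 1 });

# Step 1: SELECT the key into a plain Perl variable.
my ($SerialNumber) = $dbh->selectrow_array(
    q{SELECT SerialNumber FROM updates WHERE SerialNumber = '6313'});

# Step 2: render one static statement string with the variable expanded...
my $sql = sprintf(
    q{UPDATE master SET Qty = %d WHERE SerialNumber = '%s'},
    22154, $SerialNumber);

# Step 3: ...and hand that fixed string to the engine.
$dbh->do($sql);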

The problem is internal to the WHERE, when it's pulling rows from the file, which I can't intercept and Data::Dump. That's when it pulls the 0006313 from the file and falsely matches it against the 6313 in $SerialNumber.

I could Data::Dumper $SerialNumber, but it is (and should be) 6313. I can't get to the row-by-row internal comparison value fetched from the file as part of the WHERE.

I'm pretty much to the point of saying . . . we're going to have to "regress" to some older CPAN modules . . . and hope we can find an older one that works. (Or . . . one that works right under AIX.)

If you have a large number of inserts/updates, then I would rearrange your SQL statements and drop the "do" statements. It is far more efficient to prepare the statement once using placeholders (prior to the loop) and only have the execute statement inside the loop, as sketched below.
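A hedged sketch of that prepare-once/execute-many pattern (file and column names hypothetical):

Code

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:CSV:', undef, undef,
    { f_dir => '.', f_ext => '.csv', RaiseError => 1 });

# Prepare once, before the loop; only execute() runs per row.
my $upd = $dbh->prepare(
    q{UPDATE master SET Stamp = ?, Qty = ? WHERE SerialNumber = ?});

open my $in, '<', 'updates.csv' or die "updates.csv: $!";
while (my $line = <$in>) {
    chomp $line;
    my ($serial, $stamp, $qty) = split /,/, $line;
    $upd->execute($stamp, $qty, $serial);
}
close $in;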

Before doing the head-to-head test, I would highly recommend profiling the script on the new system. The scaled-down code you posted has a number of inefficiencies which will only get worse when scaled up.
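One common route for that, assuming a profiler such as Devel::NYTProf can be installed on the box (which may be an issue without install rights; the script name is hypothetical):

Code

# Run under the profiler, then turn the resulting nytprof.out
# into a browsable HTML report:
#   perl -d:NYTProf daily_merge.pl
#   nytprofhtml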

OK, I still claim there is an underlying flaw: either the modules changed over time, or the AIX version was tweaked by IBM and is flawed. Where CSV "WHERE" clauses previously did string comparisons, now they do numeric comparisons, with undesirable results.

However, the string length of "0006313" does not match the string length of "6313", even if ASCII-to-integer conversion yields 6313 in both cases. So . . . your suggested addition of a LENGTH comparison of the key to the WHERE clause works.
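The exact clause isn't shown in the thread; one way the length guard might look, assuming the SQL::Statement engine behind DBD::CSV provides CHAR_LENGTH (table and column names hypothetical):

Code

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:CSV:', undef, undef,
    { f_dir => '.', f_ext => '.csv', RaiseError => 1 });

my $SerialNumber = '6313';

# Equality alone false-matches the zero-padded key; requiring equal
# string lengths screens out "0006313" (length 7 vs. length 4).
my $upd = $dbh->prepare(
    q{UPDATE master SET Qty = ?
      WHERE SerialNumber = ? AND CHAR_LENGTH(SerialNumber) = ?});
$upd->execute(22154, $SerialNumber, length $SerialNumber);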

8512 rows in the Master file, 1039 in the Update file; a head-to-head comparison of old (12-year-old Linux system, runs in just over an hour) vs. new (2-year-old AIX system, runs in 15 minutes) produces the same results.

So . . . even if the underlying problem exists (the comparison changed), the fix is solid.

As for efficiencies . . . As I'm sweeping through the input (new/changed values) file, I may encounter a new record that needs to be INSERTed into the Master, or (more often) an existing record that needs to be UPDATEd in the Master.

I (now) understand I can "prepare" in advance, and only "execute" against value changes, and so not incur the "prepare" interpretation cost each time.

However, I'll be interweaving INSERTs with UPDATEs, so I'll need two "active" prepares. There doesn't seem to be a way to do that, unless I open (connect) the file twice. If this were a real DB (Oracle/MySQL) performing the SQL execution in the background, I might've gone that route. But when it's a Perl program updating a flat CSV file . . . I feared two interior connects/opens wouldn't cooperate and update the flat file properly. I can test that.
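For what it's worth, core DBI does allow any number of prepared statement handles on a single connection, so the two prepares may not need a second connect at all. A hedged sketch (file, table, and column names hypothetical; not tested against DBD::CSV's file locking):

Code

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:CSV:', undef, undef,
    { f_dir => '.', f_ext => '.csv', RaiseError => 1 });

# Two statement handles, both prepared up front on the same $dbh.
my $ins = $dbh->prepare(
    q{INSERT INTO master (SerialNumber, Stamp, Qty, Flag)
      VALUES (?, ?, ?, ?)});
my $upd = $dbh->prepare(
    q{UPDATE master SET Stamp = ?, Qty = ? WHERE SerialNumber = ?});

# Know up front which serials exist, so each row picks INSERT or UPDATE.
my %in_master = map { $_->[0] => 1 }
    @{ $dbh->selectall_arrayref(q{SELECT SerialNumber FROM master}) };

open my $in, '<', 'updates.csv' or die "updates.csv: $!";
while (my $line = <$in>) {
    chomp $line;
    my ($serial, $stamp, $qty, $flag) = split /,/, $line;
    if ($in_master{$serial}) {
        $upd->execute($stamp, $qty, $serial);
    }
    else {
        $ins->execute($serial, $stamp, $qty, $flag);
        $in_master{$serial} = 1;   # remember the newly inserted row
    }
}
close $in;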

Either way, though it's a valuable process, it need only run once a day. As long as it can finish within 23h 59m . . . a 15-minute run time should be fine.

This would probably never be an issue for a real DB in the background (Oracle/Postgres/MySQL), but maybe someone else using CSV will see this and have their fix too.

Another approach, which would be much faster than what you're doing, would be to not do this via SQL statements at all. Instead, load the master file into a HoH (Hash of Hashes) and then loop over the update file and do a simple hash assignment, as sketched below. You would not need the test which decides whether it's an update or an insert, because the hash assignment autovivifies, creating the new hash entry if it didn't already exist.
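A hedged sketch of that hash approach (hypothetical file and field names, using Text::CSV_XS); note that hash keys are compared as strings, so "5151" and "0005151" can never collide:

Code

use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new({ binary => 1, eol => "\n" });

# Load the master file into a hash of hashes keyed on SerialNumber.
my %master;
open my $mfh, '<', 'master.csv' or die "master.csv: $!";
while (my $row = $csv->getline($mfh)) {
    my ($serial, $stamp, $qty, $flag) = @$row;
    $master{$serial} = { stamp => $stamp, qty => $qty, flag => $flag };
}
close $mfh;

# Sweep the update file; plain assignment either overwrites an
# existing entry or creates a new one -- no insert-vs-update test.
open my $ufh, '<', 'updates.csv' or die "updates.csv: $!";
while (my $row = $csv->getline($ufh)) {
    my ($serial, $stamp, $qty, $flag) = @$row;
    $master{$serial} = { stamp => $stamp, qty => $qty, flag => $flag };
}
close $ufh;

# (Writing %master back out to CSV is omitted here.)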

I had even considered eliminating the MasterFile altogether and going directly from the update files' rows to the desired output. But I need to output ONLY the lines that updated "today". I can't stop the plants from sending ALL entries, even the ones that didn't change today; they mark every entry they send today with the same datetime stamp, whether it changed or not, and they send bursts of updates throughout the day. So I need a "memory" of the old values . . . hence a MasterFile of some sort must remain, so I can pass on to the next process only those rows that "really" updated today.

This is probably the "final resting place" of a 15-year legacy project. We had much more data (more individual plants sending updates) and the overall project did many more things originally. At home, sure, I'd install MySQL or Progress or . . . whatever I needed. Here at work . . . I'm a "user" on those particular boxes, without root or install ability, so CSV files were something we could utilize without 1000 layers of permission.