So…. since I can’t write about mistakes other people make (unless they will give me their permission :)), I will write about a problem in production, I’ve created myself. The most remarkable thing is, that I’ve created it while I was in Greece.

The week before I left I’ve submitted a small change to to my signature search function. Since we started migration of the customers to the new system, we needed to make sure that this customer we found did not migrate. That meant I should append an additional condition to each generated search:

AND migration_flag IS NULL

Now let me tell you that my search is called “smart search” for a reason. Since we were striving for “google speed”, and since the search criteria could belong to different tables, I am generating different SQL statements based on what I suspect is a selection criteria. Which meant, that I had to add this

AND migration_flag IS NULL

in multiple places. And I did. And tested, and got the patch review. And went to Greece. And then on Wednesday, when I was (thankfully!) already done with my presentation, I’ve started to receive emails, that the search became slow. First I even could not remember, what I’ve done something with it – the change was soooo minor!

Now let me tell you, that the internet in the hotel at the conference was really bad, and I often could not connect at all. On top of it, our VPN was changing the same week. So I could barely send/receive emails, let along running some test, and I’ve started to question my teammates so that I could diagnose the problem “dry”.

Soon I’ve found out, that there was only one case when the problem manifested itself, and based on that I identify the problematic place in the code. And believe me or not, I found it!

The next challenge was to communicate with my coworkers and trying yo “explain in English”, what changes should be made. What to know, what the problem was? Here is it. The original condition was

c.id =(SELECT customer_id FROM loans WHERE id =’|| v_loan_id||’)

and when I added one more condition:

AND migration_flag IS NULL

… I put parenthesis in a wrong place!

That’s what I did:

c.id =(SELECT customer_id FROM loans WHERE id =’|| v_loan_id||’ AND migration_flag IS NULL)

And that’s how it should have been done:

c.id =(SELECT customer_id FROM loans WHERE id =’|| v_loan_id||’) AND migration_flag IS NULL

You know, why? Because the migration _flag belongs to the customers table, and the subselect is selecting from the loans table! And the optimizer didn’t know what to do! I could not believe I was so stupid!

Perhaps, the most exciting thing about this was, that after a while an optimizer “fixed itself”, and figured out the it should not be a cartesian product!