Improving Performance by Using a Cartesian Join

This example is based on a demonstration that I gave during a presentation last year. I did not go into great detail how the code worked, but I demonstrated that a carefully constructed Cartesian join is helpful and efficient for solutions to certain types of problems. Assume that you have a table named APPLICATION_LIST that lists all of the modules belonging to an application, another table named USER_LIST that lists each Oracle username that has access to the application, and a third table named USER_PROGRAM_PERMISSION that lists each username that is denied access to one of the application modules. The table construction may seem a little odd, but this is based on an actual example found in a commercial product. The goal is to produce a cross-tab style report that shows all users’ permissions to all of the application modules, and have that cross-tab report appear in Excel. The table definitions for our test tables look like this:

The first VBS script will not use a Cartesian Merge join – instead it will retrieve a list of all users and all application modules, and then probe the USER_PROGRAM_PERMISSION table once for each USER_ID. (IntentionalCartesian1-NoCartesian.VBS – save as IntentionalCartesian1-NoCartesian.VBS)

If you ignore the fact that the above script redefines the meaning of the strUsername variable, the script works. The problem with the script is that it repeatedly sends queries to the database, and probably should be optimized to remove the repeated queries (the number of repeated communication to the database server could have been much worse). Let’s take a look at version 2 of the script (IntentionalCartesian2-NoCartesian.VBS – save as IntentionalCartesian2-NoCartesian.VBS)

While the second version of the script is better than the first, we are still sending three SQL statements to the server. We can improve that with a Cartesian join. Let’s take a look at version 3 of the script (IntentionalCartesian3-Cartesian.VBS – save as IntentionalCartesian3-Cartesian.VBS)

Notice in the above that the client-side code is much smaller, and we have collapsed the three SQL statements into a single SQL statement with the help of a Cartesian join between the APPLICATION_LIST and USER_LIST tables. The end result looks like this:

Actions

Information

4 responses

I kept finding them in queries over the years. They were very slow to finish, produced way too many rows, and strange, inaccurate results.

What was odd was that while it was blantantly obvious to me, my coworkers didn’t see it as a cartesian product. They kept barking up the statistics
and index trees. But actually, those can’t speed them up at all.

Yes, there are valid uses for Cartesian products – they are rare. In this example I used one to efficiently find all possible combinations of two row sources.

Other ways Cartesian joins may appear, even if the developer has written the SQL statement correctly:
* The query optimizer using transitive closure to generate additional predicates, and in the process the optimizer automatically removed the join condition between those two tables. Example 1 from the Cost-Based Oracle Fundamentals book Example 2 from that author’s website.
* The optimizer incorrectly calculates the row source cardinality of at least one of the row sources as a very small number, such as 1, and determines that a Cartesian join between two possibly unrelated row sources is safe and efficient. Randolf Geist and I included an example of this behavior in chapter 9 of the Expert Oracle Practices book. In this case, all of the correct join conditions were provided in the query.

I am not sure that I would classify your #3 and #4 items as something that would necessarily cause a Cartesian join. I think that those examples could be considered a legitimate 1 to many type join. This, of course, is just my opinion, and I could be completely wrong.

One more use for Cartesian joins: stress testing the server admins or the database admins. :-)

[…] on your own) that leaving out where clauses is a bad thing. Well, leave it to Charles Hooper on his Oracle Notes to prove to us that sometimes, they are actually more efficient for solving certain types of […]

Hints for Posting Code Sections in Comments

********************
When the spacing of text in a comment section is important for readability (execution plans, PL/SQL blocks, SQL, SQL*Plus output, etc.) please use a <pre> tag before the code section and a </pre> tag after the code section:

<pre>

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;
TODAY
---------
01-MAR-12

</pre>
********************
When posting test case samples, it is much easier for people to reproduce the test case when the SQL*Plus line prefixes are not included - if possible, please remove those line prefixes. This:

SELECT
SYSDATE TODAY
FROM
DUAL;

Is easier to execute in a test case script than this:

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;

********************
Greater than and Less than signs in code sections are often interpretted as HTML formatting commands. Please replace these characters in the code sections with the HTML equivalents for these characters: