August 2, 2010

Joins

Here’s an important thought: all three join methods are nested loop joins with different startup costs.

Discuss.

And while I’m asking questions: what’s the perfect tense of the verb “troubleshoot”, as in :

I am trouble-shooting a problem

I was trouble-shooting a problem

… ?

Update 9th Aug: By a fairly convincing lead English grammar comes out as more interesting than Oracle technologies – so I’ll write up joins in my next note (or three). In the meantime, we have a new question to ask about trouble-shooting, inspired by Bix (note 11): is the verb transitive or intransitive, viz: do we simply trouble-shoot, or do we trouble-shoot an object ?

Linguistic constructs tend to go through three phases, of course: (1) nobody says that, (2) nobody with a decent education would say that, (3) everybody says it expect for a couple of old fogies. I reserve the right, therefore, to be opinionated and wrong about the use of the verb “trouble-shooting”.

Opinion 1: the verb is intranstive – you do not troubleshoot a problem, although you may do trouble-shooting on a system.

Opinion 2: there is no past participle (despite Nigel’s perfectly reasonable suggestion (note 1) of “trouble-shot”) for trouble-shooting. Consequently the only past tenses for trouble-shooting are:

past progressive – I was troubleshooting

present perfect progressive – I have been trouble-shooting

past perfect progressive – I had been trouble-shooting

Trouble-shooting is a “continuous” action, even when it takes (took) place in the past, so has no “simple past” tenses. (Alternatively, you can compare troubleshooting with those extreme irregular Latin verbs whose past participle looks nothing like their present participle, hence the simple past for “I am troubleshooting” is “I have (not) fixed it”.)

nested loop join: applies the “where condition” on the outer table. For each row of the result set, oracle searches the inner table by the join condition, using a FTS or index;

hash join: applies a hash algorithm on the columns of the join condition (of the smaller table). then does the same on the bigger table and execute a… nested loop comparing the hash keys. the initial cost is the hash algorithm;

merge join: you order both tables by the join columns… compare the top values from both tables… you take the higher one out… if they are equal, it’s a match… i guess it would be like a nested loop with a stop condition.

Any operation on a potentially infinite set must employ a loop (or recursion, but let’s not go there). A join is an operation on two potentially infinite sets where each element of one of the sets must be searched for in the other set, so you need to process a potentially infinite set for every member of another potentially infinite set. Hence, the corresponding loops must be nested. Q.E.D.
Disclaimer: I might be talking complete rubbish (wow, it’s been 20 years already since that course).

By definition – ‘JOIN’ means – to ‘look up’ a value (from set A ) in Set B.

the basic task is to ‘look up’ which is a task basically in all 3 types of JOINS – they way Oracle ‘organise’ and ‘arrange’ the tables before starting look-up is different – that is all about ‘start up’ cost

Opinion 1 is closest to my own, and therefore the most likely to be correct ;) “Troubleshooting a problem” = “Troubleshooting”, or “Problem-shooting” if you will. It’s the sort of vague thing you say you’re doing when you want someone to just go away reassured that you’re doing “something” about a problem. It could mean so many things that in the end it means almost nothing.

Actually there are at least four kinds of joins used by Oracle: Merge, Hash, Nested Loops and Cluster Joins.

The latter are used when you join two tables that are stored in the same data blocks (clustering). When you join them in a way that each of the rows being joined share the same data block, Oracle can easily join them together without much additional efford.

I haven’t tested this, but I guess in the execution plan the cluster join will show up as a nested loop join with a cluster access path on all but one of the tables accessed.

Stefan,
Thanks for your suggestions. I was just a little careful to use the term “join mechanism” in my original statement – the cluster join isn’t an extra join mechanism: in fact, as you suggest in your final sentence, the commonest mechanism used in a cluster join is the nested loop. (Merge joins – cartesian, inevitably – can also appear).

The two tables are in the same index cluster the ID column is the cluster key and the join is on the cluster key. Note how we have a special access method (table access cluster) but, as you assumed, a standard join mechanism (nested loop).

[…] This is part one of my thesis that “all joins are nested loop joins – it’s just the startup overheads that vary”; there will be a note on “Joins – HJ” and “Joins – MJ” to follow. (For a quick reference list of URLs to all three articles in turn, see: Joins.) […]