And certainly result in the same query plans on every DBMS I've ever tried.

But every so often, I read or hear an opinion that one is definitely better
than the other. Naturally, these claims are never substantiated with an
explanation.

Where I work, the second version seems to be favored by the majority of other
devs, and so I also tend toward that style to minimize surprise. But In my
heart, I'm really thinking the first one (since that's how I originally learned
it).

Is one of these forms objectively better than the other? If not, what would be the reasons to use one over the other?

Why not profile it and let the rest of us know the outcome? Generally speaking, performance very much outweighs style preference.
–
Demian BrechtMay 22 '11 at 1:22

1

"result in the same query plans on every DBMS I've ever tried" If this could have an answer in terms of performance, it'd have asked it on stackoverflow.com. alas, they are the same query.
–
SingleNegationEliminationMay 22 '11 at 1:24

8 Answers
8

I find that the second form is better. That may be because that is how I learned it, I'll admit, but I do have one concrete reason - separation of concerns. Putting the fields you are using to join the tables in the where clause can lead to difficulties in understand queries.

In this case, anything having to do with the tables or how they relate is all isolated to the from clause, while the actual business logic for query restriction is in the where clause. I think that is just much more understandable, particularly for larger queries.

The join syntax replaced the old comma syntax in 1992. There is currently no reason to ever write code with the comma syntax. You gain nothing and you are subject to some problems you simply don't have with explicit syntax.

In the first place as you get more complicated queries is very easy to do an accidental cross join by missing a where condition. This is something the explicit join syntax can prevent from happening as you will get a syntax error.

If you intend a cross join, the explicit join syntax will make that clear while in the implicit syntax someone doing maintenance may assume you forgot to add the where clause.

Then there is the problem of left and right joins which are problematic in at least some dbs using the implicit syntax. They are deprecated in SQL Server and in fact do not return correct results realiably even in the older versions. No query that needs an outer join should contain the implicit syntax in SQL Server.

Further, I have seen questions here and on other sites where wrong results happened when people mix the implicit and explicit joins (when adding a left join for instance), so it is a poor idea to mix them.

Finally many people who use implicit joins don't actually understand joins. This is a critical understanding you must have to effectively query a database.

Ha. I just happened to find a possible answer to my own question, while looking at the documentation for PostgreSQL. To summarise what this page explains, the resulting query is still the same, but the number of plans the optimizer must consider grows exponentially with the number of joins.

After about six such joins, the number is so great that the time to plan the query may be noticeable, and after around ten, the optimizer will switch from an exhaustive search of plans to a probabilistic search, and may not arrive on the optimal plan.

By setting a run-time parameter, you can instruct the planner to treat explicitly mentioned inner and cross joins differently from implicit joins, forcing them to the top of the plan, and not exploring other options.

Of note, the default behaviour is the same in either case, and that getting alternative plans requires knowledge of the dbms' internals and the peculiarities of the tables in question to get a different result

You've slightly misunderstood those docs, however. Firstly, there actually are three thresholds. One fires the GEQO as you pointed out; the other two (from and join collapse limits) end up making the planer stick to picking applicable indexes rather than re-organizing the join order. Secondly and just as importantly, the queries are rewritten as they are parsed. This results in the first of the example queries getting parsed into the exact same query tree as that of the second one -- the thresholds then let PG know if it should try to re-order the joins or not.
–
Denis de BernardyMay 22 '11 at 16:07

When you use a comma to separate two (or more) table names what you are intending is the cartesian product. Every row of the 'left' table will be 'matched' (concatenated) with that of the right table.

Now if you write something in the where clause, it's like putting a condition on this 'concatenation' telling which rows to 'concatenate' with which rows.

This is actually "joining" the rows :) and hence the join keyword that helps provide a more readable syntax and is more understandable that you 'indeed' want to join on some common values. Similar to what @Dustin has clarified above.

Now, every DBMS is smart i.e., it doesn't calculate the cartesian product first and then filter out the data (extremely wasteful) but rather does so based on the query structure. The only thing that I can think of is, when you ask it to 'join' it's like making the joining activity explicit and probably helps run the code faster (by how much? You'll have to profile it and see) but in the comma separated case, it needs some time to 'figure' out the optimum strategy. I may be wrong, but I'm just making an educated guess as to how one would code it...

I have only ever once seen the two result in a different set of optimizations and if memory serves it was in ms-sql2k on a really hairy query. In that one example the old form used with *= resulted in about 4x faster performance. Nobody, including our Microsoft tech guys could ever explain why. The MS guys labeled it a mistake. I have never seen it again.

Since most RDBMS are smart enough not to do the full cartesians, the biggest reason I can think of not to use it (besides that it is depreciated) is that most of the people under 30-35 that I have worked with have never seen the old form before and get terribly lost when they encounter it.

One reason for the more terse syntax is that it's more terse, so if you're comfortable with it it's easier to read. I think of the verbose case as similar to writing out arithmetic in COBOL, e.g. MULTIPLY A BY B GIVING C.