Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

As far as I could find out many DBMSs (e.g. mysql, postgres, mssql) use fk and pk combinations only to constrain changes to data, but they are rarely natively used to automatically select columns to join (like natural join does with names). Why is that? If you've already defined a relationship between 2 tables with a pk/fk, why can't the database figure out that if I join those tables I want to join them on the pk/fk columns?

EDIT: to clarify this a bit:

suppose I have a table1 and a table2. table1 one has a foreign key on column a, which references to the primary key on table2, the column b. Now if I join these tables, I'll have to do something like this:

SELECT * FROM table1
JOIN table2 ON table1.a = table2.b

However, I already defined using my keys that table1.a references to table2.b, so it seems to me that it shouldn't be to hard to make a DBMS system automatically use table1.a and table2.b as the join columns, such that one can simply use:

11 Answers
11

In many cases, there are more than one way to join two tables; See the other answers for lots of examples. Of course, one could say that it would be an error to use the 'automatic join' in those cases. Then only a handfull of simple cases where it can be used would be left.

However, there is a severe drawback! Queries that are correct today, might become an error tomorrow just by adding a second FK to the same table!

Let me say that again: by adding columns, queries that do not use those columns could turn from 'correct' into 'error'!

That is such a maintenance nightmare, that any sane style guide would prohibit to use this feature. Most already prohibit select * for the same reason!

All this would be acceptable, if performance would be enhanced. However, that's not the case.

Summarizing, this feature could be used in only a limited set of simple cases, does not increase performance, and most style guides would prohibit its usage anyway.

Therefor it is not supprising that most database vendors choose to spend their time on more important things.

You may want to join based on the pickup state. Maybe you want to join on the delivery state. Maybe you want to perform 2 joins for both! The sql engine has no way of knowing what you want.

You'll often cross join scalar values. Although scalars are usually the result of intermediate calculations, sometimes you'll have a special purpose table with exactly 1 record. If the engine tried to detect a foriegn key for the join.... it wouldn't make sense because cross joins never match up a column.

In some special cases you'll join on columns where neither is unique. Therefore the presence of a PK/FK on those columns is impossible.

You may think points 2 and 3 above are not relevant since your questions is about when there IS a single PK/FK relationship between tables. However the presence of single PK/FK between the tables does not mean you can't have other fields to join on in addition to the PK/FK. The sql engine would not know which fields you want to join on.

Lets say you have a table "USA_States", and 5 other tables with a FK to the states. The "five" tables also have a few foreign keys to each other. Should the sql engine automatically join the "five" tables with "USA_States"? Or should it join the "five" to each other? Both? You could set up the relationships so that the sql engine enters an infinite loop trying to join stuff together. In this situation it's impossible fore the sql engine to guess what you want.

In summary: PK/FK has nothing to do with table joins. They are separate unrelated things. It's just an accident of nature that you often join on the PK/FK columns.

Would you want the sql engine to guess if it's a full, left, right, or inner join? I don't think so. Although that would arguably be a lesser sin than guessing the columns to join on.

I consider foreign keys and normalization to be very relevant to table joins.
–
Michael DurrantMar 16 '12 at 6:07

1

Your arguments hold when the normal JOIN keyword always tries to match that (as I did wrong in my example, I'll fix that). However, many joins can be derived directly from only the joins so I don't see any reason why there can't be any explicit syntax for joining them. Many DBMSs do have a natural join, which does basically the same thing but with column names (=bad). The same thing could be done with this type of join, e.g. by specifying an AUTO JOIN operation.
–
TiddoMar 16 '12 at 7:56

3

"It's just an accident of nature that you often join on the PK/FK columns" -- I'm not convinced!
–
onedaywhenMar 16 '12 at 12:24

@Michael Durrant. Normalization? Consider a lookup table with 1 column, using a natural primary key. All the foreign keys would be for integrity checking only. No joins involved. True, we will usually join tables on PK/FK fields. But just because that particular path is most frequent does not make it the "purpose" of the PK/FK.
–
Lord TydusMar 16 '12 at 13:49

2

I would upvote if there wasn't that "PK/FK has nothing to do with table joins."
–
ypercubeMar 16 '12 at 14:11

Foreign Keys and Foreign Key constraints came later and are essentially an optimization for "transaction" style applications.

Relational databases were originally conceived as a method of applying complex queries on sets of data in a way that was mathematically provable using relational algebra. I.E. for a given set of data and a given query there is always a single correct answer.

Relational databases have come a long way since then, and there primary use as the persistence layer for transactional systems was not what CODD et. all envisaged.

However the ANSI standards body for all its conflicting goals and vendor politics has always striven to preserve "mathematically provable" properties of SQL.

If you allowed the database to infer the join properties from "hidden" foreign key data you would lose this property (consider the ambiguity if there was more than one set of foreign keys defined).

Also a programmer reading the SQL would not necessarily know what foreign keys were currently defined for the two tables, and, would need to examine the database schema to work out what the query was doing.

Thanks, this made sense to me! However, don't natural joins have the same problems? Although natural joins even have bigger problems many DBMS do support them. IMO a join based on pk/fk would be a natural join done right.
–
TiddoMar 16 '12 at 8:00

There is no difference as far as most database engines are concerned between a natural join and an explicit " JOIN ... ON ". The engine analyses the query and does the join as best it can based on the various predicates. Using an explicit join does not force the usage of a particular index or access path, its there mostly to support the "LEFT,OUTER,INNER" join syntax which needs to know the explicit join predicates to know when to insert a "missing" row.
–
James AndersonMar 16 '12 at 9:40

1

@Tiddo, no one who is competent would ever consider using a Natural join, that is why some databases don't allow them. WHy are you so hung up on avoiding typing a few characters that you want to introduce a new syntax that would probably be a bad as a natural join?
–
HLGEMMar 16 '12 at 13:36

SQL did not come first! The relational model (which included the concept of foreign keys of course) was first outlined by E.F.Codd in 1969. SEQUEL, as it was then, didn't see the light of day until around 1974. Its inventors made it clear from the outset that SEQUEL/SQL was intended to be based on the pre-existing relational model - although SQL did fall short of being a truly relational language.
–
sqlvogel2 days ago

@sqlvogel -- true! Should have phrased it "SQL was implemented first".
–
James Andersonyesterday

While you have defined a Foreign Key relationship that does not mean that is how you want to join the tables in all queries. It is the most probable method for joining the tables, but there are cases where is it not correct.

You may want to use a Cartesian product of the two tables or part thereof for some purpose.

There may be other fields on which you can join for another purpose.

If you are joining three or more tables, one of the tables may be related to two or more of the tables. In this case usually only one of the possible FK relationships may be appropriate in the query.

You may be operating on a false assumption. You say 'as far as you can find out' but don't give any empirical or evidentiary proof. If the pk or fk are the best index for a query it will be used. I don't know why you are seeing this, but my guess is poorly formed queries.

Edit now that the question has been totally rewritten : The case you are describing would only be for a very small set of queries. What if there are 12 tables Joined? What if there are no FKs.... Even if there was a default join on I would still always specify the join just for readability. (I don't want to have to look at the data and then try to figure out what is being joined on)

Some Query tools actually do an auto join for you then allow you remove or edit the join. I think MS Access's Query Builder does this.

Lastly the ANSII standard states that the join must be specified. That is reason enough to not allow it.

Sorry, perhaps I wasn't clear enough. I'm not talking about indexes, I'm talking about joins. Suppose I have table1 and table2, with a fk on table1.a which points to table2.b. If I join these tables I'll have to explicitly say that I want to join them on columns a and b (e.g. 'SELECT * FROM table1 JOIN table2 ON table1.a = table2.b'), while I already defined in my database scheme that those two are related. The question is why I can't do 'SELECT * FROM table1 JOIN table2' and let the DBMS automatically choose the join columns based on the fk/pk.
–
TiddoMar 16 '12 at 0:36

2

Especially the readability made sense to me! However, that the standard says so is not a really good argument IMO. Many standards have made wrong choices before (HTML for example).
–
TiddoMar 16 '12 at 8:03

the concept of "joinability." Relations r1 and r2 are joinable if
and only if attributes with the same name are of the same type... this
concept applies not only to join as such but to various other
operations [such as union] as well.

Standard SQL already has such a feature, known as NATURAL JOIN, and has been implemented in mySQL.

Although your suggestion is not quite as worthy, it seems a reasonable one. With SQL Server (which lacks support for NATURAL JOIN), I use SQL Prompt in Management Studio: when writing an INNER JOIN its InteliSense suggests ON clauses based on both common attribute names and foreign keys and I find it very useful. I've no great desire to see a new (standard) SQL join type for this, though.

There are many reasons way the database can’t safely do this, including the fact that add/removing Foreign Keys will change the meaning of pre-written queries including queries in the source code of the application. Most database also don’t have a good set of Foreign Keys that cover all possible joins you are likely to want do. Also for better or for worth, Foreign Keys often get removed to speed up systems and can't be used on tables that are loaded in the "wrong" order from file.

However there is no reason why a query design tool or the text editor can’t auto complete a join with the help of Foreign Keys in the same way as they give you intellisense on column name. You can the edit the query if the tool got it wrong and save a completely defined query. Such a tool could also usefully make use of the convention of naming Foreign Keys columns by the “parent” table name and columns with the same name in both the parent/child table etc.

(My wife still can’t understand the difference between Management Studio and Sql Server and talks about starting sql server when she starts up management studio!)

"There are many reasons way the database can’t safely do this" -- did you meant to say "DBMS" rather than "database"?
–
onedaywhenMar 16 '12 at 12:38

@onedaywhen, no becouse a reporing tool for example that ships as part of a DBMS could do this for "ad-hock" reports.
–
Ian RingroseMar 16 '12 at 12:47

2

I can see how such a tool could do the job. I can also see how such a tool could be part of the DBMS. What I cannot understand is how a database could do this. Remember that a database is simply a variable, albeit of a complex type.
–
onedaywhenMar 16 '12 at 12:58

If omiting the ON clause is assumed to follow the fields based on the referential integrity, how would you do a Cartesian product?

Edit: using AUTO
The advantages to this is a little less typing and you don't have to know how they are joined or remember a complicated join. If the relationhip changes, it is handled automatically, but that rarely happens except in early development.

What you have to do now is decide if all of your AUTO joins hold up during a relationahip change to match the intent of your select statement.

I updated my example, since it would of course be much better to use an explicit syntax for this.
–
TiddoMar 16 '12 at 8:05

@JeffO: the main advantage is that it expresses the intent more precisely, in a very clear declarative manner. Joins on column names tell you nothing, other than the fact that some of the contents of the columns are similar to those in another (but might not be the same type). A join on a fk ref, tells you that there is a fk ref, no column list would mean there was only 1 fk between the tables, or conversely that there is 1+ (consider a multicolumn key with more than 1 ref what happens when you mix the columns c1=fk1_c1 and c2=fk2_c2).Even with more typing on average, this would be good.
–
jmorenoMar 17 '12 at 5:11

The reason is that there is the LANGUAGE, and then there is the underlying principals. The language is sparse and lacking in many features that you would expect to see in a general purpose language. This simply happens to be a nice feature that hasn't been added to the language and probably won't be. It's not a dead language so there's some hope, but I wouldn't be optimistic.

As others have pointed out, some implementations use an extension where join (column) joins two tables based on a common column name, which is somewhat similar. But it isn't widely speead, and doesn't rely upon the fact that there is a relationship between the tables.

There's no fundamental obstacle to " inner join table on PKFK" where PKFK is a keyword meaning "the foreign key relationship defined between the two tables", there might be issues with multiple fk to the same table, but that could simply cause an error. The question is do the people designing the language consider that a) a good idea and b) better to work on than some other language change...

This presumes it is a good idea that they should already have done. It is also likely that they have already considered this and decided not to do it. Perhaps it's a very bad idea in practice: Sjoerd mentioned an example, where a query could break just from adding a new column and FK relationship. Lord Tydus also explains foreign keys having a different responsibility from dictating the ways your tables should be joined.
–
Jonathan HobbsMar 16 '12 at 5:10

1

@JonathanHobbs:I meant my answer to be a generally neutral.But abandoning neutrality.Sjoerd's logic is flawed.Changes to tables already break queries, adding a new column to a tables primary key is going to either break queries or start returning incorrect results. This would in fact insulate you from that to an extent, as long as the table relationship was maintained, column changes could be done safely.This would probably increase usage of FK relationships, as it would be useful for something other than RI.Most joins are either on the PK or include the Pk.To handle multi fk,use column name.
–
jmorenoMar 16 '12 at 5:46

1

"some implementations use an extension where join (column) joins two tables based on a common column name" -- this is known in the literature as a natural join and has been in the SQL standard since 1992, so I don't think it is correct to call this an 'extension'.
–
onedaywhenMar 16 '12 at 12:36

@onedaywhen: I was referring to the syntax "natural join table_name (columnlist)", which I don't belive is ANSI 92, as an extension which allows you to join on more than one columns provided that both tables have them.
–
jmorenoAug 11 '12 at 1:03

why can't the database figure out that if I join those tables I want to join them on the pk/fk columns?

Parts of the reason are:

1 - in theory you can join tables on arbitrary columns from the two tables. While this is not common practice it is valid. Remember that SQL is like a programming language it does not understand what information is inside the columns of course and names, to SQL, don't mean much in this regards.

2 - There are different types of joins (left, righ, inner) - Inner Joins is only 1 of them.

3 - SQL standard may be guided by the principle of being a lower-level language that allows higher level dialects to form intelligence using it. The comparison is somewhat clearer if you think of a 4th generation language vs. a 3rd generation language. In fact, one tool I have used, IEF, allowed you to write something like this:

Tiddo, I believe you are completely right, SQL on that topic is quite dumb, and I remember having thought the same thing you did about foreign keys while learning SQL some ten years ago.

Ok, given that, I eventually had to pass that exam; and to pass it, I had to let go. SQL is more of a trainwreck than anyone is likely to admit, its standardization path is a complete disaster, and some implementations are menacingly turing complete. Still it's quite handy, in general. (I'm not a K/V luddite)

Foreign keys, then... not that handy at all. They are an important concept in the relational model, ok, but the SQL feature with the same name doesn't compare well.

Tell you straight: do not use that SQL feature called Foreign Key at all, until you hit some big system with performance problems. Explicitly telling the engine which field is a foreign key and which is not is only used for indexing, and it's invisible to the db user.

Is it misleading?
Yes.

Are they going to make it more powerful now, after 30 years of people being mislead?
Not a chance.

And why the heck all of this happened in the first place?
Well, the feature we call foreign keys was added later on, to SQL; SQL is a standard that evolved over time, bottom-up. Vendors implemented ludicrous features, while standard bodies facepalmed.

Foreign keys as said, where only meant for indexing and there was no JOIN construct available. (joins where done with SELECT queries, JOIN queries are quite recent and only meant to alias SELECT functionality) They probably though that calling that indexing flag FOREIGN KEY, was a clever naming-hack over relational db theory concepts.

With regard to foreign keys, I take it you've only ever touched the MyISAM engine on MySQL? 'Cause even disregarding that little rant, every single thing in this answer is wrong.
–
IzkataMar 16 '12 at 3:13

4

"JOIN queries are quite recent" you call 20 years recent in programming terms? I am so laughing at your utter ignorance and lack of understanding of databases.
–
HLGEMMar 16 '12 at 18:13