SQL Join Table vs. SQL Entity Table And When To Switch

I have to say that I have never been too educated when it comes to Database architecture. I have picked up tips here and there and found patterns that I really like, but overall, no one has ever sat me down and taught me the tried and true methods of database development. Occassionally, this comes to light. About a year ago, I was on the phone with Frank Yang, CTO/CIO of RecycleBank, discussing changes that we needed to make to their systems database. I kept referring to one of the new tables we were designing as a "join table". Frank found this very confusing because it contained more than foreign keys. As he explained, industry standards dictate that Join tables only contains the foreign keys of the tables being joined. If you have anything more than that, you no longer have a join table, you have an entity table.

It was very hard for me to accept this fact, and we argued over this for some time before coming to a conclusion (of which I cannot remember). Over time, however, I have slowly begun to see Frank's point. Once you do have more than just foreign keys in a table, you really are modelling a totally different kind of relationship between two entities. And, that relationship, often times, should become an entity in and of itself.

Let's walk through a thought experiment so I can think out loud and try to explain myself. Let's pretend we the following two tables:

boy-------------------id - int (pkey)name - varcharage - tinyint

girl-------------------id - int (pkey)name - varcharage - tinyint

Pretty simple. Sure, this could be one table with a gender column, but for our demonstration, we need to tables.

Now that we have these two entity tables, let's join together the boys and girls. My initial reaction would be to create a join table with the keys of the two joined entities:

boy_girl_jn-------------------boy_id - intgirl_id - int

This is a "standard" join table. All it does is join two entity tables in a many to many relationship. Notice also that my naming convention dictates that all join table names consist of the name of the two entity tables in alphabetical order followed by, "_jn", which gives us "boy_girl_jn".

So far so good. But these foreign keys don't really tell us much about this relationship at all. It merely tells us that N boys are related to N girls in some way. Let's go ahead now and say that this relationship can be either romantic or platonic. We can define this using a boolean, is_romantic column:

Now our join table has one additional, tiny piece of information. No problem - this doesn't really affect our join too much. But now, let's come to accept the sad realization that relationships don't always last; friends become enemies, enemies become friends, dogs and cats living together - it's the circle of life. In a join table, if a relationship ends, we delete the join records; however, these are human relationships and we don't simply want to delete them if they end - we want to record their start and finish for historical purposes. How do we deal with that? Easy, let's add a start date and an end date:

With these two new columns, date_started and date_ended, we start to hit a problem - the existence of a "join" record no longer indicates the current relationship between the two entities involved; there might very well be a join record for a boy and a girl that have no relationship currently. While this is ok from a data standpoint - we have all the necessary fields - you can see that intent of a table has shifted dramatically. We are no longer just creating a many to many relationship between two entities, we are staring to model a third entity all together - the human relationship.

It's time that we convert out join table to an entity table. Let's take boy_girl_jn and create "relationship":

I don't know about you, maybe this is just me, but now that we have an entity table rather than a join table, my mental view of it has changed completely. Something about the fact that it is an entity table totally frees me up to think about it in a much more rich and robust way. Now, I can see other attributes that I might want to add, such as which party started and ended the relationship:

You can just go on from there, adding a description, a rating, etc.; anything else you can think of to define a human relationship.

And again, there is nothing that says we could not have technically just put all of these columns in our original boy_girl_jn table; we're not really talking about technicalities here - we're talking philosophy. As we added more than just the foreign keys to our join table, the intent and the meaning of our table took a dramatic turn. We really were starting to model another entity rather than a table relationship.

And what about recurring relationships? With a human relationship, just because it ends, it doesn't mean that it won't start again - I am sure we have all been there at some point. I was in a relationship once that had two "romantic" phases separated by a "friend" phase. This could be recorded as three separate relationships with different start and end dates. When you get something like this happening, you can really see that the concept of a "join" table has been totally eradicated.

Thanks for letting me think out loud. I hope that you got something out of this.

Not sure about that. Couldn't people potentially have multiple types of "relationship" with people? Consider the way Facebook/LinkedIn/etc work - you can be friends with someone, a coworker, gone to school together, etc.

Has there ever been an attempt to model life in a db? A real model would be infinitely big, but maybe a simplified model? Not sure what you would use it for but would be interesting to see how it worked.

So now I can have two boys or two girls in a relationship. I can also have threesomes and so on. Well obviously, *I* can't have threesomes because I'm a geek who would bother to model this in SQL. Also, a person can be in multiple relationships.

Not sure about that. Couldn't people potentially have multiple types of "relationship" with people? Consider the way Facebook/LinkedIn/etc work - you can be friends with someone, a coworker, gone to school together, etc.

I disagree with you on this (and agree with your initial train of thought). There is no fundamental difference between a relationship table and entity table, in much the same way there is no difference between a circle and an ellipse - one is simply a specific instance of the other. In the geometric example, a circle is an ellipse where the foci just happen to be collocated; in the database example, a relationship table is an entity table that just happens to have no additional columns.

Every relationship (be they one-to-one, one-to-many or many-to-many) can be garnished with additional details if one so chooses. In the case of one-to-one and one-to-many relationships, we might create a table between the existing two and add the details there. The only difference with a many-to-many is that the table already exists.

Hi Ben.. Love your passion for CF, but when designing databases all tables are entities. With relational modeling, all you have are entities with relationships depicted with cardinality. With dimensional modeling you have fact tables, dimensions and factless fact tables (just keys) or intermediate table to explain the Many to many relationship. I think the terminology really should be intermediate table when dealing with m:m relationships (foreign keys only). Thanks, Tim