The SitePoint Forums have moved.

You can now find them here.
This forum is now closed to new posts, but you can browse existing content.
You can find out more information about the move and how to open a new account (if necessary) here.
If you get stuck you can get support by emailing forums@sitepoint.com

If this is your first visit, be sure to
check out the FAQ by clicking the
link above. You may have to register
before you can post: click the register link above to proceed. To start viewing messages,
select the forum that you want to visit from the selection below.

Modeling and Designing a database

I'm trying to learn more about modeling and designing databases and I figured the best way would be to actually write some local apps for academic purposes only. The first app I'm going to write is going to be very simple, it's just going to store specific information for each specific user. The data I'm going to be storing is:

firstname
lastname
state - for a specific search, like find all users in Californiaemail - for loginpass - for login
-------------link1
link2
........ - the main "data" for each user, 10 hyperlinks to wherever.link9
link10
-------------

My first question is, for something this simple, would having just one table be OK? Assume there could be lots of users and the DB might change down the road. Maybe the fictional boss wants users to have up to 25 links so an additional 15 links will need to be added to the table. Also, users might not actually provide 25 links (that's just their limit) so there could be a lot of null values being stored.

I've also broken this down into 3 possible tables (with this being the primary key) so this is my attempt at listing an entity relationship diagram here:

Users

user_id

firstname

lastname

email

pass

Links

user_id

link1

link2

...

link9

link10

State

user_id

state

The cardinality I've come up with, which might be a bit hard to decipher is:

State 1-----(contains)-----M Users 1-----(has)-----M Links

1 State can have many Users, many Users can live in 1 State (Users -> State is M to 1 relationship).
1 User has many Links, many Links can belong to 1 user (Users -> Links is 1 to M relationship).

Does this look OK? Is this the better way to do this, even though it adds more complexity than the first example of just one table?

You need two tables, one for the user (and state is an attribute of the user, so it does NOT belong in a separate table but in the user table - even your choice of primary key for your state tables shows that) and one for the links - one user has many links. You don't need a separate autoincrementing id as suggested above, you just make the userID and link the joint primary key. ie links {userId, link} so just two columns. This now allows you to add as many links for each user as you wish, and also makes searching for people who have some links in common, as you now only have to search the two one column in the links table. Using multiple columns as in your first "design" would require you to query every single column that contained a link and compare it to every other column, for every other person - which would be totally the wrong way to solve such a problem. Normalising the data enables you to do many queries more easily, including queries you didn't think of. Imagine having to alter the table design every time you realise you need a few more link options for the user. The two table approach enables you to store one or one hundred or one hundred thousand links for one person.

It would be a good idea to start reading up on normalisation if you wish to make better databases.

PS If you decided to store more info about each state - capital, population, area, say, THEN you'd need a separate state table AND a user-state table as well

state---<user-state>---user
with user-state just containing the primary keys for user and state, as a joint primary key for that table.

State---User is a many to many, resolved as two one to many relationships.

You need two tables, one for the user (and state is an attribute of the user, so it does NOT belong in a separate table but in the user table - even your choice of primary key for your state tables shows that) and one for the links - one user has many links. You don't need a separate autoincrementing id as suggested above, you just make the userID and link the joint primary key. ie links {userId, link} so just two columns. This now allows you to add as many links for each user as you wish, and also makes searching for people who have some links in common, as you now only have to search the two one column in the links table. Using multiple columns as in your first "design" would require you to query every single column that contained a link and compare it to every other column, for every other person - which would be totally the wrong way to solve such a problem. Normalising the data enables you to do many queries more easily, including queries you didn't think of. Imagine having to alter the table design every time you realise you need a few more link options for the user. The two table approach enables you to store one or one hundred or one hundred thousand links for one person.

It would be a good idea to start reading up on normalisation if you wish to make better databases.

So my schema would look like this?

Users

user_id

firstname

lastname

state

email

pass

Links

user_id

link

Where a select on the Links table (with 3 users, each with 3 links each) would look something like:

Yeah, though what I'm doing isn't really realistic to begin with, it's more academic than anything else. URLs might not have been the best "data" choice, it's just what I thought of....I was just going to dump in root-level domains when I populate, http://www.yahoo.com, http://www.google.com, etc.

where user_id is the id from the user table i.e. a foreign key. You would NOT auto-increment it, as it would then have no relationship to the user_id in the user table. You must set up a joint primary key using the two fields in the table, as shown above.
ie the schema would be

You would NOT auto-increment it, as it would then have no relationship to the user_id in the user table.

Ah yes, of course, that makes perfect sense. The old copy/paste didn't quite work out for me. It's important that all the other attributes are the same, though, yes? A foreign key's attributes should be the same as it's corresponding primary_key in another table?