I need some advice from someone that's experienced in larger mysql project. I'm at the point where it seems logical to use multiple databases instead of one database with many tables. But I'm not sure how i should proceed. (the project today is national, but has potential to go international with even more generated traffic)

To give you some background info;
History: I've created a system to administrate pool-play for a certain sport. In the database there is info about all official clubs for the sport and all registered players. Not all players and not all clubs are registered to play in pools - but in other forms of the sport.

Future: I'm about to design a club/player administration, and a system to administrate cup-play.

So to me this looks like three dbs; a club/player db, a pool-play db and a cup-play db. Both the pool- and cup-db will use information from the club/player db.

QUESTION 1: How do I comunicate between these db's? I can't do cross-joins over different db's right?

QUESTION 2: My thought so far is to set up "read-only" tables for the pool- and cup-dbs with club and player info. As soon as some information changes in the club/player db the "read-only" tables in the other dbs will be updated. Is this the right way to go?

QUESTION 3: If the answer to Q2 is yes; Is it possible to do this with triggers in mysql or do I have to do it with php (which is my weapon of choice...)

Thanks... :)

guelphdad

02-07-2007, 07:38 PM

you would use a single database. All your data is related so it belongs in a single database with multiple tables.

If you have bus schedules and hamburgers sold then those are two separate items that would deserve their own database, unless of course both were out of the same bus terminal right?

Fumigator

02-07-2007, 07:44 PM

Without seeing your entire design, my answers may be off the mark. But with the info you've given, here are my suggestions.

You have no compelling reason to use multiple databases. In fact, you have very compelling reasons to only use one database-- the tables are all related!

The scenarios where I use different databases are when I want seperate environments. For example, I want a "test" environment, and a "production" environment. I have two different databases to accomplish this. Or, when I have 10 different clients all using a system, but I want each client's data to be isolated from other client's data. Again, different databases.

Have I convinced you yet? :)

As for question #2... a properly designed relational database will not have duplicate data stored in different areas. That sounds a little bit like what you are looking at doing. A general rule is if by updating something in one table always leads to having to update something in some other table, your design needs fixin'.

Again, not knowing the details of your design: For example, you would only store the club name in one table, the club table. You would also have a club ID that doesn't have any special meaning other than it is unique to any other rows in that table. Your pool-play table would then store the club ID, not the club name.

On question 3, triggers can be used to automatically fire off a query when a certain condition is achieved, but they should be used wisely and sparingly (in my opinion).

bitbob

02-08-2007, 08:12 AM

you would use a single database. All your data is related so it belongs in a single database with multiple tables.

You have no compelling reason to use multiple databases. In fact, you have very compelling reasons to only use one database-- the tables are all related!

My first reasons for wanting to use multiple dbs is that the club/player administration, the pool-play and the cup-play belongs to different websites (or different parts of the same website (domain)).
Second; the pool-play db as I use now has 31 tables. Add the club/player and cup-play tables and we´re up to > 100 tables. It's hard to keep track of the tables as it is now. I'm fairly good at db-design (when it comes to one db at least) so it's not a design problem - I really nees all tables.
Third; web-site traffic... I havn't yet mentioned player ranking. This is something that will be done with my system in the future, but right now is handled by an older system. When the ranking is released (happens 3 or 4 times a year) it first takes almost a day to generate the ranking then we're usually get between 20000 and 50000 people trying to access the site at the same time. The system, as it exists today, gets overloaded... :(
I don't want to run in to this problem with the new system. I want the users to be able to use the other parts of the system without lagging.
If the system gets internatinal then we will get a huge amount of more traffic too...

I don't feel comfortable with just using one db. I hope you understand my concern. If you still think I should use one db - please convince me that it is future proof... How much traffic can the db handle?

thanks for your replys so far... :)

neomaximus2k

02-08-2007, 01:59 PM

If you used three databases they would still be on the same server thus not releiving the problem of the bandwidth. It is faster to switch within the one database than it is to communicate to 3 different ones.

This sounds to me like you are trying to create an online fantasy football game which I have developed in the past and trust me you will only need one database it will handle the load its all down to the spec of the server really, if it is a shared host then that brings in a hole new problem for optimising.

Just make sure you optimise the database, setup index's correctly and make sure the queries are as optimised as possible and you will have something that will last for years!

bitbob

02-08-2007, 02:33 PM

ok guys. i think you've convinced me about using only on db, that it will be better from a technical pov. Now, my only problem is that it's difficult to keep track of all tables... It would be easier if you could group tables or something. Lika a tree->folder->file view... Are there any software that lets you do this? Not in mysql that is, but in the gui...

thanks alot all of you :)

guelphdad

02-08-2007, 03:02 PM

If you are using 30 and perhaps up to 100 tables then I'd say you are doing something wrong in your table design/database normalization. There is no reason to need that many tables.

10000000 different leagues? All in a couple of tables, denoted by an id for which league they are in.

bitbob

02-08-2007, 03:21 PM

If you are using 30 and perhaps up to 100 tables then I'd say you are doing something wrong in your table design/database normalization.

Just to clarify; it's a pretty large system for tennis... We have clubs, players, temporaryPlayers, competition, leagues, divisions, cups (in this case a special form for teams and pool-play), teams, TeamMatches (1 TeamMatch <=> * matches), match (1 match <=> *sets), sets, gameResults, contacts (users of the system), users (administrators), district, countries, gamePlans (templates for which teams should meet which and in which order), playDates... On top of this we have relations between all *<=>* tables + some extra tables (f.a.q for the website, enviroment-variables, session-control and settings tables...)

To see the system live as described above check out URL removed by request (it in swedish only) and you can only see the public view of the site, not the users view where they can report results and sign up for competitions etc, and not the administrative site where competitions etc. is created...

if you can do this with less tables please tell me how... :confused:

Fumigator

02-08-2007, 07:12 PM

I am very sure you could do it with few than 30 tables. If you want to share the design you're using, perhaps we could offer suggestions.

guelphdad

02-08-2007, 08:40 PM

most likely 30 tables and if you are not going to shrink the number of tables you currently have, there would be no need to increase the number of table simply because you are expanding the number of players or matches, those should all be covered in your current tables. So if you have 1000 members and increase to 100000 there is nothing you would do with your existing tables except add new rows, same with your matches.

If you need to add new tables then there is something fundamentally wrong with your current set-up that you should improve upon now so you don't have problems later.

If you care to show your table layouts then as fumigator said, we can take a look and offer suggestions to improve what you have.

neomaximus2k

02-08-2007, 10:22 PM

Just to clarify; it's a pretty large system for tennis... We have clubs, players, temporaryPlayers, competition, leagues, divisions, cups (in this case a special form for teams and pool-play), teams, TeamMatches (1 TeamMatch <=> * matches), match (1 match <=> *sets), sets, gameResults, contacts (users of the system), users (administrators), district, countries, gamePlans (templates for which teams should meet which and in which order), playDates... On top of this we have relations between all *<=>* tables + some extra tables (f.a.q for the website, enviroment-variables, session-control and settings tables...)

To see the system live as described above check out URL removed by request (it in swedish only) and you can only see the public view of the site, not the users view where they can report results and sign up for competitions etc, and not the administrative site where competitions etc. is created...

if you can do this with less tables please tell me how... :confused:

Well if the database is designed correctly then it shouldn't be a problem, contact me on the requirements of the site and i'll sit down and come up with a database design for you.

bitbob

02-09-2007, 09:14 AM

Image removed by request (btw, could you suggest some simple freeware for ER-diagrams?) I've left out a couple of tables, though...

As I see it I probably could merge 'Set' and 'SetSetup'. The only differense is that SetSetup allways exists for all sets and 'Set' only gets an entry when a set is played (since a match could be played best of 3 or 5 sets or something 'Set' won't contain information about all sets).

I can't reuse the current tables (well, some I can reuse but not all) for the next step of the application - cup play. The cup play is different in so many ways from the pool-play... I can't go inte details, but reusing tables would force me to have primary keys that consisted of either these fields or these fields. Or have primary keys that was unique over more than one table... And that's just one aspect of it. Personaly I think that the db should make sense not only for the computer, but for the administrator as well, beeing pretty much self-explainatory just by looking at the tables...

I have to put in some hours to rename some tables and change the application accordingly to make it work when I expand it, but you convinced me that i will be worth it in the long run. The problem is that this is an evergrowing project. I'll never know what I next should develop... :o

neomaximus2k

02-09-2007, 11:23 AM

ok I can see the nice notebook sketch :) i'll see what I can come up with for you, it might not suit you 100% but with it you will see how you can possibly optimise the database.

On the match table you would create an index on matchtype to enable quicker retrieval of data

Divisions, League, Competition
Unsure what these are being used for, I am guessing to display current positions, if this is the case then you could combine them into one table but there isn't much point.

GamePlan, GamePlanSetup
Unsure on what these are either, but to me they are one and the same need more info.

Teams
Used to house all the teams available
id primarykey int autoincrement
teamname varchar(50)
players varchar(10) (this houses the ID numbers from the player table seperated by commas)

Clubs
same as the teams table, infact if they are the same then you could add a field to the teams table to indicate its a club and have one table for them both

District
Guessing this has the district name and would then hold all the ID's of the clubs in that district
id primarykey int autoincrement
disctrictname varchar(50)
clublist longtext (ID's of the clubs)

Contact
Guessing this is the contact deatils for each team, this can be put into the team table

Results
Again this could be stored in the matches table no need for seperate table

Player
Guessing this holds info on the player so don't know what you would need here

Hopefully the imag bellow will help you to understand what I mean if you get stuck let me know

guelphdad

02-09-2007, 03:03 PM

Don't use lists of items in a single column, it violates first normal form of database normalization. When you have a case where you need a list of players with a club, you should have a new row for each player.

bitbob

02-09-2007, 03:12 PM

I really do appreciate the time you've spent on this, but I can't see a way to implement the db as you've suggested. There's just too many parameters to take into account. Parameters I havn't given you...

As it is now the db is in BCNF and grouping tables together would not only violate this but would also make me as an administrator confused when looking at the db structure.

Let's just leave the db design since I'm more and more convinced that no further optimization / normalisation is possible.

I was just curious on how some of you could say that "if you have 30 tables then there is something wrong" without knowing how complex the structure is...

One example of complexity from the project specifications; a match can consist of 1..n sets. Each set have different rules asigned to them (e.g. a set can start at 2-2 instead of 0-0, a set can have no rules, a set can/can't end in a tie, a set must be won with 2 games or end in a regular tie break "7-6(7-4)", a set is a super tie break - which should be reported as "1-0(10-2)" or something etc etc.) and each set has two sets of result values that's assigned to it when a result is reported - one set for the computer to use and one set to be displayed (since these often differs).
It's perfectly ok for a user to just report one set and leave the rest (live reporting during a match) - but this should not affect the scoreboard for the division. The scoreboard should only be updated when all results for all sets in all matches in a TeamMatch is reported. Further more for the scoreboard to be updated; all players for all matches must be corretly reported (saved in the Match table) and all info about the teammatch (place and time, number of lanes, ball types, court type etc) must be correct (stored in the TeamMatch). When this is done it triggers the system to compute the final result and store it in the Results table that is the basis for the scoreboard.

neomaximus2k

02-09-2007, 03:13 PM

Don't use lists of items in a single column, it violates first normal form of database normalization. When you have a case where you need a list of players with a club, you should have a new row for each player.

That is true I always forget about normalising database's :S
Any chance I can use your basic join article on my site with a link through to yours?

neomaximus2k

02-09-2007, 03:55 PM

I really do appreciate the time you've spent on this, but I can't see a way to implement the db as you've suggested. There's just too many parameters to take into account. Parameters I havn't given you...

As it is now the db is in BCNF and grouping tables together would not only violate this but would also make me as an administrator confused when looking at the db structure.

Let's just leave the db design since I'm more and more convinced that no further optimization / normalisation is possible.

I was just curious on how some of you could say that "if you have 30 tables then there is something wrong" without knowing how complex the structure is...

One example of complexity from the project specifications; a match can consist of 1..n sets. Each set have different rules asigned to them (e.g. a set can start at 2-2 instead of 0-0, a set can have no rules, a set can/can't end in a tie, a set must be won with 2 games or end in a regular tie break "7-6(7-4)", a set is a super tie break - which should be reported as "1-0(10-2)" or something etc etc.) and each set has two sets of result values that's assigned to it when a result is reported - one set for the computer to use and one set to be displayed (since these often differs).
It's perfectly ok for a user to just report one set and leave the rest (live reporting during a match) - but this should not affect the scoreboard for the division. The scoreboard should only be updated when all results for all sets in all matches in a TeamMatch is reported. Further more for the scoreboard to be updated; all players for all matches must be corretly reported (saved in the Match table) and all info about the teammatch (place and time, number of lanes, ball types, court type etc) must be correct (stored in the TeamMatch). When this is done it triggers the system to compute the final result and store it in the Results table that is the basis for the scoreboard.

LoL well the reason we all said if you had 30 tables blah blah blah was based on the information you gave us.

There are instances where database's are big ones but there are always ways to optimise the database and achieve what you need to.

guelphdad

02-09-2007, 06:03 PM

You also said:

the pool-play db as I use now has 31 tables. Add the club/player and cup-play tables and we´re up to > 100 tables.

right there it says that there is something wrong. you shouldn't have to add another 69+ tables to add more information that should be in a very few tables.

bitbob

02-12-2007, 08:09 AM

guelphdad: Yes and no... I probably exaggregated a bit... :rolleyes: I won't have to add 69+ tables. The club/player administration won't add anything, since I only have to do some redisgn of the tables that allready exists.

The cup-play mode on the other hand will add some tables, since it's not compatible with what allready exists. But not 69 tables... Probably somewhere between 10 or 20 tables...

But as I said before; it's an ever growing project. More and more will be added to the db i due time...

A final question; From the beginning of the thread I've had the feeling that you think that it's a good thing to merge tables if possible. "Add an enum-field to distiguish between the different types of data..." (or something like that)... My personal thoughts on this matter is that if I have two different classes in php then I should have two different db-tables. Or, if the two types of data never will co-exist - they belong to different domains, then I shouldn't bunch them together nomatter how alike they are... It must be faster (and more logical) to do a "SELECt * FROM table" than a "SELECT * FROM table WHERE enumeratedDataType = 'cupCompetition'", right?

And even if sometimes it might be faster or better optimized to do it a certain way, it's not allways the best. Look at oop. It brings a lot of overhead to a program and is slower than doing a structured non-oop program - but it's much, much easier for the programmer and will be easier to maintain and extend...

guelphdad

02-12-2007, 01:52 PM

It isn't as simple as a SELECT * or a SELECT * WHERE, it depends on what larger queries you are doing and when you have to get info from multiple tables.

If for instance you have 100 different tennis clubs and within those clubs 100 different players each. And yes I'm simplifying.

There are folks who think that they should create 100 different tables merely because there are 100 clubs and those clubs are not owned by the same company.

There should be three tables

1 table to hold information about the clubs
1 table to hold personal information about the players (name, ages etc.)

and a third table managing the relationship between clubs and players

playerid, clubid

and for every club a player belongs to you should add a new row. So if Arthur Ashe belongs to three clubs, he has three rows in that table.

If you now go ahead and add another 10 clubs, you don't add another table as all that data already belongs in the existing tables.

Now to complicate things a bit more (and I don't watch a lot of tennis so you would go into more specifics in your scoring), when you add info about matches you shouldn't have to add significantly more tables, only a few tables with more rows.

An event table with a date column, two columns indicating the combatants, a venue column. within that table you might use an auto increment column as a primary key.

you then have another table for scoring with the primary key from the event table, and necessary scoring columns. I don't know exactly what those columns would look like because as you said there are different ways of scoring, but they should still be minimal.

There really shouldn't be a need to add beyond these few tables. Yes you may want a countries table to normalize that type of data for the clubs and players and maybe a few more.

Until we actually see the data you are working with though it isn't straight forward to know what you are doing. But less tables with more rows is the way to go.