
How to set up tables. Dynamic content?

My question involves a scenario, so I'll use twitter as the example. We have a table with user id, pass, etc. Now, we have a table with all tweets? Does each user have their own table with tweets or, when a page loads, does the server search through EVERY tweet from EVERY user and match user ids? I'm just confused how else you would find a specific user's tweets than to search through every tweet. Hope this makes sense, currently learning the basics but stuck on this idea.
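The usual design is one tweets table for everyone, where each tweet row carries the id of the user who posted it. A minimal sketch of the two tables (table and column names here are just illustrative, not Twitter's actual schema):

Code:

CREATE TABLE users (
    user_id  INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(50) NOT NULL,
    pass     CHAR(60)    NOT NULL   -- password hash, never plain text
);

CREATE TABLE tweets (
    tweet_id    INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    user_id     INT UNSIGNED NOT NULL,   -- which user posted this tweet
    tweet       VARCHAR(280) NOT NULL,
    whentweeted DATETIME     NOT NULL,
    FOREIGN KEY (user_id) REFERENCES users(user_id)
);

Finding one user's tweets is then just SELECT * FROM tweets WHERE user_id = 42, and with an index on user_id the server does not have to read every row to do it.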

That's what I was thinking. Now assume I have 20,000 tweets in a database. Or twitter's seemingly infinite number. Is this when php is no longer an option? I mean, must take forever to search through every tweet (server-side, not myself).

MySQL keeps the indexes sorted and will search for the index to retrieve the requested records. It doesn't have to do a full table scan. And 20,000 records is not very much. I have a table with GPS coordinates for about 2,300 records and MySQL can do a full table scan while performing geometric calculations on the GPS coordinates in a few thousandths of a second. This is much more computationally demanding than a basic select query.
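To make that concrete, here is a sketch (index and column names assumed) of adding an index and the kind of query it speeds up. With the index in place, MySQL seeks straight to the matching rows instead of scanning the whole table:

Code:

CREATE INDEX idx_tweets_user ON tweets (user_id);

SELECT tweet_id, tweet
FROM   tweets
WHERE  user_id = 42          -- uses idx_tweets_user, no full table scan
ORDER  BY tweet_id DESC
LIMIT  20;

You can prepend EXPLAIN to the SELECT to confirm which index MySQL chose.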

I used to manage a MySQL DB that *GREW* at the rate of roughly ONE GIGABYTE every 20 minutes. There were several tables in the DB, but probably 90% of each gigabyte was concentrated in three tables, and those three tables each had two indexes. And *AT THE SAME TIME* the DB was growing at that rate, we could make a SQL query to find 100 particular records from *EACH* of those three tables, use Java code (not JS) to create a GRAPH of the 2,100 data points thus represented (7 data points per record) and present that graph to the user in CONSIDERABLY less than one second.

Oh...did I mention that the database was also trimming old data at the same time, keeping the total DB size to a user-specified size between 20GB and 50GB?

Meaning that the total number of records in those three tables ranged from 100 million to 250 million records at any given time.

It's a bit of a let down. The current DB I am working with has only two moderately large tables. One about 12 million records, one about 14.5 million.

Hmmm...Let me try a query on that one with 12 million records.

It's a record of page hits by page name for the last 3 years, roughly.

Let me try counting how many hits a given page got.

Okay, done. MySQL took 10 milliseconds to do that. About 12,000 records out of the 12 million...counted in 0.01 seconds.

In the 14.5 million record table, things are organized by zip code. Selecting a count of all records in a single zip code takes 0.02 seconds. 20 milliseconds.

Or how about this one:

Code:

SELECT COUNT(*) FROM tablename WHERE phone LIKE '360%';

A bit more complex, with the LIKE in there, right?

220 milliseconds.

And remember, MySQL is *NOT* even the most efficient database on the planet...by far! SQL Server and Oracle can run rings around it in many situations.

People constantly amaze me by not realizing just how powerful these query engines are! Hundreds of very very smart programmers have poured their life's work into making these things. THEY ARE FAST! Really.

I will say that if, for example, you wanted to search for (say) the word "people" in the TEXT of all 20,000 tweets, there is no good way to use an INDEX for that with MySQL. So, yes, MySQL would have to actually read all 20,000 records and look in the tweet text field to try to find that word. That is:

Code:

SELECT * FROM tweets WHERE tweet LIKE '%people%'

will not be particularly fast.

BUT...

But say you put an index on the WHENTWEETED field and you knew that the tweet you are looking for was tweeted in the last two days:

Now MySQL will only need to look at the full text of the tweet in records added in the last 2 days. The selection to find all records in the last 2 days will take perhaps 5 milliseconds, and then the time to search the text will depend on how many records were posted in those 2 days.
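A sketch of that combined query (assuming the WHENTWEETED field is a DATETIME column and has an index on it):

Code:

CREATE INDEX idx_tweets_when ON tweets (whentweeted);

SELECT *
FROM   tweets
WHERE  whentweeted >= NOW() - INTERVAL 2 DAY  -- index narrows this step quickly
AND    tweet LIKE '%people%';                 -- LIKE then scans only those rows

The index satisfies the date condition first, so the expensive LIKE comparison only runs against the small slice of rows from the last two days.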

In other words, take advantage of your indexes! And don't be afraid to add indexes if it looks like they might help.

Well, no, not 20,000. Twitter has billions, and gains hundreds of millions more every day.

Sure. And more than likely Twitter doesn't allow you to do something like "find me all the tweets made in the last week that have the word people in them." (Though they might! I have seen "full text" search engines that could do that.)

But they *might* allow you to say "Find me all the tweets that @WhiteHouse has made in the last year that contain the word 'debt'."

And that's because limiting the search to a single user cuts the millions and billions of records to be searched down to at most a few thousand.
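Such a query might look something like this (the screen-name lookup and column names are assumptions for the sketch, not Twitter's real schema):

Code:

SELECT t.tweet, t.whentweeted
FROM   tweets t
JOIN   users  u ON u.user_id = t.user_id
WHERE  u.username = 'WhiteHouse'                 -- index on username finds the one user
AND    t.whentweeted >= NOW() - INTERVAL 1 YEAR  -- index trims to one year of rows
AND    t.tweet LIKE '%debt%';                    -- LIKE only scans what survives

With a composite index on (user_id, whentweeted), MySQL can satisfy both narrowing conditions from a single index before it ever touches the tweet text.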

It's all a matter of organizing the data and providing indexes that make the searches reasonably efficient.