"defragmentation" or splitting tables?

08-31-2008, 12:35 PM

Hi,

I have a few big tables but only a small portion is normally used for selects. However, it takes a long time to make the initial selects until the data is cached into memory (query cache or some other memory cache).

I noticed that MySQL creates 1 file per table, I feel this is quite bad for performance?

Is there any way to either move the most used rows into the start of the files or split the files so that the most used rows are in the same file?

I wouldn't want to split the tables itself as I feel that's a bad approach, the RDBMS should be able to handle that transparently?

I think my indexes are ok etc but it's still very slow (initial queries).

You can try looking into the MERGE storage engine if you're using MyISAM, or if you're using MySQL 5.1, you can try partitioning the table. Neither of these approaches are perfectly transparent, as there will be some work on your part to either split up the table or define the partitions. Each approach also comes with its limitations, so you may have to change how the table is used.

If your use case is something simple, like you usually query only rows containing recent data, a good date based index will probably work better than breaking up the table.

Comment

You can try looking into the MERGE storage engine if you're using MyISAM, or if you're using MySQL 5.1, you can try partitioning the table. Neither of these approaches are perfectly transparent, as there will be some work on your part to either split up the table or define the partitions. Each approach also comes with its limitations, so you may have to change how the table is used.

That's some great advice there, I will see about MERGE and 5.1. Maybe it's worth going with 5.1 straight away.

(btw, there is no rule how the data accessed is ordered, it's pretty random throughout the files, and I use MyISAM)

Thanks!

Comment

As you look into partitioning, you'll notice you have to have some function which determines which rows belong in which partition of your table. If you have truly random data, there won't be a way to ensure that "newer" data resides in a single partition. If you will always end up accessing all your partitions to retrieve the data you need, you're probably better off just having one big table.

Comment

I was hoping I could determine myself how to partition - I will have to see if it will be enough.

I have a few tables (3-4) that have a few million rows each and I do some nasty joins with them (indexed of course) )

The queries don't scan that many rows per query, max 500, but it seems that the size of the files is just a bit too much to handle, i can hear the hard disk operating the whole time (a few minutes). One such query may take a few seconds (measured by slow query logging).