Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

I am at 2,000,000 rows and am really starting to notice the slowdown now. I always heard that MySQL wasn't very scalable

I expect this database to get larger and I have a lot more to insert

would using an Amazon micro instance have any benefit here?

I don't foresee my upload speed being a bottleneck, so my logic is that their distributed processing will make the database quicker.

This turned out to be more of a CPU utilization issue but would amazon's solution be better, perhaps more optimized for its more available processors? With this logic a more expensive instance would have better performance

but with so many variables involved here, does anyone have experience with this to say?

A single Amazon node is basically a virtual machine - it's not distributed. Have you done any performance analysis to track down what the problems are on your local database? I'd do that first.
–
tblPhilAug 2 '12 at 20:57

2 Answers
2

You should fully tune the MySQL Environment, particularly your InnoDB settings. (See the bottom of my Answer for tuning tips). This would be much better than fighting Amazon for elbow room in RAM/Disk. Why did I say fight?

If you just spun up an Amazon RDS instance of MySQL, you would subject yourself to whatever constraints are given. All models of MySQL Amazon RDS have the same major options but differ in only two aspects

The larger the model, the bigger the InnoDB Buffer Pool. Oh yeah, the bigger the budget needs to be.

What most people are unaware of is that the InnoDB Transaction Log Files are the same size for all models : 128M. Given default settings (innodb_log_files_in_group = 2), that's 256M of Transaction Log Space. If you wish to set some other InnoDB option, do this:

Create all secondary indexes prior to loading. This is counterintuitive for those familiar with other databases. Adding or modifying a secondary index causes MySQL to create a new table with the index changes, copy the data from the existing table to the new table, and drop the original table.

Load data in PK order. This is particularly helpful for InnoDB tables where load times can be reduced by 75-80% and datafile size cut in half.

Disable foreign key constraints foreign_key_checks=0 For flat files loaded with LOAD DATA INFILE, this is required in many cases. For any load, disabling FK checks will provide significant performance gains. Just be sure to enable the constraints and verify the data after the load.

Load in parallel unless already near a resource limit. Use partitioned tables when appropriate.

Use multi-value inserts when loading with SQL to minimize statement execution overhead. When using mysqldump, this is done automatically.

Reduce InnoDB log IO innodb_flush_log_at_trx_commit=0

I hope you are not using LOAD DATA INFILE for importing. Why ? Because, only MyISAM tables benefit from it in conjunction with the bulk insert buffer (which by default is 8M). InnoDB does not benefit from it.

If you are loading InnoDB tables via mysqldump or your own SQL, all well and good. You may need to expand the Transaction Logs as I explained if your are imposing large transactions in bulk.

As a sidenote, please do not perform any large transactions if AWS is doing a snapshot. You should make sure snapshots are properly scheduled and then commit not to do any large transactions within the snapshot window. You could disable Snapshots for the duration of your bulk transaction, but the AWS RDS Documentation has a big caveat on that:

Warning: DO NOT DISABLE AUTOMATED BACKUPS IF YOU NEED TO RETAIN THE
ABILITY TO PERFORM POINT-IN-TIME RECOVERY. Disabling automated backups
erases all existing backups, so point-in-time recovery will not be
possible after automated backups have been disabled. Disabling
automated backups is a performance optimization and is not required
for data loads. Note that DB Snapshots are not affected by disabling
automated backups. All existing DB Snapshots are still available for
restore.

Disabling automated backups will reduce load time by about 25% and
reduce the amount of storage space required during the load. If you
will be loading data into a new DB Instance that contains no data,
disabling backups is an easy way to speed up the load and avoid using
the additional storage needed for backups. However, if you will be
loading into a DB Instance that already contains data; you must weigh
the benefits of disabling backups against the impact of losing the
ability to perform point-in-time-recovery.

DB Instances have automated backups enabled by default (with a one day
retention period). In order to disable automated backups, you must set
the backup retention period to zero. After the load, you can re-enable
backups by setting the backup retention period to a non-zero value. In
order to enable or disable backups, Amazon RDS must shut the DB
Instance down and restart it in order to turn MySQL logging on or off.

Use the rds-modify-db-instance command to set the backup retention to
zero and apply the change immediately. Setting the retention period to
zero requires a DB Instance restart, so wait until the restart has
completed before proceeding.

rds-modify-db-instance AcmeRDS --apply-immediately
--backup-retention-period=0 You can check the status of your DB Instance with the rds-describe-db-instances command. The example
displays the status of the AcmeRDS database instance and includes the
--headers option to show column headings.

Like @Phil mentioned, Amazon RDS is not distributed, so it’s pretty much the same as your local system, except that you can upgrade the machine on demand. Still, if you need to use more CPU power than is available in the largest RDS instance, you’ll be stuck. If you want to have unlimited scalability in throughput/CPU utilization, you should consider MySQL services like Xeround, which I think offers automatic scaling, or RackSpace’s MySQL service - these are distributed solutions, unlike RDS. :)