Some of this is old stuff, but more people are now converting to the InnoDB plugin, so as to enjoy table compression and its performance boosts. The same holds for people converting to Percona's XtraDB. Table compression with the InnoDB plugin requires innodb_file_per_table: no more shared tablespace file.

So your ibdata1 file is some 150GB, and it won't shrink. Really, it won't shrink. You set innodb_file_per_table=1, run ALTER TABLE t ENGINE=InnoDB (optionally with ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8), and you get all your tables in per-table .ibd files.
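As a sketch, the per-table conversion looks like this (the table name t and the 8KB block size are just examples):

```sql
-- In my.cnf, under [mysqld]:
--   innodb_file_per_table = 1
-- Or at runtime (affects tables created or rebuilt from now on):
SET GLOBAL innodb_file_per_table = 1;

-- Rebuild the table into its own .ibd file:
ALTER TABLE t ENGINE=InnoDB;
-- Or, with compression (requires the InnoDB plugin's Barracuda file format):
ALTER TABLE t ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
```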

But the original ibdata1 file is still there. It has to be there; don't delete it! It contains more than just your old table data.

InnoDB tablespace files never shrink; it's a long-standing annoyance. The only way around it, if you need the space, is to completely drop them and start afresh. That's one of the things so nice about file-per-table: an ALTER TABLE actually creates a new tablespace file and drops the original one.

The procedure

The procedure is somewhat painful:

1. Dump everything logically (use mysqldump, mk-parallel-dump, or your own method).
2. Erase your data (literally, delete everything under your datadir).
3. Generate a new, empty database.
4. Load your dumped data.
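The four steps above can be sketched as follows (paths, flags and init commands are illustrative; mk-parallel-dump has its own syntax):

```shell
# 1. Dump everything logically (routines and triggers included):
mysqldump --all-databases --routines --triggers --single-transaction > full_dump.sql

# 2. Stop MySQL and erase the datadir (ibdata1, ib_logfile*, *.ibd, everything):
/etc/init.d/mysql stop
rm -rf /var/lib/mysql/*

# 3. Generate a new, empty database (with innodb_file_per_table=1 in my.cnf):
mysql_install_db --user=mysql
/etc/init.d/mysql start

# 4. Load the dumped data:
mysql < full_dump.sql
```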

Using replication

Replication makes this less painful. Set up a slave and have it catch up with the master.

1. Stop your slave.
2. Back up the replication position (e.g. record the output of SHOW SLAVE STATUS in a safe location, or copy the master.info file).
3. Work through the dump-erase-generate-load steps on the slave.
4. Reattach the slave to the master using the saved coordinates.

For this to succeed you must keep enough binary logs on the master to cover the entire dump-and-load period, which could be lengthy.
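Reattaching the slave can be sketched as follows; the host, log file and position are illustrative, taken from the SHOW SLAVE STATUS output you saved earlier (Relay_Master_Log_File / Exec_Master_Log_Pos):

```sql
CHANGE MASTER TO
  MASTER_HOST='master.example.com',
  MASTER_LOG_FILE='mysql-bin.000123',
  MASTER_LOG_POS=98765;
START SLAVE;
```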

Upgrading to barracuda

If you wish to upgrade your InnoDB tables to Barracuda format, my advice is this:

1. Follow the steps above to generate a file-per-table working slave.
2. Stop the slave.
3. Configure skip_slave_start.
4. Restart MySQL.
5. One by one, ALTER the tables into Barracuda format (ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED).

Note that if you're about to use table compression, the ALTER statements become considerably slower the stronger the compression (i.e. the smaller the KEY_BLOCK_SIZE).
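A sketch of the per-table ALTER step (the table name and block size are illustrative; Barracuda must be enabled for these row formats to take effect):

```sql
SET GLOBAL innodb_file_format = 'Barracuda';

-- Uncompressed Barracuda row format:
ALTER TABLE t ENGINE=InnoDB ROW_FORMAT=DYNAMIC;
-- Compressed: a smaller KEY_BLOCK_SIZE means better compression, slower ALTER:
ALTER TABLE t ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;
```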

If your dataset is very large and you can't keep that many binary logs, you may wish to break step 5 above into:

1. ALTER a large table.
2. Restart MySQL.
3. Start the slave; wait for it to catch up.
4. Restart MySQL again.

and do the same for all large tables.
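The cycle for each large table can be sketched like this (the table name and init script are illustrative; adapt to your setup):

```shell
# ALTER a large table:
mysql -e "ALTER TABLE mydb.big_table ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8"
# Restart MySQL:
/etc/init.d/mysql restart
# Start the slave and wait for it to catch up:
mysql -e "START SLAVE"
mysql -e "SHOW SLAVE STATUS\G"   # repeat until Seconds_Behind_Master reaches 0
# Restart MySQL again, then move on to the next table:
/etc/init.d/mysql restart
```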

Why all these restarts?

I've been upgrading to Barracuda for a long time now, and I have clearly noticed that an ALTER into a COMPRESSED format runs considerably slower after the slave has done some "real work". This particularly relates to the last, "renaming table" stage. Earlier InnoDB plugin versions had a bug which made this stage hang; it was solved. But this last, weird stage still takes time: the new replacement table is complete, it has actually been renamed in place of the old table, the old table has been renamed into something like "#sql-12345.ibd", and all that remains is to drop it, and... well, it takes time.

My observation is that it works faster on a freshly started server, which is why I take the bother to restart MySQL before each large table conversion.

@Dmitri,
yes, that's what I meant with regard to replication. It is a very common practice.

Do note that, strictly speaking, this isn't "zero downtime". There is some (small) period of time where you stop writing to the master, verify that the slave has caught up, and move everything over to the slave.

It can actually take a few seconds; with MMM I do this a lot, and I usually see roughly 10 seconds of complete lockdown.

I'm also assuming you won't be dumping just selected tables. It can be done, but it complicates things. Sure, you can hack your way around it: keep all MyISAM files, delete only the InnoDB files & logs. But which of the various other log files should remain in the datadir? They all become meaningless.

With regard to "Just deleting all files in the datadir is not always sufficient as the iblogfiles are sometimes moved to a different filesystem": I recommend that you verify that all of:
- ibdata1, ..
- ib_logfile[012...]
- *.ibd files
are on the same filesystem. Again, unless you actually know what you're doing and have such I/O bottlenecks that you have to move InnoDB files around.
With InnoDB files spread over multiple filesystems, you lose the ability to take FS snapshots.
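A minimal sketch of that check, comparing device IDs with stat (the datadir path is an assumption; adjust to your setup):

```shell
#!/bin/sh
# Print OK if all given paths sit on one filesystem (same device ID), MIXED otherwise.
same_filesystem() {
  first_dev=""
  for f in "$@"; do
    dev=$(stat -c %d "$f" 2>/dev/null) || return 1
    if [ -z "$first_dev" ]; then first_dev="$dev"; fi
    if [ "$dev" != "$first_dev" ]; then echo MIXED; return 0; fi
  done
  echo OK
}

# Hypothetical datadir; run the check only if it exists:
DATADIR=/var/lib/mysql
if [ -d "$DATADIR" ]; then
  same_filesystem "$DATADIR"/ibdata* "$DATADIR"/ib_logfile* "$DATADIR"/*/*.ibd
fi
```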

Shlomi

Daniël van Eeden

@Shlomi

Even InnoDB-only setups do have MyISAM tables for the internal mysql.* tables.

Just doing a dump/restore for InnoDB tables could be nice if there are a few big MyISAM tables.

One of the files you might want to save is the master.info file, if it's a slave. And it might be nice to save the error logs and slow query logs, to compare with the situation after the restore.

With NetApp you could use one volume with multiple Qtrees to create atomic snapshots and still use multiple mountpoints.

@Daniël,
I'm wondering about your NetApp setup. Do you have your different filesystems mapped to different disks on your NetApp?

Gabi Davis

Hi shlomi,
Good post.
However, when handling large and very busy databases, restoring from a dump is not an option. The time to dump and load, as well as the storage required to keep binary log files until the entire operation is over, is a killer.
Are there no other ways to do this? We're carrying a 350G file on our back, and the only option we currently have is to keep carrying it, even though we plan to have no tables left in it in the near future.
Also, are there no plans in the MySQL world to finally allow altering tables without a rewrite of the full table? That's a huge downside of MySQL as I see it.

When working with a slave, you can get the benefit of using mk-parallel-dump + restore. I've seen these speed up export/import time by a factor of 6.

Assuming your dataset size is indeed 350GB, I would guess this should take a day or two at most (a pure guess, based on experience with commodity hardware). Why don't you try it offline and see how much time the export/import takes, without actually taking down a slave? I mean: export from a (stopped) slave, then import on a local office machine; measure the time for export + import. Then you can see whether you have enough room to store the binary logs.

Since I'm assuming your dataset always grows, I should add: the sooner the better.

With regard to altering without a full table rewrite: the InnoDB plugin's Barracuda format allows for fast index creation.
Or you could use Facebook's OSC, or openark kit's oak-online-alter-table.

Gabi Davis

Hi shlomi,
Thanks for the timely reply 🙂
Our entire dataset is almost double that; the InnoDB file is 350G on its own... And although two days of traffic is huge, there are definitely ways to overcome the disk issue. But in general, it's sad to realize that this is the only way to go. We were hoping to dig deeper and find something else we could do that wouldn't require a dump and reload of the entire dataset.
If we have to go this way, we'll definitely take the opportunity to make some structural changes we had been planning for a long time but did not run due to the size of the tables and the nature of ALTER TABLE, and probably also upgrade to 5.5.
As for fast index creation: that is definitely great, but there's also the problem of not being able to add/drop columns (and I have to read more about the plugin to see if there's anything about that).
Thanks again,
Gabi