Bug Description

I am consistently getting the following error when trying to back up a partitioned InnoDB table. InnoDB is configured with file_per_table. The particular partition on which the read fails varies. There does not seem to be any file corruption, since I can stop and restart the database without problems and InnoDB itself reports no problems with the table. It is a large table with ~500 partitions.

The reason is that the InnoDB file I/O subsystem may reuse file descriptors: when the number of open files hits innodb_open_files, it closes old ones. That works for InnoDB, because if it needs to access a table whose file has been closed, it simply reopens it.

However, that doesn't work for XtraBackup, since it simply keeps a file descriptor while copying a file. When the --parallel option is used, there's a chance that another thread wants to open a file and hits innodb_open_files. fil_try_to_close_file_in_LRU() may then close a file descriptor that is currently in use by another thread, and that descriptor is shortly reused when opening another file. The result is obscure failures like this one.
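To make the descriptor reuse concrete, here is a minimal single-process sketch in Python. It is not XtraBackup code: the file names and the demonstrate_fd_reuse() helper are made up for illustration, and the explicit os.close() only stands in for the LRU eviction described above. It relies on the POSIX rule that open() returns the lowest free descriptor number.

```python
import os
import tempfile


def demonstrate_fd_reuse():
    """Return (stale_fd, new_fd, data read through the stale descriptor)."""
    with tempfile.TemporaryDirectory() as tmp:
        path_a = os.path.join(tmp, "a.ibd")  # hypothetical file names
        path_b = os.path.join(tmp, "b.ibd")
        with open(path_a, "wb") as f:
            f.write(b"partition A")
        with open(path_b, "wb") as f:
            f.write(b"partition B")

        # A "copy thread" opens file A and remembers only the descriptor number.
        stale_fd = os.open(path_a, os.O_RDONLY)

        # Someone else closes it behind the thread's back
        # (stand-in for the LRU eviction).
        os.close(stale_fd)

        # A different file is opened next; POSIX hands out the lowest
        # free number, which is the one that was just closed.
        new_fd = os.open(path_b, os.O_RDONLY)

        # The copy thread keeps reading through its old number.
        try:
            data = os.read(stale_fd, 64)
        except OSError:
            data = None  # the number was not reused; the read fails outright
        os.close(new_fd)
        return stale_fd, new_fd, data
```

When the number is reused, the read through the old descriptor succeeds but returns the contents of the other file, which is why the failure shows up on a seemingly random partition rather than as a clean "bad file descriptor" error.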

Another important part of this problem is the fact that XtraBackup leaks file descriptors, which is bug #713267. But even after that bug is fixed, it will still be possible to hit this bug by setting a very low value of innodb_open_files for XtraBackup and then using a very high --parallel value. So what needs to be done to fix this, in addition to fixing bug #713267, is to fail when XtraBackup hits the innodb_open_files limit, rather than follow the default InnoDB behavior and close some random files.
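As a rough illustration of the "fail instead of closing" policy, here is a small Python sketch. FailFastFilePool, OpenFileLimitError and demo() are hypothetical names invented for this example; nothing here reflects the actual XtraBackup or InnoDB code, and max_open merely plays the role of innodb_open_files.

```python
import os
import tempfile
import threading


class OpenFileLimitError(RuntimeError):
    """Raised instead of evicting a descriptor that may still be in use."""


class FailFastFilePool:
    """Hypothetical pool: refuses new opens at the limit and never
    closes a descriptor behind a caller's back."""

    def __init__(self, max_open):
        self.max_open = max_open
        self._open_fds = set()
        self._lock = threading.Lock()  # copy threads share the pool

    def open(self, path):
        with self._lock:
            if len(self._open_fds) >= self.max_open:
                raise OpenFileLimitError(
                    f"open-file limit of {self.max_open} reached; "
                    "raise the limit or lower the number of copy threads"
                )
            fd = os.open(path, os.O_RDONLY)
            self._open_fds.add(fd)
            return fd

    def close(self, fd):
        with self._lock:
            self._open_fds.remove(fd)
            os.close(fd)


def demo():
    """Open three files against a limit of two; report what happened."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name in ("p0.ibd", "p1.ibd", "p2.ibd"):
            path = os.path.join(tmp, name)
            with open(path, "wb") as f:
                f.write(name.encode())
            paths.append(path)

        pool = FailFastFilePool(max_open=2)
        fd0 = pool.open(paths[0])
        fd1 = pool.open(paths[1])
        try:
            pool.open(paths[2])
            failed = False
        except OpenFileLimitError:
            failed = True

        # Descriptors already handed out are untouched by the failed open.
        still_valid = os.read(fd0, 16) == b"p0.ibd"

        pool.close(fd0)
        fd2 = pool.open(paths[2])  # succeeds once a slot is free
        third_ok = os.read(fd2, 16) == b"p2.ibd"
        pool.close(fd1)
        pool.close(fd2)
        return failed, still_valid, third_ok
```

The trade-off is that the backup stops with a clear error at the limit, instead of continuing with a descriptor that may silently point at a different file.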

On Thu, 26 Apr 2012 15:31:13 -0000, Vadim Tkachenko wrote:
> Should we just mention in documentation that if you run with --parallel
> you need also to specify big values (how big?) for innodb_open_files?
> Will be that enough workaround?
>

I think once bug #713267 is fixed, we should make XtraBackup
automatically set innodb_open_files to be able to run with the specified
--parallel value. That will be the fix for this bug.

As 1.6 is the old stable release, I don't think we need to fix it there unless somebody explicitly needs it. If you do need a fix for this bug, contact us (Percona) and we can sort something out (or post here).