New DatabaseOperations commands backup/restore

Description

Ticket #173 is dealing with failed migrations issues, how to detect etc. This ticket proposes one of the solutions how to solve "unrevertable" failed migrations with the help of system (and improve the system).

First I would list some general rules I found useful when doing migrations:

migration development - develop and test on small data, do full test on test before deployment to production

migrating production - DO BACKUP, overview what is to be done, avoid live migrations, migrations/end user change conflict, control the process, reserve enough time for RESTORING in case of failure, test if migration is done correctly

Relating to this I think that south should have at least two new commands: BACKUP/RESTORE commands in (can be part of DatabaseOperations class).

Implementation

add DatabaseOperations.backup(backup_name, queryset) which saves <south_migrationhistory.id>_<migration_name>_<backup_name>.json in some predefined directory. Queryset param will be explained later

add additional output to migrate --list - which lists all backups that are in directory for each migration

add DatabaseOperations.restore(backup_name) which loads <south_migrationhistory.id>_<migration_name>_<backup_name>.json to database back

Method backup has param queryset, which should enable partial backup. Queryset is django.queryset object with optional definition of list of columns to be backuped (i.e. exported). Note that primary key must be allways included. Example:

I want to migrate model Person : column name -> first_name, last_name. Therefore I want to do backup table Person before actual migration

I can do backup in two ways:

the whole Person table with Person model definition - for full table backup

from full table backup - the whole table is erased - recreated and loaded again

from column table backup - one by one record is updated by primary key - it updates only columns backuped (name, first_name, last_name). This kind of restore doesn't handle alter table commands.

In column table restore there is an issue with eventual with new records added to table. Restore procedure should put db to the same state where it was before the backup. That means that new records in this limited backup mode should be deleted in restore procedure. How to detect them? We have list of id-s to be updated, all records that are not updated (pk not in pk_backuped_list) should be deleted.

What with very large backups - how to avoid transaction overload. When you want to do update on 1.000.000 records then probably this can't be done in one transaction. That is why restore needs to do commit from time to time (db2 import has commitcount option). This should be optional parameter in restore procedure.

I think that restore should be callable from shell too (e.g. migrate --restore or just restore).

My first impression is that this issue can become very complex, but if this is to be implemented, then I suggest for the first release to be implemented as simple as possible, but stable.

Change History

This is one of those features that is useful, but not terribly necessary - while, yes, it will save people some time, it'll take ages to develop, and if you're migrating a production server you should have a backup anyway (so if it fails, you can revert quickly).

I'd much rather solve this issue with education about the merits of backups for now, and possibly include it in the future. To this end, I'm marking it as "The Future".