Wednesday, June 01, 2011

Strategies for porting data

In every application there always comes a time when data needs to be ported - either from old application to another new application or from one data store to another. You might have changed the database structure to implement some new functionality. Maybe move your data from an SQL to a NoSQL.

The most important tool for moving data from one data-set to another is a porting script. The porting script maps the data from fields of old data-set to the fields of the new data-set. The porting script can contain logic or simple sql.

In a live system where data keeps on coming, it becomes difficult to port data. If data is not handled properly it might lose its sanity. There is a catch while dealing with live systems. It should be considered if the porting of data leads to a downtime or not. Ideally there should be as little downtime as possible. Here when I refer to downtime, it is downtime for both internal and external users.

There are different ways for handling the porting of data.

1. The sure-shot way of porting data without losing any sanity is the bring down the application. Freeze the data set and run your scripts. This technique is ok when you are dealing with small amounts of data. And the porting time is not more than a few minutes - if you can afford a downtime of a few minutes. The good thing about this type of porting is that you do not have to worry about the sanity of data. It is difficult to mess your data when your application is completely offline.

2. Another technique is to move to your new application and use the new data-set to insert new data and select old data from the old data-set for display. This is a very effective way of porting data. All that needs to be done is put in the adapter design pattern (wrapper design pattern) at the point where you are selecting data. Make a note of the D-Day when the new application is made live. And use the adapter pattern to fire selects on old data-set if you need to fetch data previous to the D-Day else fetch data from the new data-set. All inserts would happen on the new data set. This is very effective because ideally all data would slowly move on its own from old data-set to new data-set. If you want to expedite the process, you can have a separate script for porting data older than the D-Day.

3. The third technique is an alternate to the second technique. Instead of putting the adapter pattern at the point of select, you put an adapter pattern at the point of insert. So in addition to inserting into the old data-set, you also insert into the new data-set. All your selects are still fired on the old data-set. Eventually when you are satisfied that data has moved from old data-set to new, you can shift to the new application and start using the new data-set. Here also a script can be run to explicitly port data from old data-set to new data-set.

4. Another variant of the above techniques, is using user access requests to port data. No matter how many tables there are in a system, there is a master table and almost all the tables in the system have their foreign key referring to the master. When data is ported the primary key of the master is kept track of. What this technique does is that it ports data when a primary key is accessed. So for example when a user logs into the system, you check if his data is ported to the new data-set. If not, you port all data related to the user at that instant and move the user to the new system. The bad point about this type of data porting is that if there is a scenario where large number of users suddenly come online - it may become difficult to handle the spike and the users may experience slowness or errors.

Techniques 2, 3 and 4 require some planning and care while execution. But when you are dealing with GBs of data which cannot be worked upon in a few minutes or hours, and you cannot afford a downtime, they are the ideal way to move data. It is really important to remember that even a small issue in moving data can result in losing the sanity. Hence utmost caution and thorough testing is required before you can go ahead and implement these techniques.

In case you come across any other interesting techniques or have used any other technique, do share your knowledge.