The software development department of my company is facing the problem
that data migrations are considered potentially dangerous, especially by my managers.

The background is that our customers are using a large amount of poor-quality data.
The reason for this lies only partially in our software's quality, and mostly in the history of the data:
most of it was migrated from predecessor systems, some bugs caused (mostly business-level) inconsistencies in the data records, and customers occasionally made misentries by accident (which our software erroneously allowed).

The most important counter-arguments from my managers are that faulty data may turn into even worse data, that the data troubles may alarm some managers on the customer's side, and that some of the customer's processes may stop working because they have adapted to our system over time.

Personally, I consider data migration an integral part of software development: data migration is to data what refactoring is to code.
I think data migration is essential for creating software that evolves.
Without it, we would have to write painful software that works around a bad data structure.

I am asking you:

What are your thoughts on data migration, especially for real-life cases and not only from a developer's perspective?

Do you have any arguments against my managers' opinions?

How does your company deal with data migrations and the difficulties caused by them?

That's not necessarily an "or" question.
– David Thornley, Feb 25 '11 at 21:46


The one argument I have to add is: It's not going to get any easier in the future. If they don't want to undertake the migration now, then they should at least take on a 'data cleaning' project to write some code to identify problem records in the existing system.
– Michael Kohne, Oct 5 '12 at 12:05

9 Answers

Data migrations are my bread and butter, and data cleansing is indeed a hugely important matter. The strategy we use to migrate 100% of our customers' data rests on two things:

Asymptotic data cleansing with pre-migration tools, run repeatedly until the remaining problems converge toward zero.

Checking data consistency after the migration has happened. This helps to make the GO/NOGO decision on D-Day.
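Both steps can be sketched as a minimal example under stated assumptions: the rule names (`has_customer_id`, `valid_country`), the record fields and the 100% threshold are hypothetical illustrations, not the answerer's actual tooling. The pre-migration report is re-run after each cleansing round, and the post-migration check gives a simple GO/NOGO answer.

```python
from typing import Callable, Dict, List

# Hypothetical consistency rules: each returns True when a record is valid.
RULES: Dict[str, Callable[[dict], bool]] = {
    "has_customer_id": lambda r: bool(r.get("customer_id")),
    "valid_country": lambda r: r.get("country") in {"FR", "DE", "US"},
}

def failing_records(records: List[dict]) -> Dict[str, List[int]]:
    """Pre-migration report: indexes of the records breaking each rule."""
    return {name: [i for i, r in enumerate(records) if not rule(r)]
            for name, rule in RULES.items()}

def clean_ratio(records: List[dict]) -> float:
    """Share of records passing every rule; re-run after each cleansing round."""
    if not records:
        return 1.0
    bad = {i for idxs in failing_records(records).values() for i in idxs}
    return (len(records) - len(bad)) / len(records)

def go_nogo(source: List[dict], target: List[dict], threshold: float = 1.0) -> bool:
    """Post-migration check: same record count, and the target is clean enough."""
    return len(source) == len(target) and clean_ratio(target) >= threshold
```

In use, `clean_ratio` is tracked across cleansing rounds and should approach 1.0. A real GO/NOGO check would compare far more than a record count.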

In the end, a data migration is an immensely beneficial exercise that has to happen every 3 to 5 years.

It boosts the platform's ability to support the business.

It streamlines the database.

It prepares the IT platform for next generation business tools (ESB/EAI, Portals, Self-Care platforms, reporting and data mining, you name it).

It reorganises DIY data flows between platforms that have accumulated over the years in a quick and dirty "temporary" way to fulfil "urgent requirements".

Above all, it empowers the IT production team, who come to know their platform better and develop a 'can-do' attitude. These kinds of benefits are difficult to measure, but when you come to know many clients, this consideration becomes obvious. Companies shying away from migrations remain in the second tier; bold ones lead the pack.

It's a little bit like when the basement of your house becomes cluttered with lumber. One morning, you have to take everything out and put back only the things you need and throw the rest away. After that, you can use your basement again ;-)

Another fundamental consideration is that nowadays customer expectations are always on the move, as in "customers are ever more demanding". So there will always be a significant proportion of any given company's competitors on the lookout for new trends, with the obvious intent of increasing their market share. They will do so by adapting their offering to follow the trends or even drive them, and that entails constant business re-engineering. If your IT platform is too rigid, it will be a drag on your own ability to espouse or anticipate market trends and, ultimately, to maintain your own market share. In other words, in a moving market, inertia is a recipe for irrelevance.

In contrast, a data migration to a newer system will roll out a more modern and more versatile productivity tool that makes the best of newer technologies and is more attractive to employees. This, in turn, will help support or even lead the company's internal innovation process, thereby securing or increasing its relative market share.

The considerations above actually answer only half of the question asked in the title, "Data Migration - dangerous or essential". Yes, data migrations are essential, but are they also dangerous? By that standard, many things in IT are dangerous. By definition, anything where the stakes are high is dangerous, especially if you do not take the matter seriously. But this is actually the most common pattern in IT: not taking data centres, high availability or disaster tolerance seriously is dangerous.
Does that mean that today's companies should opt out of these pillars of today's information technology landscape? Surely not!

To make your point jokingly, you could argue that "flying is dangerous if you don't use a plane made by professionals". It's the same for data migrations. When designed and conducted by professionals, a migration is no more dangerous than flying in a well-designed and well-operated plane, and its return on investment compares with the alternatives the way air travel compares with ground transport.
When entrusted to professionals, most migrations are well-controlled successes, and the combined failure and abandonment rate is extremely low.

Your managers should be led to ask themselves: "Whilst most companies go through data migration projects successfully, what would make our company so different that it would experience a failure instead? And can it fare well without one?"

As reflected by @Alain's answer, one of the reasons for your manager's approach is that data migration is, in itself, a major project, with all the attendant risks of such. Also there are risks specific to data migration - the only data migration project I have been involved with achieved a 98.6% success rate in cleansing the data. This sounds quite good, until one realises that the failure rate left 600,000 customer records to be manually resolved. This involved setting up a separate department and checking and validation processes. Again this was not cheap or risk free.
– Chris Walton, Feb 22 '11 at 5:07
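The arithmetic behind "this sounds quite good" can be made explicit. This assumes the 98.6% rate applies to the whole record set, which the comment does not actually state. Under that assumption, 600,000 leftover records imply a source of roughly 43 million records. The function names are illustrative only.

```python
from fractions import Fraction

def residual_records(total: int, success_rate: str) -> Fraction:
    """Records left for manual resolution when `success_rate` of `total` migrate cleanly."""
    return total * (1 - Fraction(success_rate))

def implied_total(residual: int, success_rate: str) -> Fraction:
    """Total record count implied by a residual count and a success rate."""
    return residual / (1 - Fraction(success_rate))

# Exact fractions avoid floating-point noise in "1 - 0.986".
print(round(implied_total(600_000, "0.986")))   # 42857143
print(residual_records(1_000_000, "0.986"))     # 14000
```

Put the other way round, even a 1.4% failure rate leaves 14,000 records per million for people to resolve by hand.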

@Chris: We aim at 100%, and I've achieved that at least once. Most of the time, the customers left aside and manually recreated number fewer than a dozen.
– Alain Pannetier, Feb 22 '11 at 6:32


@Alain: congratulations. The project I was referring to was aiming at 100%, but it turned out that this was unachievable. The bulk of the data that required manual cleansing turned out to require manual checks of the form "of the three John Smiths we have recorded at this address, how many are distinct individuals?" This particular data migration was from non-RDBMS persistence to an RDBMS, and involved cleansing data that had accumulated over a period of up to 25 years.
– Chris Walton, Feb 22 '11 at 6:58


And the professional should be a data migration specialist (or at least a data specialist) not an application programmer. Companies get into trouble because they ask data amateurs to do this stuff rather than data professionals. Same thing with all too many database designs.
– HLGEM, Feb 25 '11 at 21:47


For an evolving platform, "migrations" or bulk imports are necessary. On the other side of the ledger, there are also high costs in maintaining a legacy data structure and extending it ad infinitum. Bad data becoming worse data is a problem of context that surfaces during the migration, and surfacing it actually adds significant customer value, because customers now know with greater certainty which data they can rely on and which they can't (in the scenarios of concern; in some scenarios it won't matter and will be of neutral value).
– JustinC, Mar 1 '11 at 5:16

Alain gave a great answer about the importance of data cleansing for a successful data migration project and the rationale for doing data migration at all. I would like to address only the specific concern your manager has.

In my opinion, it's not a question of whether to do data migration, but of when to do it. Your manager has an absolutely valid point in saying that your data is not just yours anymore and that end customers have already built their procedures around it. However, this won't change in the future. Sooner or later, poor data quality will inevitably slow your business down and you will be forced to migrate. Doing this under pressure and with tight deadlines might lead to suboptimal decisions. Besides, think about the expertise you have now compared with what you will have 2-3 years from now. What if the people who understand your data leave the company? Are you sure that your documentation is adequate?

Maybe migrating now is not necessary, but your manager at least needs to have a vision of when exactly the migration will be done.

I worked for an insurance company and was involved in data migration for the core system, four times in total. So, here are my comments:

In my case, data migration is a must, since by regulation we must keep the data for at least 10 years, and we cannot afford to support two systems in the long term. The other reason is that users expect to continue their work in the new application. If they cannot find the item they were working on, your application is bad, and it is even worse when the data is not correct.

Well, data migration is a horrible beast, and it is real, so face it. It is risky, but the risk can be minimized by addressing it early and carefully. As a guide, there are four big processes that should be taken into consideration in data migration:

Data mapping. Mapping master data (and combinations of it) to the new system.

Data clean-up. Mapping the exceptions in the data, that is, data whose combination of values is considered invalid in the new system. If possible, agree with the business to exclude data that cannot be mapped in any way and could break the new system, and prepare workarounds.

Actual data migration. There are many strategies for performing the data migration, for example big bang or incremental.

Report consolidation. If both systems run in parallel, how do you produce correct and consistent reports?

Even with careful planning, shit happens! A special task force should be ready to deal with problems related to the migration.
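The first two processes, data mapping and data clean-up, can be sketched together. This is a minimal sketch under stated assumptions: the legacy field names (`POLNO`, `STAT`, `ENDDT`), the status code mapping and the "cancelled policies need an end date" rule are all hypothetical. Each legacy row either maps cleanly or goes to an exception list with a reason, so that nothing disappears silently.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from legacy status codes to the new system's codes.
STATUS_MAP: Dict[str, str] = {"A": "ACTIVE", "L": "LAPSED", "C": "CANCELLED"}

def map_policy(old: dict) -> dict:
    """Data mapping: translate one legacy record into the new schema."""
    return {
        "policy_no": old["POLNO"].strip(),
        "status": STATUS_MAP[old["STAT"]],  # KeyError if the code has no mapping
        "end_date": old.get("ENDDT") or None,
    }

def violates_new_rules(new: dict) -> bool:
    """Data clean-up: a combination the new system would reject (hypothetical rule)."""
    return new["status"] == "CANCELLED" and new["end_date"] is None

def migrate(rows: List[dict]) -> Tuple[List[dict], List[Tuple[dict, str]]]:
    """Split legacy rows into migrated records and exceptions with a reason."""
    migrated: List[dict] = []
    exceptions: List[Tuple[dict, str]] = []
    for row in rows:
        try:
            new = map_policy(row)
        except KeyError as exc:
            exceptions.append((row, f"missing or unmapped value: {exc}"))
            continue
        if violates_new_rules(new):
            exceptions.append((row, "invalid combination"))
        else:
            migrated.append(new)
    return migrated, exceptions
```

The exception list is what gets discussed with the business: each entry is either excluded, fixed at the source, or handled by a workaround.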

I worked in astronomy, where we have data (on photographic plates) going back 130 years, giving us a Y1.9K and a Y2K problem simultaneously. We also have data on tapes from before people had agreed on how many bits were in a byte.
– Martin Beckett, Mar 1 '11 at 17:09

1) What are your thoughts on data migration, especially for real-life cases and not only from a developer's perspective?

Migration is an essential part of systems development. If you partially or entirely replace old systems, migration is a fact of life whether management wants it or not. If existing data is bad, it will reflect badly on your new system. Thus, it is of huge importance to have a good migration strategy.

2) Do you have any arguments against my managers opinions?

Yes, migration is risky, but it is also a fact of life, so deal with it. And deal with it as early as possible.

3) How does your company deal with data migrations and the difficulties caused by them?

My company has, with increasing success, involved customers actively in the migration process. We review the existing data as well as we can in the very first steps of the project, and encourage the customer to improve data quality before we begin migrating. Sometimes we actually demand it.

4) Any other interesting thoughts on this topic?

My advice is to divide the migration process into two steps: conversion and data cleaning.
Conversion is fairly straightforward: a matter of mapping old system objects to the new system. Data cleaning, on the other hand, can be a very tricky thing (as mentioned above). Get the customer involved as much as possible, and get the process started as early as possible. Keep in mind that bad data will reflect badly on your new system, sometimes completely without reason: when a new system doesn't work, a customer will rarely blame data that seemed to work just fine in the old system.

If the data you plan to migrate is currently bad, it needs to be fixed whether you do a migration or not. Bad data = useless data.

Migrations are risky, that is true. But so is every major IT project. There are ways to mitigate risk and they should certainly be planned up front in a migration.

First, you should always have a way to go back to the system as it is now. Second, migrations should be done on test servers that are set up just for the migration; it is foolish to do a migration without the ability to test it first. Third, all code for the migration should be in source control.

Fourth, you need requirements and test plans before you start the migration. If you had 1,293,687 records in the old system, you need to know that you have the same number in the new one, or where the missing ones went (to an exception table, perhaps). If you are normalizing a denormalized schema, you need to calculate how many records you should end up with before you start, and then check that. You need documentation that specifies the mappings from one system to the other. This will help your QA people check that the data went to the right place.
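The count checks described here can be written down before any migration code exists. This is a minimal sketch. The `PHONE1`..`PHONE3` columns are a hypothetical denormalized layout that becomes one child row per non-empty phone number. The table names and counts in the usage are illustrative.

```python
from typing import Dict, List

def reconcile(source_count: int, target_count: int, exception_count: int) -> bool:
    """Every source record is either in the target or in the exception table."""
    return source_count == target_count + exception_count

def expected_phone_rows(denormalized: List[Dict[str, str]]) -> int:
    """Expected child rows when normalizing hypothetical PHONE1..PHONE3 columns."""
    return sum(1 for row in denormalized
               for col in ("PHONE1", "PHONE2", "PHONE3")
               if (row.get(col) or "").strip())

def count_mismatches(expected: Dict[str, int], actual: Dict[str, int]) -> List[str]:
    """Compare counts computed before the migration with counts found after it."""
    return [f"{table}: expected {n}, found {actual.get(table, 0)}"
            for table, n in expected.items() if actual.get(table, 0) != n]
```

For example, `reconcile(1_293_687, 1_293_600, 87)` is true because the 87 missing records are accounted for in the exception table. Any non-empty result from `count_mismatches` is a test failure for QA to investigate.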

You need to determine how to handle the current bad data: what can be cleaned, what might need a value such as 'Unknown' in a required field, what should be tossed out to an exception table, and what needs manual intervention by a group of users (for instance, deciding whether these two people are really a duplicate or whether there are two doctors with the same name in that practice, and, if it is a duplicate, which data to keep when the two records differ).
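Those four outcomes can be sketched as a triage pass that assigns every record to exactly one bucket. The rules and field names (`id`, `name`, `practice`, `specialty`) are hypothetical. In particular, "same normalized name in the same practice" is only a crude duplicate signal, which is why such records go to people rather than being merged automatically.

```python
from collections import Counter
from enum import Enum
from typing import Dict, List

class Action(Enum):
    OK = "migrate as is"
    CLEAN = "fix automatically"
    DEFAULT = "fill required field with 'Unknown'"
    EXCEPTION = "send to exception table"
    MANUAL = "manual review by users"

def triage(records: List[dict]) -> Dict[Action, List[int]]:
    """Assign each record index to one action; the first matching rule wins."""
    def key(r: dict) -> tuple:
        return (r.get("name", "").strip().lower(), r.get("practice"))

    counts = Counter(key(r) for r in records)
    buckets: Dict[Action, List[int]] = {action: [] for action in Action}
    for i, r in enumerate(records):
        name = r.get("name", "")
        if not r.get("id"):
            action = Action.EXCEPTION  # cannot be identified at all
        elif counts[key(r)] > 1:
            action = Action.MANUAL     # one person twice, or two doctors with one name?
        elif not r.get("specialty"):
            action = Action.DEFAULT    # required in the new system, unknown here
        elif name != name.strip():
            action = Action.CLEAN      # e.g. trim stray whitespace
        else:
            action = Action.OK
        buckets[action].append(i)
    return buckets
```

The size of each bucket is a useful number to show the customer early, because the MANUAL bucket is the part that costs their staff time.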

The key to a successful migration is planning. I have found that planning (which includes writing the test cases and unit tests) usually takes more time than the actual development.

The next key to a successful data migration is QA. This is not a project to throw at the QA team the day before launch. This is not a project to launch when QA says there is a problem.

Another key to a successful migration is to deploy the majority of the data and test it while the original system is still running. If you are moving lots of records, this could be time-consuming, and new changes will happen in the meantime, so your process must also be able to pull the data changes made after the migration starts. SQL Server, for instance, has a feature called Change Data Capture which can help with this. You can take a backup of the original system and turn on Change Data Capture at the same time. Then you can restore the backup to your migration server, test the migration and get the majority of the data loaded; after that, you only have to load the records that have changed. When you migrate the final records, turn off the source system until the migration is done. This is one reason to migrate the majority of the records ahead of time: the application is down for the least amount of time. Choose your migration time well: don't shut the payroll system down on the day it should process payroll or send out W2s, and do it during low-usage hours. If you have multiple clients, you could consider migrating one first and making sure all is good before doing the others. It's a whole lot easier to roll back one customer's data than 10,000 customers' data if there is a problem. But plan this carefully if you do it.
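The snapshot-plus-changes approach can be illustrated generically. This is a hypothetical simulation in plain Python. Dictionaries stand in for tables, and a list of sequence-numbered change records stands in for what a capture mechanism such as SQL Server's Change Data Capture would provide. It does not use the CDC API itself, and the record layout is an assumption.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical change record: (sequence number, operation, primary key, row or None).
Change = Tuple[int, str, int, Optional[dict]]

def initial_load(snapshot: Dict[int, dict]) -> Dict[int, dict]:
    """Bulk-load the restored backup while the source system stays online."""
    return {pk: dict(row) for pk, row in snapshot.items()}

def apply_changes(target: Dict[int, dict], changes: Iterable[Change], after_seq: int) -> int:
    """Replay only changes captured after `after_seq`; return the last sequence applied."""
    last = after_seq
    for seq, op, pk, row in sorted(changes, key=lambda c: c[0]):
        if seq <= after_seq:
            continue  # already contained in the snapshot or an earlier pass
        if op == "delete":
            target.pop(pk, None)
        else:  # "insert" or "update"
            target[pk] = dict(row)
        last = seq
    return last
```

In this sketch, `apply_changes` runs repeatedly while the source is live, each time starting from the last sequence number it returned. It runs one final time after the source is switched off, so the downtime only covers that last, small delta.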

If the migration involves a new user interface, please get the actual users to use it as part of the migration testing. Then train the other users before you go live (but less than a week before, or they will forget). Have the users involved in testing help design the training; they know what questions they had and what people need to know, in what order. Get their input: making a field required because you think it should be won't help if the users usually don't have that data when they enter the records. They will just put junk into the newly required field because they can't get the data in otherwise.

Look at what is wrong with the current data: can you add foreign keys, constraints, triggers, business rules in the application, default values, etc. to prevent it from going bad again? When you clean bad data, you also need to create a way to keep similarly bad data from getting in in the future. Analyze why the bad data was allowed and fix the holes in the design.
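As a small illustration of closing those holes at the database level, here is a sketch using SQLite through Python's standard `sqlite3` module. The `practice` and `doctor` tables are hypothetical. The schema uses a foreign key, a CHECK constraint against blank names, and an explicit `'Unknown'` default, so that the database itself rejects the kinds of rows a cleansing pass would otherwise have to fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE practice (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE doctor (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL CHECK (length(trim(name)) > 0), -- no blank names
    practice_id INTEGER NOT NULL REFERENCES practice(id),    -- no orphans
    specialty   TEXT NOT NULL DEFAULT 'Unknown'              -- explicit placeholder
);
""")

PRACTICE_SQL = "INSERT INTO practice (id, name) VALUES (?, ?)"
DOCTOR_SQL = "INSERT INTO doctor (id, name, practice_id) VALUES (?, ?, ?)"

def try_insert(sql: str, params: tuple) -> bool:
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

With this schema, inserting a doctor with a blank name or a nonexistent practice returns False. A doctor inserted without a specialty gets `'Unknown'` rather than junk.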

Data migration is a necessity. Without data migration you often can't go forward. Many systems I have worked with required history only available from prior systems. Migration is the only practical method of doing this. Data quality is often an issue. Generally, this should be dealt with in the prior system. This may require changes to the data to regain quality.

Other systems I have worked with depended on data from other systems. This is a different but significant issue. In some cases the data can be replaced entirely. Other cases may be better handled by merging the changes included in the new data into the existing set. These types of migrations should include validity checks for the incoming feed.

The ability to validate and clean existing data can be an important feature of a system. This is independent of migration. There are often mechanisms to modify data which are outside the control of the system. This can cause data to become invalid. Other data problems result from bugs in the application. Running the validation routines periodically can help identify the problem and allow the data to be cleaned before it is time for migration. As has been noted, cleaning the data early can make the migration easier.

Some validations are time sensitive, and should not be applied to data which has not been modified. This is common with coded values, where codes have been retired. It should be possible to change other fields in the record without triggering validation errors. This can make the update validation more complex as it needs to identify which fields changed before validation. Cross field validation may also be more complex. The ability to treat some records as read-only can help in this case as the validation can be avoided.
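Change-aware validation can be sketched briefly. The `specialty` field and the code set are hypothetical. The point of the sketch is that only fields which actually changed are checked. A record still carrying a retired code can therefore have its other fields edited, while assigning a non-active code is rejected.

```python
from typing import Dict, List, Set

# Hypothetical codes that may be assigned today; older records may still
# carry retired codes (for example "GEN") that are not in this set.
ACTIVE_SPECIALTIES = {"GP", "ENT", "CARD"}

def changed_fields(old: Dict[str, str], new: Dict[str, str]) -> Set[str]:
    """Names of fields added, removed or modified by the update."""
    return {f for f in set(old) | set(new) if old.get(f) != new.get(f)}

def validate_update(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Validate only fields that changed, so an untouched retired code still passes."""
    errors: List[str] = []
    if "specialty" in changed_fields(old, new) and new.get("specialty") not in ACTIVE_SPECIALTIES:
        errors.append(f"specialty {new.get('specialty')!r} is not an active code")
    return errors
```

Cross-field rules are harder to express this way, because a change to one field can invalidate a combination that includes an unchanged one.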

On one system I worked on, the new system was partially rejected by the customer. They refused to allow the new data entry modules to be used. However, they wanted the batch processing from the new system. The solution was to migrate the data nightly prior to the batch run.

It's a necessary evil. I've been on both ends and these are some of the other issues that compound the problem.

Especially in the enterprise, when companies go to a new system, they want it to do everything the old system did. They don't review their procedures. They are so overwhelmed that they just want to keep doing everything the same way. This feels safe to them.

They don't take the time to learn the new system or hire people with expertise.

They want to customize the new system, either to accommodate the first point (doing everything the old way) or to handle some new aspect of their business. New System × Customizations × Converted Data = Compounded Complications.

Not enough time is dedicated to testing.

Customers hate running in parallel, i.e. doing things twice. You can't blame the users, because they are not given the time to do this while all of their other duties continue at full blast.

If your managers can justify the loss of sales from not converting data, more power to them. Telling your customers that all data conversions fail isn't going to work, because someone else (usually your competition) will always tell them it will work.

What are your thoughts on data migration, especially for real-life cases and not only from a developer's perspective?

Software has to be upgraded regularly. To make sure the migration is safe, you need backups and testing.

Do you have any arguments against my managers opinions?

He is right that it is risky, but you can adopt techniques to make it less risky.

How does your company deal with data migrations and the difficulties caused by them?

We have daily backups, incremental backups and a backup before every deployment to production, which at least lets you roll back if anything bad happens.

We have a testing environment, automated tests and a daily build server, plus a smoke test procedure to make sure major operations and functions are working properly. We involve developers, QA and users in testing the build (which has the data migrated).

We are using Ruby on Rails, which provides versioned data migrations with upgrade and rollback. That makes our life easier.

We are using Capistrano to execute the code update and data migration.
Keeping migrations automated and simple is one of the key things for making sure the production system works.
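The idea of versioned migrations with upgrade and rollback can be illustrated outside Rails. This is a minimal Python sketch using SQLite, loosely modeled on the concept of a version table. It is not how Rails or Capistrano implement it, and the table names and migrations are hypothetical. Each step runs in its own transaction, and the recorded version makes environments easy to compare.

```python
import sqlite3
from typing import List, Sequence, Tuple

# Hypothetical migrations: (version, upgrade SQL, rollback SQL).
MIGRATIONS: List[Tuple[int, str, str]] = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)",
        "DROP TABLE customer"),
    (2, "CREATE INDEX idx_customer_name ON customer(name)",
        "DROP INDEX idx_customer_name"),
]

def current_version(conn: sqlite3.Connection) -> int:
    """Highest applied version; compare this value across server environments."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0

def _run_step(conn: sqlite3.Connection, statements: Sequence[Tuple[str, tuple]]) -> None:
    """Run one migration step in its own transaction (SQLite supports transactional DDL)."""
    conn.execute("BEGIN")
    try:
        for sql, params in statements:
            conn.execute(sql, params)
    except Exception:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")

def migrate_to(conn: sqlite3.Connection, target: int) -> int:
    """Upgrade or roll back one version at a time until `target` is reached."""
    version = current_version(conn)
    for ver, up, _ in MIGRATIONS:  # ascending order for upgrades
        if version < ver <= target:
            _run_step(conn, [(up, ()), ("INSERT INTO schema_version (version) VALUES (?)", (ver,))])
    for ver, _, down in reversed(MIGRATIONS):  # descending order for rollbacks
        if target < ver <= version:
            _run_step(conn, [(down, ()), ("DELETE FROM schema_version WHERE version = ?", (ver,))])
    return current_version(conn)
```

The sketch assumes a connection opened with `sqlite3.connect(path, isolation_level=None)`, so that the explicit BEGIN/COMMIT statements control the transactions. A deployment script would call `migrate_to` with the version shipped alongside the code, which keeps code upgrades and data migrations consistent.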

Any other interesting thoughts on this topic?

Another concern regarding data migration, for me, is keeping code upgrades and data migrations consistent. In my case, again, we use automated ways to handle that, and we are always ready to roll back.

Executing data migrations manually may leave the database in an unknown state, and it makes it hard to compare the data migration version across server environments.

I hope you don't run a hospital: Why do we only have patient records for babies? Well we installed a new system last year and it was too hard to migrate all the old data so we only put new patients on it!
– Martin Beckett, Feb 25 '11 at 5:12

Nope, I don't run a hospital. Read what I said again. "The reward your managers hope to realize had better be extremely high given the cost of the migration." If the reward is high -- whatever that may be -- then it's worth it. Otherwise, it's a waste of everyone's time and an unnecessary risk. Also, I mentioned in my answer that integration can be done to allow the new system to access the old data, in some cases. But this decision depends entirely on the scenario.
– jmort253, Feb 25 '11 at 6:15