Are you thinking about moving data (e.g., patient files, images, telemedicine survey data) from that clunky old Yugo computer or storage system to a brand new shiny Ferrari system? Are those happy thoughts or are they clouded by dread and visions of digital monsters eating up all your data and laughing in your face? You are not alone.

Data migration can be painful but there are ways to lessen that pain with careful preparation. First let’s start with a few basic terms. Legacy data is what you have now and want to transfer. Data migration is the process of importing that data into the new system. Data cleansing is getting the old data ready to move – making it compatible with what the new system requires or expects. It’s this last part that is often the most difficult.

One of the first steps in this process is to decide on what your goals are. Here are some questions you should be thinking about.

What type of data do you have & what are the uses? (volume vs value)

Should you bring the old data over? All? Some? What are the decision criteria? This is generally a volume vs value question

Is the desired data extractable from existing database?

Are the desired fields importable?

Does data migration from legacy included purchase new system?

If not do you have expertise in-house or should you hire out?

Will the process be automated or manual?

How much to spend - $ & time?

Is this short term or long term?

Are there any legal issues?

Does the data already exist anywhere else (EMR)?

Next consider what type of data you have: spreadsheet entries, images, text (pdf, doc, etc.), dedicated program formatted. You should also consider whether the data adhere to some sort of standard in its present form or will it have to in the future (for example SQL). The problem is that there are lots of data formats and some databases contain multiple formats (e.g., CSV, Dbase2, DIF, Excel, HTML, SQL, SYLK, Text, WKS, Windows metafiles) increasing the complexity of the transfer. Thus it is key to start out knowing what you actually have before figuring out what can be done with it.

What are your options? Well the first one certainly has some appeal – avoid it! Simply leave the old data where it is under the expectation it will likely not be needed or needed only rarely and thus it can stay where it is until that system dies and hope it’s never needed again. Attractive but not always feasible. The next option is to work incrementally. Maybe start moving the most recent (thus most likely to be requested) data over then progressively work back in time until everything has been transitioned over. This can take a while but if time is not that pressing of an issue it has its benefits. Finally there is the option to use existing tools or services – there are companies that do this! As with any service you need to shop around, get the best prices, guarantees, verify the time frame, what happens if data are lost, etc. If you have the resources this is often your best and easiest option.

What are some of the more common problems people encounter?

Single columns/rows used for more than one purpose or data point

Column values are determined based on data in other columns (i.e., formulas are used)

There are relationships between data in different columns that are not explicit

Special characters are used and/or used inconsistently

Default values differ

Hidden data fields

All of these typically can be solved in the data cleansing phase, but they need to be identified before they can be dealt and this is the difficult and often tedious part. The problem is often confounded by the fact that a lot of older databases were set up and used by one or two people but they are often no longer available or have themselves forgotten all the rules and shortcuts they developed!

There is an interesting option for researchers called REDCap (Research Electronic Data Capture) that is available at http://project-redcap.org/. This is a mature, secure web-based tool for building and managing online surveys and databases for research studies. It is available free of charge but is not open-source. You can join the consortium to get the source code with a valid end-user license agreement through Vanderbilt University. It may be necessary to contact your Tech Transfer or General Counsel Office but it is a fairly straightforward process. REDCap can be installed on a variety of environments (e.g., Mac, Windows, Linux, Unix) and is compliant with HIPAA, 21 CFR Part II, FISMA and other international standards.

So don’t let the data migration demons haunt you! It can be challenging but if you utilize available resources, be prepared to invest the necessary preparation time and resources (yes financial too), strategize, plan, and think about how you will ensure that all the data you wanted to migrate made it over (i.e., validation) half of the challenge is already met.

About the Author

Elizabeth Krupinski, Ph.D. is a Professor at Emory University in the Department of Radiology & Imaging Sciences and is Vice-chair of Research.. She is Associate Director of Evaluation for the Arizona Telemedicine Program and Director of the SWTRC. She has published extensively in these areas, and has presented at conferences nationally and internationally. She is Past Chair of the SPIE Medical Imaging Conference, Past President of the American Telemedicine Association, President of the Medical Image Perception Society, and Past Chair of the Society for Imaging Informatics in Medicine. She serves on a number of editorial boards for both radiology and telemedicine journals and is the Co-Editor of the Journal of Telemedicine & Telecare.