Saturday Mar 02, 2013

Loading CSV text data into a MySQL database table is an art form. The text data can trip you up for a variety of reasons, from non-unique keys to missing data columns to invalid number or date formats.
I recently needed to load over 20 such tables from a SQL Server CSV text file dump. The same techniques would work for data exported from Excel as CSV too.

To automate the process as much as possible I wrote a quick XSLT utility (called converter-txt-2-sql.xsl) that reads in the CSV text file, takes the table column field names from the first line, infers each column's type by scanning all of the input data lines, and generates a valid CREATE TABLE {name} ( {column(s)} ); SQL statement. Then in a second pass it builds an INSERT ... VALUES ({data}); statement for each of the following data lines in the CSV.
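The author's utility is written in XSLT, but the same two-pass logic can be sketched in Python for illustration. This is a minimal sketch, not the original tool: the function names (guess_type, csv_to_sql), the type-guessing rules, and the DECIMAL/VARCHAR defaults are all my own assumptions about how such a converter might behave.

```python
import csv
import re
from io import StringIO

def guess_type(values):
    """Guess a MySQL column type from every sample value in one column
    (assumed heuristics, not the original XSLT's exact rules)."""
    non_empty = [v for v in values if v != ""]
    if non_empty and all(re.fullmatch(r"-?\d+", v) for v in non_empty):
        return "INT"
    if non_empty and all(re.fullmatch(r"-?\d+\.\d+", v) for v in non_empty):
        return "DECIMAL(15,4)"
    if non_empty and all(re.fullmatch(r"\d{4}-\d{2}-\d{2}", v) for v in non_empty):
        return "DATE"
    width = max((len(v) for v in non_empty), default=1)
    return f"VARCHAR({width})"

def csv_to_sql(text, table):
    """First pass: column names from line one plus type inference over all
    data lines -> CREATE TABLE. Second pass: one INSERT per data line."""
    rows = list(csv.reader(StringIO(text)))
    header, data = rows[0], rows[1:]
    columns = list(zip(*data))  # transpose rows into per-column value lists
    defs = [f"  {name} {guess_type(col)}" for name, col in zip(header, columns)]
    create = f"CREATE TABLE {table} (\n" + ",\n".join(defs) + "\n);"
    inserts = []
    for row in data:
        # Escape embedded single quotes the SQL way: ' becomes ''
        vals = ", ".join("'" + v.replace("'", "''") + "'" for v in row)
        inserts.append(f"INSERT INTO {table} VALUES ({vals});")
    return create, inserts
```

Running it over a small sample produces the CREATE TABLE statement followed by one INSERT per record, which is essentially the output the XSLT utility builds.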

It does a pretty good job, about 98% of what you need. You still need to do some manual editing of the generated CREATE TABLE SQL. Essentially it can only guess at the length of each column, so you may want to adjust those manually, along with setting the key column (it assumes the first column is the key) and marking columns NOT NULL or UNIQUE as appropriate. But those are quick edits once it has all the basics there for you.

It assumes that each line of the CSV text file input is one data record, so you cannot have linefeeds embedded inside your data values, only one at the end of each record; a fair assumption most of the time.
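For comparison, a full CSV parser tracks quoting and so does cope with embedded linefeeds, which the line-per-record approach cannot. A short Python illustration of the difference (the sample data here is invented):

```python
import csv
from io import StringIO

# A quoted field containing a linefeed: splitting on line endings, as the
# line-per-record assumption does, would break this record in two, but a
# quote-aware CSV parser keeps it as a single row.
text = 'id,notes\n1,"first line\nsecond line"\n2,plain\n'
rows = list(csv.reader(StringIO(text)))
```

Here rows holds three entries (the header plus two records), with the embedded linefeed preserved inside the second column of the first record.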

Having got it working I was able to load up the twenty tables in less than an hour. There is still room to improve the XSLT logic to handle various edge conditions better, but for the time I invested in writing it, the XSLT is at a fair level of maturity; the next project will show whether it needs more refinement. Plus it is a nifty example of using XSLT to read in a text source file and output text (in this case SQL statements). Note: depending on your XSLT processor (I used Saxon) you may need to feed in a dummy XML file, e.g. one containing just <dummy/>, to satisfy the processing engine.

I found it worked well for quickly migrating SQL Server tables into MySQL, working only from the raw CSV text export files from SQL Server that had been sent to me. It's not perfect, but it should suffice for proof-of-concept purposes and quick demonstrations. And you have to know what you are doing, to be able to resolve syntax and data-integrity errors. Even so, I was able to load over 50,000 data records well enough.

Of course, if you can get a live connection from MySQL to SQL Server, then you can use MySQL's built-in migration tools. This little XSLT utility is useful when you do not have that option, or when people are keeping data tables in Excel spreadsheets and you want to convert those over to SQL tables.