Ruby file trimming app

We recently had an interesting experience with very large files. These were comma-delimited (.csv) files containing hundreds of thousands of records, each with a dozen or so fields.

e.g.

rec1,field2,,,,,,xxx,fieldn,,,1,2,3,,,fieldx

rec2,field22,,,a,s,d,fieldmore,,,,etc

.

.

.

recn,field2n,,,,ring,,,,ring,1,2,,,hello?,,etc

While testing the setup, we had smaller files to work with. The goal was to create a new file containing only the first field from each record.

e.g.

rec1

rec2

.

.

.

recn

During testing this was easily done by opening the file in a spreadsheet program (such as OpenOffice), which would split the records on the comma delimiter and place each field in a different column. Then, it was easy to select the first column and write it out to the new file.

On switching to production files, we discovered that OpenOffice had a limit of about 65k rows, a fraction of what we needed. We then tried some other spreadsheet programs, with the same result. We knew there was at least one spreadsheet program that would work, but it was not open source.

At this point the comment was made: “Well, we ARE Ruby developers …”

And that led to the following simple solution to the problem at hand.

With a few lines of Ruby code, the source files could be read in line by line, each line split on the comma delimiter, and the first entry written out to the destination file.
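The original script is not reproduced here, so what follows is only a minimal sketch of the approach just described. The method name `trim_to_first_field` and its two path parameters are made up for illustration. It also assumes that no field contains a quoted comma; if the data might, Ruby's standard CSV library would be the safer choice.

```ruby
# Minimal sketch (hypothetical method name): copy the first comma-separated
# field of every line in src_path to dest_path, one field per line.
# Assumption: fields never contain quoted commas.
def trim_to_first_field(src_path, dest_path)
  File.open(dest_path, "w") do |out|
    # File.foreach reads one line at a time, so the whole file is never
    # held in memory, however many records it contains.
    File.foreach(src_path) do |line|
      # A limit of 2 stops splitting after the first comma; the rest of
      # the record is left untouched and discarded.
      first_field = line.chomp.split(",", 2).first
      # An empty line yields nil here; to_s turns it into an empty string.
      out.puts(first_field.to_s)
    end
  end
end
```

Calling it with a source and a destination path, for example `trim_to_first_field("records.csv", "first_fields.txt")` (both file names are placeholders), writes the new file. Because the input is streamed, the row limits that stopped the spreadsheet programs do not apply.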

So, when the usual tools just don’t work, remember that a new Ruby tool might be just around the corner.