1. Introduction

CSV files are widely used for data interchange between applications. They are especially useful when the only structure in the data being exchanged is rows and columns. The format is particularly popular because the data can be imported into Microsoft Excel and used for charts and visualization.

In this article we present an easy-to-use class for parsing and reading CSV data in Java. The class allows retrieval of each row of the CSV file as an array of columns. This row can then be processed further for filtering, inserting into a database, etc.

You might be thinking: why not just use String.split() to split row data into fields? It is, after all, readily available and returns the data as an array. The answer is that CSV parsing has more nuances than String.split() alone can handle: a field may itself contain commas, double quotes, or line breaks.
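To make the problem concrete, here is a small illustration. The class name SplitPitfall and the sample row are made up for this example. The row has three fields, but the second one contains a comma inside quotes, so a plain split cuts it in two.

```java
import java.util.Arrays;

class SplitPitfall {
    // Naive approach: split on every comma, with no awareness of quotes.
    static String[] naiveSplit(String row) {
        return row.split(",");
    }

    public static void main(String[] args) {
        // Three logical fields: 1 | Smith, John | 42
        String row = "1,\"Smith, John\",42";
        String[] pieces = naiveSplit(row);
        System.out.println(pieces.length);            // 4, not the expected 3
        System.out.println(Arrays.toString(pieces));  // quoted field is broken apart
    }
}
```

The quote characters also remain in the output, so even rows without embedded commas would need extra cleanup.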

2. Excel Compatibility

The CSV format defines certain conventions which are commonly used in applications that import and export CSV data. One of the most common applications used for visualizing CSV data is Excel. Many applications, including Excel, follow these conventions, so a CSV reader must take them into account. Some of these are:

Commas are used to separate fields, and a Carriage-Return Line-Feed (CRLF) combination is used to separate rows.

When a comma needs to be included as part of a field value, the field must be enclosed in double quotes (").

Multi-line fields can be present in the CSV file and these fields must also be quoted.

A double quote can be included within a quoted field by repeating the double-quote character ("").
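The conventions above can be sketched as a small character-by-character state machine. This is a minimal illustration, not the class offered with this article; the names CsvSketch and parse are hypothetical. It also accepts a bare LF or CR as a row separator, which is an assumption added for leniency.

```java
import java.util.ArrayList;
import java.util.List;

class CsvSketch {
    // Parses CSV text into rows of fields: comma-separated, quoted fields may
    // contain commas and line breaks, and "" inside quotes means one quote.
    static List<List<String>> parse(String text) {
        List<List<String>> rows = new ArrayList<>();
        List<String> row = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean pending = false; // true once the current row has any content

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote becomes a literal quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);       // commas and line breaks are data here
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote
                pending = true;
            } else if (c == ',') {
                row.add(field.toString()); // end of field
                field.setLength(0);
                pending = true;
            } else if (c == '\r' || c == '\n') {
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;                   // treat CRLF as a single separator
                }
                row.add(field.toString()); // end of row
                field.setLength(0);
                rows.add(row);
                row = new ArrayList<>();
                pending = false;
            } else {
                field.append(c);
                pending = true;
            }
        }
        if (pending) {                     // last row without a trailing line break
            row.add(field.toString());
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String text = "id,name\r\n1,\"Smith, John\"\r\n";
        for (List<String> r : parse(text)) {
            System.out.println(r);
        }
    }
}
```

In this sketch an empty line yields a row with a single empty field, and an unterminated quote simply runs to the end of the input. A production reader would probably want to report both cases.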

3. Read and Strip BOM

A CSV file generated from an application on Windows might include a BOM (Byte Order Mark) character at the very beginning of the file. This character, if present, can be used to determine the encoding of the file from among UTF-8, UTF-16BE (Big Endian) or UTF-16LE (Little Endian). The CSV Reader module (presented next) uses a routine to strip the BOM from the CSV file.
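One way such a routine could work is sketched below. It is an illustration only, and the names BomSketch, openSkippingBom and decode are hypothetical. It peeks at the first bytes with a PushbackInputStream, recognises the UTF-8 BOM (EF BB BF), the UTF-16BE BOM (FE FF) and the UTF-16LE BOM (FF FE), and pushes back any bytes that are not part of a BOM. Falling back to UTF-8 when no BOM is present is an assumption of this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomSketch {
    // Returns a Reader positioned after any BOM, using the charset the BOM indicates.
    static Reader openSkippingBom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < head.length) {          // read up to 3 bytes, fewer at end of stream
            int r = pin.read(head, n, head.length - n);
            if (r < 0) break;
            n += r;
        }

        Charset charset = StandardCharsets.UTF_8; // assumed default when no BOM is found
        int bomLength = 0;
        if (n >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            bomLength = 3;                         // UTF-8 BOM
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            charset = StandardCharsets.UTF_16BE;
            bomLength = 2;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            charset = StandardCharsets.UTF_16LE;
            bomLength = 2;
        }

        if (n > bomLength) {               // return the non-BOM bytes to the stream
            pin.unread(head, bomLength, n - bomLength);
        }
        return new InputStreamReader(pin, charset);
    }

    // Convenience for in-memory data: decodes a byte array, dropping any BOM.
    static String decode(byte[] data) {
        try (Reader reader = openSkippingBom(new ByteArrayInputStream(data))) {
            StringBuilder sb = new StringBuilder();
            for (int c; (c = reader.read()) != -1; ) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Working on the raw bytes, before any decoding, is what lets the BOM select the charset; once the stream is wrapped in a Reader the choice has already been made.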

5. Read CSV File into an Array

The following code illustrates how to read a CSV file and load the rows and columns into a List. The file is opened with an InputStream so the Byte Order Mark can be automatically detected and discarded. Also, the first row of the CSV file is assumed to contain column headers and is loaded into a separate array.
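A minimal sketch of this kind of loader might look like the following. It is not the class offered for download; the names CsvTable, load, headers and rows are hypothetical. To stay short it assumes UTF-8 input and drops only a UTF-8 BOM, which appears as the character U+FEFF after decoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class CsvTable {
    final String[] headers;                         // first row of the file
    final List<String[]> rows = new ArrayList<>();  // all remaining rows

    CsvTable(String[] headers) { this.headers = headers; }

    // Reads the whole stream. Assumes UTF-8 and a header row (sketch assumptions).
    static CsvTable load(InputStream in) {
        List<String[]> all = new ArrayList<>();
        try (Reader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            List<String> row = new ArrayList<>();
            StringBuilder field = new StringBuilder();
            boolean inQuotes = false, pending = false;
            int c = reader.read();
            if (c == 0xFEFF) c = reader.read();     // discard a decoded UTF-8 BOM
            while (c != -1) {
                int next = reader.read();           // one character of lookahead
                if (inQuotes) {
                    if (c == '"' && next == '"') {  // doubled quote: literal quote
                        field.append('"');
                        next = reader.read();
                    } else if (c == '"') {
                        inQuotes = false;           // closing quote
                    } else {
                        field.append((char) c);     // commas, line breaks are data
                    }
                } else if (c == '"') {
                    inQuotes = true;
                    pending = true;
                } else if (c == ',') {
                    row.add(field.toString());
                    field.setLength(0);
                    pending = true;
                } else if (c == '\r' || c == '\n') {
                    if (c == '\r' && next == '\n') next = reader.read(); // CRLF
                    row.add(field.toString());
                    field.setLength(0);
                    all.add(row.toArray(new String[0]));
                    row.clear();
                    pending = false;
                } else {
                    field.append((char) c);
                    pending = true;
                }
                c = next;
            }
            if (pending) {                          // final row without a line break
                row.add(field.toString());
                all.add(row.toArray(new String[0]));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        CsvTable table = new CsvTable(all.isEmpty() ? new String[0] : all.remove(0));
        table.rows.addAll(all);
        return table;
    }

    public static void main(String[] args) {
        byte[] sample = "id,name\r\n1,\"Smith, John\"\r\n".getBytes(StandardCharsets.UTF_8);
        CsvTable table = load(new ByteArrayInputStream(sample));
        System.out.println(String.join(" | ", table.headers));
        for (String[] r : table.rows) System.out.println(String.join(" | ", r));
    }
}
```

To read from disk instead of memory, the stream could come from Files.newInputStream(Paths.get("data.csv")), where data.csv is a placeholder file name. Rows are kept as String arrays so each one can be passed on for filtering or database insertion.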

Summary

Reading CSV data from a file is sometimes required in an application. A simplistic view would be to use String.split() to read the data, but this does not cover all the edge cases. These include commas within fields, double quotes, and multi-line text. Download and drop in the included class for a simple solution to parsing CSV files.