Using Python to Parse Spreadsheet Data

When you work on web applications for large organizations and enterprises, I am sure you have faced this unique problem. Organizations often store data in spreadsheets and you need to provide an interface for them to enter this data into your web application. The general idea is to upload the file, read its contents, and store it in either files or databases that your web application uses. You may also need to export data from your web applications. For instance, how would you export the marks of all the students in a class? Again, spreadsheets are the preferred medium.

In this post, we will discuss different ways to handle these files and parse them to get the required information using Python (2.7).

A Quick Spreadsheet Primer

Before parsing spreadsheets, you must understand how they are structured. A spreadsheet file is a collection of sheets and each sheet is a collection of data cells placed in a grid, similar to a table. In a sheet, a data cell is identified by two values — its row and column numbers.

For instance, in the given screenshot, the spreadsheet contains only one sheet, “Query1”. The cell “2A” corresponds to the 2nd row and 1st column. The value of cell 2A is “12510001”.

Although programs with a GUI assign letters to the names of columns, when we parse the data, we will start row and column numbers from 0. That means, cell 2A will correspond to (1, 0), 4B to (1,3), 3C to (2, 2), and so on.

A Quick Summary of CSV Files

CSV stands for “comma-separated values” (or sometimes character-separated if the delimiter used is some character other than a comma) and the name is pretty self explanatory. A typical CSV file looks like the following:

Conclusion

As I mentioned earlier, creating and parsing spreadsheets is inevitable when you are working with huge web applications. Thus, familiarity with parsing libraries can only help you when the need arises.

What scripting language do you use to handle spreadsheets? Does Python have any other library for this purpose? Let us know in the discussion below.

Shaumik is an optimist, but one who carries an umbrella. An undergrad at Indian Institute of Technology Roorkee and the co-founder of The Blog Bowl, he loves writing, when he's not busy keeping the blue flag flying high.