Reading Excel file using Python XLRD

There are many usecases where we need to read data from an Excel file. Simple example is to dump the data in Excel file into MongoDB. My use case is, I want to read test cases from Excel file and run those cases with Selenium Automation framework. Python XLRD module helps in reading the data from Excel file. This article explains how to use and get data from the Excel file using XLRD module in Python 3. Let’s get started.

Install XLRD

XLRD can be installed using PIP installer provided along with the Python installation. On Windows, PIP tool is available in “C:\Python35\Scripts\pip3.exe”. Check your install folder to get the correct path. You can use the below command to install XLRD.

C:\Python35\Scripts\pip3.exe install xlrd

What we read from Excel file?

We all know the format of Excel file, but for the sake of understanding I will provide the information here. Below are the typical data fields that we would like to read from an Excel file:

Number of sheets and Sheet names

Rows and columns information

Cell data and its type

Now we will get into the coding part to read all these data fields from Excel file. To get started we need to have an Excel file. I downloaded a sample Excel file (SampleData.xls) from here. Below is the image snapshot of the file. The Excel file contains two sheets, “SalesOrders” and “MyLinks”. We will use the data in the first sheet. There are 7 columns in the first sheet:

OrderDate – Date field

Region – Text field

Rep – Text field

Item – Text field

Units – Integer

Unit cost – Double/float

Total – Formula field applied on Units * Unit cost

Open a workbook and get sheet information

The below code snippet will open the Excel workbook and print number of sheets and names of the sheets.

A workbook can be opened using “open_workbook” API provided the file path. This API will create a book object. “nsheets” will give you the number of sheets available in the workbook and “sheet_names” API will give the list of sheets.

Reading rows and columns in a sheet

import xlrd
# Open the workbook
book = xlrd.open_workbook("SampleData.xls")
# get the list of sheets
sheets = book.sheets()
# print number of rows and cols in first sheet
print (sheets[0].nrows)
print (sheets[0].ncols)

The above code uses “sheets()” API to get the list of sheets as sheet objects. “nrows” and “ncols” will print the number of rows and columns in a sheet. The two “print” statements print the number of rows and cols in first sheet.

Reading data in rows and columns in a sheet

Now its time to read data in the rows and columns. We can iterate through rows and columns to read all the information in the sheet. The contents of a cell is provided by the “Cell” object in XLRD. The other most important thing we need to know is about the data format, which is provided by the “ctype” attribute in the “Cell” object. “ctype” is an integer values which is mapped to the below table:

Based on our input Excel file the first column is date so when we read the “ctype” of the cell in first column we should get 3. Below code snippet prints the “ctype” value of row 1 and col 0. We need to ignore the first row as it has headers. Similarly, we can find the types of each cell and assign them to appropriate Python type.

Let’s iterate through all the rows and find the total amount of sales available in the sheet.

import xlrd
# Open the workbook
book = xlrd.open_workbook("SampleData.xls")
# get the list of sheets
sheets = book.sheets()
total = 0
for row in range(1, sheets[0].nrows):
total = total + sheets[0].cell_value(row, 6)
print (total)