Many people still save their data into Microsoft Excel files. This is an unhappy choice for many reasons but many was already written about this topic. Furthermore, unfortunately Excel become a de facto standard in many business environment and this routine seems to be difficult to strike out.

Many solutions have been implemented to read Excel files from R: each one has advantages and disadvantages, so an universal solution is not available. Get an overview of all the solutions, allows the choice of the best solution case-by-case.

Save Excel files into text

Saving Excel files into CSV can be done directly from Excel or through some external tools that allows batch operations. Native R functions for text data import can so be used.

1

2

df=read.table("myfile.csv",header=TRUE)

Copy and paste from Excel to R

This is a fast solutions, but it has one main drawbacks: it requires to open Excel file, select data and copy. By the way, this is the best compromise when you're in a hurry.

1

2

df=read.table("clipboard")

ODBC connection

For many years this was the easiest solutions based on R code for Windows users. Nowadays it still support only 32 bit versions of R and this limit discourage the use of this package. Besides Microsoft Windows and 32-bit R, it requires the Excel ODBC driver installed.

1

2

3

4

5

6

7

require(RODBC)

conn=odbcConnectExcel("myfile.xlsx")# open a connection to the Excel file

gdata package

In my experience the function provided by gdata package provides a good cross platform solutions. It is available for Windows, Mac or Linux. gdata requires you to install additional Perl libraries. Perl is usually already installed in Linux and Mac, but sometimes require more effort in Windows platforms.

1

2

3

require(gdata)

df=read.xls("myfile.xlsx"),sheet=1,header=TRUE)

xlsReadWrite package

xlsReadWrite is reported here for didactically purposes only although it is very fast: it doesn't support .xlsx files and this is not acceptable nowadays. Furthermore, it uses proprietary third party code and it should be downloaded from GitHub, CRAN cannot host it. It is available for Windows only.

1

2

3

4

require(xlsReadWrite)

xls.getshlib()

df=read.xls("myfile.xls",sheet=1)

XLConnect package

XLConnect is a Java-based solution, so it is cross platform and returns satisfactory results. For large data sets it may be very slow.

1

2

3

4

require(XLConnect)

wb=loadWorkbook("myfile.xlsx")

df=readWorksheet(wb,sheet="Sheet1",header=TRUE)

If you want to deepen its features, this article furnishes several examples.

xlsx package

xlsx package read (and write) .xlsx and .xls files using Java. It is cross platform and uses rJava to deal with Java. Comments and examples below are taken from a stackoverflow answer. It probably returns the best results but requires some more options.

However, read.xlsx() function may be slow, when opening large Excel files. read.xlsx2() function is considerably faster, but does not quess the vector class of data.frame columns. You have to use colClasses() command to specify desired column classes, if you use read.xlsx2() function:

read.xlsx("filename.xlsx", 1) reads your file and makes the data.frame column classes nearly useful, but is very slow for large data sets.

read.xlsx2("filename.xlsx", 1) is faster, but you will have to define column classes manually. A shortcut is to run the command twice. character specification converts your columns to factors. Use Date and POSIXct options for time.

1

2

3

4

require(xlsx)

read.xlsx("myfile.xlsx",sheetName="Sheet1")

read.xlsx2("myfile.xlsx",sheetName="Sheet1")

1

2

3

4

5

6

7

8

9

10

11

12

13

14

require(xlsx)

coln=function(x){# A function to see column numbers

y=rbind(seq(1,ncol(x)))

colnames(y)=colnames(x)

rownames(y)="col.number"

return(y)

}

data=read.xlsx2("myfile.xlsx",1)# open the file

coln(data)# check the column numbers you want to have as factors

x=3# Say you want columns 1-3 as factors, the rest numeric

data=read.xlsx2("myfile.xlsx",1,

colClasses=c(rep("character",x),rep("numeric",ncol(data)-x+1))

)

A self made function

Finally, I found on the web a self made function to easily import xlsx files. It should work in all platforms and use XML. It doesn't work with old .xls files. It allows to read more sheets of a file in the same time, using the keep_sheets argument.

Given the problems that each possibility you mentioned has, my experience is that the best way to import data from Excel is using RODBC. It's fast even for bigger data sets and the export I usually do using the write.table function. To import the data into Excel you can use VBA to copy the data directly into your excel table or via link. In the latter case there exists the problem that Excel can't update the links to csv so you have to open and close the csv file each time you want to update the data. Below you can find the code I am using:

The function wrote by schaunwheeler still is slower to read large sheets (e.g. 10K+ rows). The main constraint is to parse the xml document using xpathApply.

I modify schaunwheeler function through directly parsing xml with regular expression (https://gist.github.com/byzheng/6119160). The runtime is reduced from 120 s to 18 s to read a 3M xlsx file (one sheet contained 400K values). And it works for me.

[...] likely still be using Excel as a data source, though, so you’ll also want to check out this guide to importing data from Excel to R from MilanoR. To leave a comment for the author, please follow the link and comment on [...]

I know this post is quite old, but it has a very high search ranking for relevant search terms. I suggest that you might want to update your blog post to strongly recommend against exporting from excel to CSV and from excel to R via the clipboard. Both methods when I last tested them respected Excel's display preferences for the number of decimal points to display. Therefore, if there is any data fidelity beyond what is present on the screen, then you may lose it via a CSV export or a clipboard export.

Commenting on quite an old post, I would like to suggest the readxl package which provides read_excel(). It is quite fast (about 100 times faster than gdata). The only drawback I've found is that it does not seem to be able to import columns directly as integer or factor.