Creating a data frame row by row in R

When manipulating data in R, I often find myself in a situation where I have to build a new data frame iteratively, one row at a time. There are several ways to do this, and a natural question is which of them is best, or more specifically, which is fastest.

To answer this question, I measured experimentally how the different approaches perform on data frames of different sizes. Below, I present a couple of methods, each with sample code that creates a data frame with n rows.

Methods

The first method, which I call “one by one”, uses rbind to append each new row to the data frame.
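A minimal sketch of the “one by one” approach follows. It is not the original code from the post; the column names id, value and label and the function name make_one_by_one are hypothetical. It starts from an empty data frame with the right column types and calls rbind once per row.

```r
# Sketch of the "one by one" method (hypothetical column and function names).
make_one_by_one <- function(n) {
  # Empty data frame: zero rows, but with the final column types.
  df <- data.frame(id = integer(0), value = numeric(0), label = character(0),
                   stringsAsFactors = FALSE)
  for (i in seq_len(n)) {
    # Append a one-row data frame; rbind returns a new, larger data frame.
    df <- rbind(df, data.frame(id = i, value = i / 2,
                               label = paste0("row", i),
                               stringsAsFactors = FALSE))
  }
  df
}

df <- make_one_by_one(5)
str(df)
```

Because rbind returns a new data frame on every iteration, the whole data frame built so far is copied each time a row is added.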

The third method, called “preallocated”, creates a new data frame with the required number of rows up front and then fills the rows in consecutive steps. Each column starts out filled with the default value of its data type.
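Here is a minimal sketch of the “preallocated” idea, again with hypothetical names rather than the post's original code. integer(n), numeric(n) and character(n) create columns filled with the default values 0L, 0 and "", and each loop iteration overwrites one row.

```r
# Sketch of the "preallocated" method (hypothetical column and function names).
make_preallocated <- function(n) {
  # All n rows exist from the start, filled with each type's default value.
  df <- data.frame(id = integer(n), value = numeric(n), label = character(n),
                   stringsAsFactors = FALSE)
  for (i in seq_len(n)) {
    # Overwrite row i; the list elements are assigned to the columns in order.
    df[i, ] <- list(i, i / 2, paste0("row", i))
  }
  df
}

df <- make_preallocated(5)
str(df)
```

For a variant that starts from missing values instead of type defaults, one could assume the columns are created with rep(NA_integer_, n), rep(NA_real_, n) and rep(NA_character_, n); this is only a guess at what a “preallocated with NAs” method might look like.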

Results and conclusions

The results of a simple performance test are presented in the figure below. The source code that runs the test and produces the plot is available here: test.R

Figure: Creating a data frame row by row using different methods.

As can be seen, the “preallocated” and “preallocated with NAs” methods are the fastest. Their drawback is that you have to know in advance how many rows the data frame will have. Next in terms of speed is the “from list” method, which seems a sensible choice when you don't know the number of rows beforehand. The slowest is the “one by one” method.
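The code of the “from list” method is not shown here, so the following is only a sketch under an assumption: that the method collects the rows in a list and combines them with a single rbind at the end. The function and column names are hypothetical.

```r
# Sketch of a list-based approach (an assumption about the "from list" method).
make_from_list <- function(n) {
  rows <- list()
  for (i in seq_len(n)) {
    # The list simply grows as rows arrive; its final length is never
    # needed up front, which is the point when the row count is unknown.
    rows[[length(rows) + 1L]] <- data.frame(id = i, value = i / 2,
                                            label = paste0("row", i),
                                            stringsAsFactors = FALSE)
  }
  # One rbind over all collected rows instead of one rbind per iteration.
  do.call(rbind, rows)
}

df <- make_from_list(5)
str(df)
```

Note that when no rows are collected, do.call(rbind, list()) returns NULL rather than an empty data frame, so real code may need to handle that case.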
