A look at R vectorization through the Collatz Conjecture

You may have heard that R is a vectorized language, but what do we mean by that? One way to read it is that many functions in R operate efficiently on whole vectors (in addition to single values). Here are some examples:
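As a minimal sketch, here are a few base R functions applied to whole vectors at once. The input vectors and the "var" prefix are illustrative choices made for this example.

```r
x <- c(1, 4, 9, 16)

# arithmetic and math functions work element by element
roots  <- sqrt(x)      # 1 2 3 4
scaled <- x * 2 + 1    # 3 9 19 33

# string functions are vectorized too
lens <- nchar(c("a", "bb", "ccc"))         # 1 2 3
var_names <- paste("var", 1:3, sep = "_")  # "var_1" "var_2" "var_3"
```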

Being aware of which functions are vectorized in R and using them can make a big difference when it comes to writing succinct and efficient R code. For example, had you not known that 'paste' is vectorized, instead of the above line of code you would need to loop through each value of your vector, or alternatively use one of the 'apply' family of functions, both of which are shown below.
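Sketched with an illustrative "var" prefix and a short integer vector (both assumptions made for this example), the two alternatives could look like this:

```r
x <- 1:3

# Option 1: an explicit loop that fills a preallocated character vector
out_loop <- character(length(x))
for (i in seq_along(x)) {
  out_loop[i] <- paste("var", x[i], sep = "_")
}

# Option 2: sapply calls paste once per element
out_apply <- sapply(x, function(v) paste("var", v, sep = "_"))

# For comparison: the single vectorized call gives the same result
out_vec <- paste("var", x, sep = "_")
```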

You can check for yourself that neither of the above two approaches is as efficient as using the 'paste' function in a vectorized fashion.
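One way to run that check is with 'system.time' from base R. The sketch below uses an arbitrary vector of 100,000 values and an illustrative "var" prefix; timings depend on your machine, so none are quoted here.

```r
x <- 1:1e5

# explicit loop
t_loop <- system.time({
  out_loop <- character(length(x))
  for (i in seq_along(x)) out_loop[i] <- paste("var", x[i], sep = "_")
})

# sapply, one call to paste per element
t_apply <- system.time(
  out_apply <- sapply(x, function(v) paste("var", v, sep = "_"))
)

# single vectorized call
t_vec <- system.time(out_vec <- paste("var", x, sep = "_"))

# elapsed seconds for each approach
c(loop = t_loop[["elapsed"]],
  sapply = t_apply[["elapsed"]],
  vectorized = t_vec[["elapsed"]])
```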

For a more serious demonstration of what a difference vectorization can make, we will look at an R implementation of the Collatz conjecture. (Check out xkcd for an entertaining presentation.) The Collatz conjecture states that if you start with any positive integer n, and recursively apply the following algorithm, you will eventually reach 1:

if n is even, then divide it by 2

if n is odd, then multiply it by 3 and add 1

The number of iterations it takes for n to reach 1 is called its stopping time. To restate, the Collatz conjecture states that every positive integer n has a finite stopping time.
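To make the rule and the stopping time concrete, the following sketch traces a single starting value. The helper name 'collatz_path' is made up for this illustration.

```r
# Return the full sequence of values visited, starting from n and ending at 1
collatz_path <- function(n) {
  path <- n
  while (n != 1) {
    n <- if (n %% 2 == 0) n / 2 else 3 * n + 1
    path <- c(path, n)
  }
  path
}

path6 <- collatz_path(6)
path6               # 6 3 10 5 16 8 4 2 1
length(path6) - 1   # stopping time of 6: 8
```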

Let's try a naive implementation of the Collatz conjecture in R (a better implementation would use memoization, which we will not cover). There are two ways to implement the above in R. Here's the first way:
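A minimal sketch of this first approach, assuming a scalar helper 'collatz' applied with 'sapply' inside a wrapper named 'nonvec_collatz'. Details such as the counter variable are assumptions and may differ from the original listing.

```r
nonvec_collatz <- function(ints) {
  # stopping time of a single positive integer n
  collatz <- function(n) {
    niter <- 0
    while (n != 1) {
      n <- if (n %% 2 == 0) n / 2 else 3 * n + 1
      niter <- niter + 1
    }
    niter
  }
  # apply the scalar function to each element in turn
  sapply(ints, collatz)
}

nonvec_collatz(1:10)   # 0 1 7 2 5 8 16 3 19 6
```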

In the above approach, we write a function 'collatz' which, given a single integer n, determines its stopping time. Since 'collatz' can only take a single integer as input, we pass it to 'sapply' to get it to work on a vector of integers. The whole implementation is wrapped in a function which we call 'nonvec_collatz'. Although 'nonvec_collatz' is vectorized in the sense that a vector of integers as input gives a vector of integers as output, the vectorization is achieved through 'sapply', so it is only aesthetically vectorized and not necessarily efficient.

In our second approach, we write a function 'vec_collatz' which is truly vectorized:

vec_collatz <- function(ints) {
  # we store the number of iterations for each number into niter
  niter <- integer(length(ints))
  # while there remains a number that has not yet converged to 1, run the loop
  while (abs(sum(ints - 1)) > .01) {
    niter <- niter + ifelse(ints == 1, 0, 1)
    ints <- ifelse(ints == 1, ints, ifelse(ints %% 2 == 0, ints / 2, 3 * ints + 1))
  }
  niter
}

Notice any differences between 'nonvec_collatz' and 'vec_collatz'? Both functions run a while loop that stops once the number reaches 1. But in 'vec_collatz' the while loop takes advantage of the fact that 'ifelse' is a vectorized function to run the iterative process on the whole vector of integers at once, instead of one integer at a time, as is done by 'nonvec_collatz'.
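To see what a single pass of that loop does, here is the loop body run once by hand on a small vector. The starting values c(1, 6, 7) are an arbitrary choice for illustration.

```r
ints  <- c(1, 6, 7)
niter <- integer(length(ints))

# one pass of the loop body updates every element at once:
# elements already at 1 are left alone, the rest take one Collatz step
niter <- niter + ifelse(ints == 1, 0, 1)
ints  <- ifelse(ints == 1, ints, ifelse(ints %% 2 == 0, ints / 2, 3 * ints + 1))

niter   # 0 1 1
ints    # 1 3 22
```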

Let's look at how much more efficient 'vec_collatz' is compared to 'nonvec_collatz':
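A rough way to make the comparison, sketched with base R's 'system.time' on the integers 1 to 10,000 (an arbitrary choice). Here 'nonvec_collatz' is written as described in the text and may differ from the original listing. Timings vary by machine, so none are quoted.

```r
# sapply-based version: one scalar while loop per input value
nonvec_collatz <- function(ints) {
  collatz <- function(n) {
    niter <- 0
    while (n != 1) {
      n <- if (n %% 2 == 0) n / 2 else 3 * n + 1
      niter <- niter + 1
    }
    niter
  }
  sapply(ints, collatz)
}

# vectorized version: one while loop over the whole vector
vec_collatz <- function(ints) {
  niter <- integer(length(ints))
  while (abs(sum(ints - 1)) > .01) {
    niter <- niter + ifelse(ints == 1, 0, 1)
    ints <- ifelse(ints == 1, ints, ifelse(ints %% 2 == 0, ints / 2, 3 * ints + 1))
  }
  niter
}

n_max <- 10000
t_nonvec <- system.time(res_nonvec <- nonvec_collatz(1:n_max))
t_vec    <- system.time(res_vec <- vec_collatz(1:n_max))

# both versions should agree before their timings are compared
stopifnot(identical(res_nonvec, res_vec))

# elapsed seconds for each version
c(nonvec = t_nonvec[["elapsed"]], vec = t_vec[["elapsed"]])
```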