Improve inside of loops

This is essentially the same code, but cleaned the first two functions.

larger1 = function(pair) pair[1] > pair[2]

swap_if_larger1 = function(pair) {

if(larger1(pair)) rev(pair)

else pair

}

swap_pass1 = function(vec) {

for(i in seq(1, length(vec)-1)) {

vec[i:(i+1)] = swap_if_larger1(vec[i:(i+1)])

}

return(vec)

}

bubble_sort1 = function(vec) {

new_vec = swap_pass1(vec)

if(isTRUE(all.equal(vec, new_vec))) {

return(new_vec)

} else {

return(bubble_sort1(new_vec))

}

}

Improve outside loops

I cleaned the outside loop, then decided that the two inside functions were not needed any more, because it was reduced to one statement.

swap_pass2 = function(vec) {

for(i in 1:(length(vec)-1)) {

if (vec[i] > vec[i+1] ) vec[c(i,i+1)] <- vec[c(i+1,i)]

}

return(vec)

}

bubble_sort2 = function(vec) {

new_vec = swap_pass2(vec)

if(identical(vec, new_vec)) new_vec

else bubble_sort2(new_vec)

}

No recursion

This was actually not intended a speed update, but R was complaining about too deep recursion. The same swapp_pass2 as before is used.

bubble_sort3 = function(vec) {

new_vec = swap_pass2(vec)

while (!identical(vec, new_vec)) {

vec <- new_vec

new_vec = swap_pass2(vec)

}

new_vec

}

A different setup

The way I understood bubblesort is different from what is programmed so far. So far the sort is completed when no improvements are made, which is checked via a vector comparison. I always understood you don't check, the first bubble goes to the end, at which point the last element is the maximum. The second bubble then stops at end-1 etc. The final bubble is only the first two elements, after which the algorithm is finished.

swap_pass4 = function(vec,iend) {

for(i in 1:iend) {

if (vec[i] > vec[i+1] ) vec[c(i,i+1)] <- vec[c(i+1,i)]

}

return(vec)

}

bubble_sort4 = function(vec) {

for (iend in (length(vec)-1):1) vec <- swap_pass4(vec,iend)

vec

}

Tuning Paul's original algorithm

I was a bit disappointed with the improvements from using the true bubble sort. Apparently there is some gain by checking if the process is completed. Which is logical since the bubbles do some sorting of the intermediate elements. So, what about bubbles that go up and go down combined with a check on no improvements?

swap_pass3b = function(vec) {

for(i in length(vec):2) {

if (vec[i] < vec[i-1] ) vec[c(i,i-1)] <- vec[c(i-1,i)]

}

return(vec)

}

bubble_sort3b = function(vec) {

new_vec = swap_pass2(vec)

while (!identical(vec, new_vec)) {

new_vec = swap_pass2(vec)

vec <- swap_pass3b(new_vec)

}

vec

}

Vectorizing the bubbles

When you look at the bubblesort without checking, it seems it is only assumed that the first step brings the highest element at the last position. This can be achieved by just pulling the highest element and placing it at the end, without the intermediate swaps.

bubble_sort5 = function(vec) {

wm <- which.max(vec)

vec <- c(vec[-wm],vec[wm])

for (iend in ((length(vec)-1):2)) {

wm <- which.max(vec[1:iend])

vec <- c(vec[1:iend][-wm],vec[1:iend][wm],vec[(iend+1):length(vec)])

}

vec

}

Results

First a demonstration that they work;

test_vec = round(runif(10, 0, 100))

cbind(

bubble_sort(test_vec),

bubble_sort1(test_vec),

bubble_sort2(test_vec),

bubble_sort3(test_vec),

bubble_sort4(test_vec),

bubble_sort3b(test_vec),

bubble_sort5(test_vec)

)

[,1] [,2] [,3] [,4] [,5] [,6] [,7]

[1,] 28 28 28 28 28 28 28

[2,] 41 41 41 41 41 41 41

[3,] 42 42 42 42 42 42 42

[4,] 46 46 46 46 46 46 46

[5,] 52 52 52 52 52 52 52

[6,] 68 68 68 68 68 68 68

[7,] 89 89 89 89 89 89 89

[8,] 93 93 93 93 93 93 93

[9,] 98 98 98 98 98 98 98

[10,] 100 100 100 100 100 100 100

Speed

Now the timing. Cleaning the outer loops saves a third of the time. The bubble as I remembered it and the up-and-down bubble improve a bit. We are now at 13% of the original time. The final version, which uses the vectorization, blows them all out of the water.

test_vec = runif(1000, 0, 100)

system.time(bubble_sort(test_vec))

# user system elapsed

# 25.48 0.02 25.69

system.time(bubble_sort1(test_vec))

# user system elapsed

# 17.88 0.00 17.91

system.time(bubble_sort2(test_vec))

# user system elapsed

# 4.75 0.00 4.75

system.time(bubble_sort3(test_vec))

# user system elapsed

# 4.75 0.00 4.74

system.time(bubble_sort4(test_vec))

# user system elapsed

# 3.45 0.00 3.44

system.time(bubble_sort3b(test_vec))

# user system elapsed

# 3.57 0.00 3.59

system.time(bubble_sort5(test_vec))

# user system elapsed

# 0.06 0.00 0.06

How about sort()?

Sort is much faster and scales much better, even more so after subtracting the time for the sample() call.. But I don't need to do the exercise to know that a not optimal algorithm in an interpreted language cannot compete with modern algorithms in a compiled language.

No comments:

Post a Comment

Wiekvoet

Wiekvoet is about R, JAGS, STAN, and any data I have interest in. Topics range from sensometrics, statistics, chemometrics and biostatistics. For comments or suggestions please email me at wiekvoet at xs4all dot nl.