Tuesday, May 1, 2012

Red-Black versus AVL: benchmarks

The results of the benchmarks below are explained by the fact that my computer does not have enough RAM to take advantage of the amortized constant time of red-black rebalancing compared to the logarithmic time of AVL rebalancing. The red-black balancing functions are longer, so the tree height needs to be very large before the red-black tree can catch up to the AVL tree and surpass it. See my blog post on the issue. In short, for small amounts of data, AVL trees may be preferred; for larger amounts of data, red-black trees work better for insertions and deletions in most cases.

A few months ago, I was on Wikipedia[1] doing research on AVL trees. It stated that red-black trees beat AVL trees when the data is modified a lot, but AVL trees are better for lookup-intensive workloads. Okay, it may have had a "[citation needed]" off to the side, but I assumed they knew what they were talking about. Then I compared my final AVL tree to std::set (a red-black tree in my case), and my AVL tree beat the socks off of it! I figured it had to do with the fact that I was inserting the data in order. Maybe red-black trees are better when inserting data in random order. I had to test this, so I spent the next month studying red-black trees, and then it came time to code one and test a few things.

Hypothesis:

AVL trees beat red-black trees when searching and when inserting or erasing data in order, but red-black trees beat AVL trees when inserting or erasing data in random order.

Test:

My benchmark code tests the speed of red-black trees compared to the speed of AVL trees on nine different amounts of data. It is written in C++11, and I suggest compiling it with level-two optimization (-O2).

The AVL tree is based on code I wrote a few months ago, and the red-black tree is based on the code in my red-black tree tutorial[2].

The raw results are kind of messy, so I have put the data into a graph to make it easier to see what happens.

Analysis:

Okay, the results are not even close to what I expected! Why does the AVL tree beat the red-black tree when manipulating random data? And for ordered data, though it is not immediately apparent from the graph, analyzing the numbers shows that the AVL tree again wins.

Because these results might simply reflect that I am much better at coding AVL trees than red-black trees, I ran another test to check their accuracy. I tested an AVL tree that I coded a few months ago (I do not provide the source) against std::set using the same method. The AVL tree won again.

The results for ordered data are too close together for comfort, so they need to be tested again with larger numbers.

Re-test ordered data:

In the code, I have commented out the tests on random data, and I have changed the INTERVAL and SQRT_MAX_RAND constants.

As you can see, the AVL tree beats the red-black tree even for ordered data, in all cases (though only slightly for searching). Again, I reran this test with my old AVL tree and std::set, and I got similar results.

Conclusions:

Based on all of the above data, the AVL tree beats the red-black tree in every area tested. I fear, however, that my results may be wrong. First, no source (including my hypothesis) has claimed that AVL trees always beat red-black trees. Second, my compiler[3] uses a red-black tree for its std::set implementation; why would it do that if AVL trees were superior?

While my tests have shown AVL trees to be better than red-black trees at inserting, deleting, and finding data, whether in order or random, I believe more tests and studies are needed to state conclusively when AVL trees are superior to red-black trees and vice versa.

I spent the day running tests. I tested the trees above along with my personal AVL and red-black trees, std::set from g++ 4.6.1, and a third-party tree. I tested inserting in ascending order, inserting in descending order, inserting using rand(), inserting using the random number generator above, and inserting using a random number generator that guarantees every number is unique. I tested with data sets in the tens of thousands, hundreds of thousands, and millions of elements. I inserted elements, deleted 90% of them, then inserted them back, several times over. I tried different compiler optimizations. Heck, I even tried allocating memory for the trees beforehand so that memory allocation wouldn't factor in! Oh, and I made darn certain there was no noise, and I ran the tests several times.

In every single case, the AVL trees outperformed the red-black trees. The closest the red-black trees ever came to beating the AVL trees was after inserting hundreds of thousands of unique random numbers (those results were too close to call a winner definitively). At least we now know that, in these tests, the AVL tree is faster than the red-black tree for up to five million nodes.

I would like to thank #algorithms on freenode (especially wefawa) for taking the time (quite a bit of time) to consider this problem and help think up tests to run. One test was not run because we could not figure out how to do it (trying to maintain a Fibonacci tree), but I do not believe that case is likely to arise in real applications. The next step is to go beyond the 1 GB memory limit of my computer, and that requires some math. I will post again soon with the final verdict.
