Day 21 – Collatz Variations

The Collatz sequence is one of those interesting “simple” math problems that I’ve run into a number of times. Most recently a blog post on programming it in Racket showed up on Hacker News. As happens so often, I instantly wanted to implement it in Perl 6.

This is a very straightforward implementation of the Racket post’s max-cycle-length-range as a stand-alone p6 script. collatz-sequence generates the sequence using the p6 sequence operator. Start with the given number. If it is divisible by two, divide it by two: when * %% 2 { $_ / 2 }. If it is not, multiply by three and add 1: when * !%% 2 { 3 * $_ + 1 }. Repeat this until the sequence reaches 1.

MAIN(Int $min, Int $max) sets up our main function to take two integers. Many times I don’t bother with argument types in p6, but here they provide nice feedback for users:

> perl6 collatz.pl blue red
Usage:
collatz.pl <min> <max>

The core of it just maps the numbers from $min to $max (inclusive) to the length of the sequence (+collatz-sequence) and then says the max of the resulting list ([max]).
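Pieced together from the fragments quoted above, a minimal sketch of such a script might look like the following. The exact layout is an assumption; only the two when clauses, the MAIN signature, +collatz-sequence and [max] come from the description.

```raku
# Sketch: build the Collatz sequence with the sequence operator (...).
sub collatz-sequence(Int $start) {
    $start, {
        when * %% 2  { $_ / 2 }        # even: halve
        when * !%% 2 { 3 * $_ + 1 }    # odd: triple and add one
    } ... 1;                           # stop once the sequence reaches 1
}

# Typed parameters give the usage message shown above on bad input.
sub MAIN(Int $min, Int $max) {
    # Map each number to its sequence length, then reduce with max.
    say [max] ($min..$max).map({ +collatz-sequence($_) });
}
```

Saved as collatz.pl, it would be run as perl6 collatz.pl 1 1000.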

Personally I’m a big fan of using the sequence operator for tasks like this; it directly represents the algorithm constructing the Collatz sequence in a simple and elegant fashion. On the other hand, a recursive version should be easy to memoize for a speed increase. Maybe that would give it an edge over the sequence operator version?
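A naive recursive version of the length calculation could be sketched like this. It is an illustration under assumptions: the name collatz-length is borrowed from later in the post, and the clause order is a guess.

```raku
# Sketch: count the terms of the Collatz sequence recursively.
sub collatz-length($start) {
    given $start {
        when 1       { 1 }                                  # the sequence ends at 1
        when * !%% 2 { 1 + collatz-length(3 * $_ + 1) }     # odd
        when * %% 2  { 1 + collatz-length($_ / 2) }         # even
    }
}

say collatz-length(27);
```

Each call contributes one term and hands the next number to itself, so no list is ever built.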

This recursive version, which makes no attempt whatsoever to be efficient, is actually more than twice as fast as the sequence operator version. In retrospect, this makes perfect sense: I was worried about the recursive version making a function call for every iteration, but the sequence version has to make two, one to calculate the next iteration and the other to check whether the ending condition has been reached.

Well, once I’d gotten this far, I thought I’d better do things correctly. I wrote two framing scripts, one for timing all the available scripts, the other for testing them to make sure they work!

This script takes as an argument a string that can be used to call a Perl 6 executable and a list of scripts to run. It runs the scripts using the specified executable, and times them using p6’s now function. It then sorts the results into order and prints them. (A similar script I won’t post here tests each of them to make sure they are returning correct results.)
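A timing harness of that shape could be sketched roughly as follows. The helper name time-script, the sample arguments passed to each script, and the fastest-first ordering are all assumptions made for illustration.

```raku
# Time one run of $script under the executable named by $perl6.
sub time-script(Str $perl6, Str $script) {
    my $start = now;
    # Hypothetical sample arguments; qqx runs the command and captures its output.
    qqx/$perl6 $script 27 97 871/;
    now - $start;
}

sub MAIN(Str $perl6, *@scripts) {
    my %seconds;
    %seconds{$_} = time-script($perl6, $_) for @scripts;

    # Assumed ordering: fastest script first.
    for %seconds.sort(*.value) -> $pair {
        say "{$pair.key}: {$pair.value} seconds";
    }
}
```

Note that a harness like this measures the whole process, including start-up and compilation, not just the Collatz calculation.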

In the new framework, the Collatz script has changed a bit. Instead of taking a min and a max value and finding the longest Collatz sequence generated by a number in that range, it takes a series of numbers and generates and reports the length of the sequence for each of them. Here’s the sequence operator script in its full new version:
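A minimal sketch of what that could look like, assuming the function is renamed collatz-length and the numbers arrive as plain command-line arguments:

```raku
# Sketch: length of the Collatz sequence via the sequence operator.
sub collatz-length(Int $start) {
    +($start, {
        when * %% 2  { $_ / 2 }
        when * !%% 2 { 3 * $_ + 1 }
    } ... 1);
}

# Report the sequence length for every number given on the command line.
sub MAIN(*@numbers) {
    for @numbers -> $n {
        say "{$n}: " ~ collatz-length($n.Int);
    }
}
```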

The sharp-eyed will notice this version is different from the first recursive version above in two significant ways. This time I made the argument Int $n, which instantly turned up a bit of a bug in all implementations thus far: because I used $_ / 2, most of the numbers in the sequence were actually rationals, not integers! This shouldn’t change the results, but is probably less efficient than using Ints. Thus the second difference: it now uses $_ div 2 to divide by 2. This version remains a great improvement over the sequence operator version, running in 4.7 seconds instead of 13.3. Changing when * !%% 2 to a simple default shaves another 0.3 seconds off the running time.
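With the typed argument, integer division and the default clause all applied, the recursive function could be sketched as below. This is a reconstruction from the description, not necessarily the exact code that was timed.

```raku
# Sketch: recursive length with an Int-typed argument and integer division.
sub collatz-length(Int $n) {
    given $n {
        when 1      { 1 }
        when * %% 2 { 1 + collatz-length($_ div 2) }     # div keeps the value an Int
        default     { 1 + collatz-length(3 * $_ + 1) }   # anything else is odd
    }
}

say collatz-length(27);
```

Had the even branch still used $_ / 2, the recursive call would receive a Rat and fail the Int type check, which is how the bug surfaced.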

Once I started wondering how much time was getting eaten up by the when statements, rewriting that bit using the ternary operator was an obvious choice.
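One way the ternary rewrite might look, as a sketch:

```raku
# Sketch: the same recursion with ?? !! in place of given/when.
sub collatz-length(Int $n) {
    return 1 if $n == 1;
    1 + ($n %% 2 ?? collatz-length($n div 2)
                 !! collatz-length(3 * $n + 1));
}

say collatz-length(27);
```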

Can we do better than that? How about memoization? is cached is supposed to be part of Perl 6; neither implementation has it yet, but last year’s Advent calendar has a Rakudo implementation that still works. Changing the first line of the last version to sub collatz-length(Int $n) is cached { works nicely, but it takes 3.4 seconds to execute. Apparently the overhead of caching slows it down a bit. Interestingly, the non-ternary recursive version does speed up with is cached, from 4.4 seconds to 3.6 seconds.
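For readers on a more recent Rakudo, the trait is available as an experimental feature, so a sketch along these lines should work without the Advent calendar module. This assumes a newer toolchain than the one the timings here were taken on, so the numbers above do not apply to it.

```raku
use experimental :cached;   # assumption: a Rakudo release that ships this pragma

# Sketch: the ternary version with generic memoization via the trait.
sub collatz-length(Int $n) is cached {
    return 1 if $n == 1;
    1 + ($n %% 2 ?? collatz-length($n div 2)
                 !! collatz-length(3 * $n + 1));
}

say collatz-length(27);
```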

Okay, instead of using a generic memoization, how about hand-coding one?
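A hand-rolled cache could be as simple as a hash keyed on the number. The sketch below is one assumed shape for it; the hash name %lengths is made up for illustration.

```raku
# Hypothetical cache of known lengths; the sequence starting at 1 has length 1.
my %lengths = 1 => 1;

sub collatz-length(Int $n) {
    # Reuse a stored answer when there is one.
    return %lengths{$n} if %lengths{$n}:exists;
    # Otherwise compute it, store it, and return it.
    %lengths{$n} = 1 + ($n %% 2 ?? collatz-length($n div 2)
                                !! collatz-length(3 * $n + 1));
}

say collatz-length(27);
```

Because every intermediate number is stored too, later calls that hit any previously seen value stop recursing immediately.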

This entry was posted on December 21, 2012 at 12:25 pm and is filed under 2012.

Nice! I don’t see a timing improvement in my testing rig, but it’s right there at the top, and it’s certainly more elegant than my hand-cached version. Do you mind if I add collatz-gerdr.pl to the github repo?

I only wrote it to support testing a single function at a time, but it measures the runtime of the actual function execution, so I’ve found it useful for comparing implementations with better granularity and less dependence on system load.

Kaz is correct — the various hand-cached versions are so fast that the time to calculate the answers is completely swamped by the overhead of setting up the tests. (Proof: I just added collatz-dummy which never calculates a Collatz sequence length at all, it just returns 1 for everything. It’s only a tiny marginal improvement over the best versions which actually do work.)

Conclusion: I need a better benchmarking framework for this. Must ponder.