Package org.apache.hadoop.examples.pi

Package org.apache.hadoop.examples.pi Description

This package consists of a map/reduce application,
distbbp,
which computes exact binary digits of the mathematical constant π.
distbbp is designed for computing the nth bit of π,
for large n, say n > 100,000,000.
For computing the lower bits of π, consider using bbp.
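Both programs rest on the BBP (Bailey–Borwein–Plouffe) formula, which makes it possible to compute the hex (and hence binary) digit of π at position n without computing any of the earlier digits. The following is an illustrative Python sketch of that digit-extraction idea, not the code of the actual bbp or distbbp examples:

```python
# Sketch of BBP digit extraction (illustrative only).  The BBP formula is
#   pi = sum_{k>=0} 16^-k * ( 4/(8k+1) - 2/(8k+4) - 1/(8k+5) - 1/(8k+6) ).
# Multiplying by 16^d and keeping only fractional parts yields the hex digit
# at position d directly.

def bbp_sum(d, j):
    """Fractional part of sum_{k>=0} 16^(d-k) / (8k+j)."""
    # Terms with k <= d: three-argument pow does modular exponentiation,
    # so each term stays small and only its fractional part is kept.
    s = 0.0
    for k in range(d + 1):
        s = (s + pow(16, d - k, 8 * k + j) / (8 * k + j)) % 1.0
    # Terms with k > d: a rapidly converging tail.
    t, k, frac = 0.0, d + 1, 1.0 / 16
    while frac > 1e-17:
        t += frac / (8 * k + j)
        frac /= 16
        k += 1
    return (s + t) % 1.0

def pi_hex_digit(d):
    """Hex digit of pi at position d after the point (d = 0 gives '2')."""
    x = (4 * bbp_sum(d, 1) - 2 * bbp_sum(d, 4)
         - bbp_sum(d, 5) - bbp_sum(d, 6)) % 1.0
    return "%X" % int(x * 16)
```

Positions 0 through 7 spell out 243F6A88, the start of the hex expansion of π; since each hex digit is four binary digits, hex extraction is what yields the binary digits these programs report.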

The distbbp Program

The main class is DistBbp,
and the actual computation is done by DistSum jobs.
The steps for launching the jobs are:

Initialize parameters.

Create a list of sums.

Read computed values from the given local directory.

Remove the computed values from the sums.

Partition the remaining sums into computation jobs.

Submit the computation jobs to a cluster and then wait for the results.

Write job outputs to the given local directory.

Combine the job outputs and print the π bits.

The Bits of π

The table on the right shows the results computed by distbbp.

Row 0 to Row 7

They were computed by a single machine.

A single run of Row 7 took several seconds.

Row 8 to Row 14

They were computed by a 7600-task-capacity cluster.

A single run of Row 14 took 27 hours.

The computations in Row 13 and Row 14 were completed on May 20, 2009.
The corresponding bits appear never to have been computed before.

The first part of Row 15 (6216B06)

The first 30% of the computation was done in idle cycles of some
clusters spread over 20 days.

The remaining 70% was finished over a weekend on Hammer,
a 30,000-task-capacity cluster, which was also used for the
petabyte sort benchmark.

Note that it may take a long time to finish all the jobs when <b> is large.
If the program is killed in the middle of the execution, the same command with
a different <remoteDir> can be used to resume the execution. For example, suppose
we run distbbp to compute the (10^15+57)th bit of π with the following settings.

It uses 20 threads to submit jobs, so that there are at most 20 concurrent jobs.
Each sum (there are 14 sums in total) is partitioned into 1000 jobs.
The jobs are executed either map-side or reduce-side, and each job has 500 parts.
The remote directory for the jobs is remote/a, and the local directory
for storing output is local/output. Depending on the cluster configuration,
it may take many days to finish the entire execution. If the execution is killed,
we may resume it by re-running the same command with a different <remoteDir>.
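To put those partitioning numbers in perspective, a quick illustrative calculation using only the figures given above:

```python
# Job-count arithmetic for the example run described above (illustrative only).
n_sums = 14          # sums in the series
jobs_per_sum = 1000  # each sum is partitioned into 1000 jobs
max_concurrent = 20  # submission threads, hence at most 20 concurrent jobs

total_jobs = n_sums * jobs_per_sum    # 14,000 jobs overall
waves = total_jobs // max_concurrent  # 700 successive "waves" of 20 jobs
print(total_jobs, waves)              # 14000 700
```

With 14,000 jobs of 500 parts each but only 20 running at a time, it is easy to see why a full run can take many days.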