Note: While you should also be able to set up a similar environment on your Mac or PC without needing a Virtual Machine, the course staff will not support such configurations - so you’re on your own if you choose to go that route!

Follow the instructions available here to install VirtualBox, and the virtual machine image. The virtual machine image already includes most of the software necessary to run the code. We will install extra packages below.

Start up the machine. Enter ‘saasbook’ as the password.

Launch a terminal. (Third icon in the launcher on the left.)

Run sudo bash setup.bash. Make sure you are in the same directory as setup.bash before executing this command. Enter the same password again to install lots of packages.

Grab a coffee or something - it will take a few minutes to build and install these components. If you see warnings on the screen, don’t worry; that is expected.

Once IPython 1.0+ and Julia 0.2 are installed, you can type julia on the command line to start the Julia REPL, and install IJulia with: Pkg.add("IJulia").

Continue the installation with Pkg.add("PyPlot") to install PyPlot, which is a plotting package for Julia based on Python's Matplotlib.

If either of the commands above returns an error, you may need to run Pkg.update() and then retry the command that failed.

To start the IJulia interface, run ipython notebook --profile julia (a window will open in your web browser).

Execute the code below block-by-block. For example, there's a good chance the code in part n will not run without first running the code in part a.

If you are new to Julia, I recommend downloading and playing around with the language. The syntax is a mix of the best of Python and Matlab, and is very easy to get used to. Speaking of Matlab, Julia is 1-indexed like Matlab, rather than 0-indexed like Python and many other languages.
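To make the indexing point concrete, here is a minimal sketch; the array `xs` and the variable names are made up for illustration.

```julia
# Julia arrays start at index 1, as in Matlab.
xs = [10, 20, 30]

first_elem = xs[1]        # first element
last_elem  = xs[end]      # `end` is the last valid index inside brackets
n          = length(xs)   # number of elements, which is also the last index

println(first_elem, " ", last_elem, " ", n)
```

Asking for `xs[0]` raises a bounds error instead of returning the first element.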

# Returns the number of heads in n flips
# rand() generates a random number between 0 and 1, that number is then rounded to either 0 or 1.
# By summing them up, we essentially are counting the number of heads
flip_coin(n) = sum(round(rand(n)))
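The definition above targets the Julia 0.2 image used in this setup. On current Julia 1.x releases, `round` is no longer applied elementwise to an array, so the same idea needs a small change. The sketch below shows two possible variants; the names `flip_coin_bool` and `flip_coin_round` are hypothetical and not part of the original notebook.

```julia
# Hypothetical variants of flip_coin for Julia 1.x (not the notebook's original).

# rand(Bool, n) draws n fair coin flips directly; true counts as a head.
flip_coin_bool(n) = sum(rand(Bool, n))

# Same idea as the original, using the broadcasting dot to round elementwise.
flip_coin_round(n) = sum(round.(rand(n)))

heads = flip_coin_bool(100)
println("heads in 100 flips: ", heads)
```

The `Bool` version returns an integer count, while the rounding version returns a floating-point number, as the original does.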

Part d: As the number of tosses increases, the distribution moves closer to a normal curve.

Part e: Since we are working on the same scale, the histogram gets bigger as the number of tosses increases, because the number of heads also increases.

Part f: The curve becomes tighter and tends toward the middle as the number of tosses increases. This makes sense because as we toss more coins, the chance of getting all heads or all tails decreases significantly compared with, say, when we toss 2 or 4 coins.
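The claim about extreme outcomes in Part f can be checked with a little arithmetic: for a fair coin, all heads and all tails each have probability $(1/2)^n$, so together they have probability $2 \cdot (1/2)^n$. A minimal sketch, where `p_extreme` is a name introduced only for this illustration:

```julia
# Probability that n fair tosses come up all heads or all tails.
# Each of the two extreme outcomes has probability (1/2)^n.
p_extreme(n) = 2 * 0.5^n

for n in (2, 4, 10, 100)
    println(n, " tosses: ", p_extreme(n))
end
```

With 2 tosses the probability is 0.5 and with 4 tosses it is 0.125; by 10 tosses it has already fallen below 0.002.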

The optimization trick we're using here is based on the fact that a run of length 1000 contains a run of length 500 as a prefix.

Hence, we can reuse the head counts we already computed for smaller numbers of coin tosses when computing the head counts for larger numbers of coin tosses.
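One way to picture the prefix trick is as a running total over a single long run. The sketch below assumes a current Julia 1.x release and uses `cumsum`; it illustrates the idea and is not the notebook's own implementation.

```julia
# One simulated run of 1000 fair tosses (true = heads).
tosses = rand(Bool, 1000)

# heads_so_far[k] is the number of heads among the first k tosses,
# so a single run yields the head count for every k from 1 to 1000.
heads_so_far = cumsum(tosses)

println("heads after 500 tosses:  ", heads_so_far[500])
println("heads after 1000 tosses: ", heads_so_far[1000])
```

Every prefix length is covered by the same 1000 random draws, instead of drawing a fresh run for each length.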

In [33]:

identity(x) = x

# runs and tosses are the number of runs and tosses respectively
# q is the threshold which we're interested in finding the fraction of heads smaller than
# func is the function we want to apply to the vertical axis values
# For this part, that function should do nothing (identity). For next part, it's the log function
function multi_toss_runs(runs, tosses, q, func)
    runs_table = Dict()
    for k = 1:tosses
        runs_table[k] = Float64[]
    end
    for _ = 1:runs
        heads = 0
        for k = 1:tosses
            heads += round(rand())[1]
            push!(runs_table[k], heads)
        end
    end
    # Computes the frequency of the fraction of heads is less than q
    # Code explanation: for every run whose fraction of heads is less than q, give that run a 1, otherwise 0
    # Then sum up all the values to get the number of runs that satisfies this requirement, and divides
    # it by the total number of runs. Repeat for each of the k tosses
    return [func(sum([heads/k <= q ? 1 : 0 for heads = runs_table[k]])/runs) for k = 1:tosses]
end

The curve becomes straighter as k increases, i.e. there is less deviation across runs.

In [10]:

firsts, seconds, thirds = Float64[], Float64[], Float64[]
figure(figsize=(5,3))
title("Ratio of heads in k tosses as a function of q")
for k = ks
    curve, first_quartile, second_quartile, third_quartile = q_curve(10000, k)
    plot(curve, 1:10000)
    # Save the quartile values
    push!(firsts, first_quartile)
    push!(seconds, second_quartile)
    push!(thirds, third_quartile)
end

The distance between the 0.75 marker and the 0.25 marker is just twice the distance between the 0.75 marker and the 0.5 marker, since the markers are symmetric about 0.5.
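The symmetry of the quartile markers can be checked numerically. The sketch below assumes a current Julia 1.x release with the `Statistics` standard library; the values of `runs` and `k` are illustrative choices, not the notebook's settings.

```julia
using Statistics  # standard library; provides quantile

# Fraction of heads in each of `runs` independent runs of `k` fair tosses.
# Both values are illustrative choices, not the notebook's settings.
runs, k = 10000, 1000
fractions = [sum(rand(Bool, k)) / k for _ in 1:runs]

q25, q50, q75 = quantile(fractions, [0.25, 0.5, 0.75])

println("q75 - q25       = ", q75 - q25)
println("2 * (q75 - q50) = ", 2 * (q75 - q50))
```

The two printed values should agree closely, up to sampling noise and the 1/k granularity of the fractions.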

We observe that the slope of the line on the log-log scale is approximately $-0.5$ (the line drops one power of ten in y for every two powers of ten in x), which implies that on a linear scale, $y \sim \frac{1}{\sqrt{x}}$.
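The step from the log-log slope to the power law can be written out explicitly. A straight line with slope $m$ and intercept $c$ on the log-log plot (both symbols are introduced here only for the derivation) satisfies

$$\log y = m \log x + c \quad\Longrightarrow\quad y = e^{c}\, x^{m}.$$

The relation $y \sim \frac{1}{\sqrt{x}}$ corresponds to $m = -\frac{1}{2}$: the gap shrinks by a factor of 10 each time the number of tosses grows by a factor of 100.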

Hence, to rescale the gap, we first subtract 0.5 to center the values on the point of symmetry, multiply by the square root of k, and then add 0.5 back to the final result to undo the earlier subtraction.

Most points on all of the q curves now line up almost perfectly with one another. Let's translate the above to code!