The customer journey is at the forefront of every discussion about modern marketing. The idea that customers move in premeditated or, at the very least, marketer-mediated paths between well-defined states is alluring (and comforting) to a marketing professional. Of course, even a cursory examination of a web path analytics (Sankey) [...]

Given a rectangular grid with unit spacing, what is the expected distance between two random vertices, where distance is measured in the L1 metric?
(Here "random" means "uniformly at random.")
I recently needed this answer for some small grids, such as the one to the right, which is a 7 x 6 grid.
The graph shows that the L1 distance between the points (2,6) and (5,2) is 7, the length of the shortest path that connects the two points. The L1 metric is sometimes called the "city block" or "taxicab" metric because it measures the distance along the grid instead of "as the bird flies," which is the Euclidean distance.

Although I don't know a formula for the expected distance on a discrete regular grid, the grids in my application were fairly small so this article shows how to compute all pairwise distances and explicitly find the average (expected) value. The DISTANCE function in SAS/IML makes the computation simple because it supports the L1 metric. It is also simple to perform computer experiments to show that among all grids that have N*M vertices, the grid that is closest to being square minimizes the expected L1 distance.

L1 distance on a regular grid

An N x M grid contains NM vertices. Therefore the matrix of pairwise distances is NM x NM. Without loss of generality, assume that the vertices have X coordinates between 1 and N and Y coordinates between 1 and M. Then the following SAS/IML function defines the distance matrix for vertices on an N x M grid. To demonstrate the computation, the distance matrix for a 4 x 4 grid is shown, along with the average distance between the 16 vertices:
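The SAS/IML code is not reproduced here, but the computation is easy to sketch in Python with NumPy (the function name grid_l1_matrix is mine, not from the original post):

```python
import numpy as np

def grid_l1_matrix(n, m):
    """Pairwise L1 (city block) distances between vertices of an n x m grid."""
    # vertices have X coordinates 1..n and Y coordinates 1..m
    x, y = np.meshgrid(np.arange(1, n + 1), np.arange(1, m + 1))
    pts = np.column_stack([x.ravel(), y.ravel()])              # (n*m, 2)
    # broadcasting gives the full (n*m) x (n*m) distance matrix
    return np.abs(pts[:, None, :] - pts[None, :, :]).sum(axis=2)

D = grid_l1_matrix(4, 4)
print(D.shape)    # (16, 16)
print(D.mean())   # 2.5, the average distance between the 16 vertices
```

The mean is taken over all ordered pairs of vertices, including a vertex paired with itself, which matches the "two points chosen uniformly at random" model.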

For an N x M grid, the L1 diameter of the grid is the L1 distance between opposite corners. That distance is always (N-1)+(M-1), which equates to 6 units for a 4 x 4 grid.
As in the continuous case, the expected L1 distance is less than half the diameter. For the 4 x 4 grid, E(L1 distance) = 2.5.

A comparison of distances for grids of different aspect ratios

As indicated previously, the expected distance between two random vertices on a grid depends on the aspect ratio of the grid. A grid that is nearly square has a smaller expected distance than a short-and-wide grid that contains the same number of vertices. You can illustrate this fact by computing the distance matrices for grids that each contain 36 vertices. The following program computes the distances for five grids: a 1 x 36 grid,
a 2 x 18 grid,
a 3 x 12 grid,
a 4 x 9 grid, and
a 6 x 6 grid.
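A NumPy sketch of the same comparison (again, this is my translation, not the original SAS/IML program):

```python
import numpy as np

def avg_l1(n, m):
    """Average L1 distance over all ordered pairs of vertices of an n x m grid."""
    x, y = np.meshgrid(np.arange(n), np.arange(m))
    pts = np.column_stack([x.ravel(), y.ravel()])
    return np.abs(pts[:, None, :] - pts[None, :, :]).sum(axis=2).mean()

for n, m in [(1, 36), (2, 18), (3, 12), (4, 9), (6, 6)]:
    print(f"{n} x {m} grid: average distance = {avg_l1(n, m):.4f}")
```

The averages decrease steadily as the grid becomes more square, from about 11.99 for the 1 x 36 grid down to about 3.89 for the 6 x 6 grid.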

The table shows that a short-and-wide grid has an average distance that is much greater than that of a nearly square grid that contains the same number of vertices.
When the points are arranged in a 6 x 6 grid, the distance matrix naturally decomposes into a block matrix of 6 x 6 symmetric blocks, where each block corresponds to a row.
When the points are arranged in a 3 x 12 grid, the distance matrix decomposes into a block matrix of 12 x 12 blocks.
The following heat maps visualize the patterns in the distance matrices:

Expected (average) distance versus number of rows

You can demonstrate how the average distance depends on the number of rows by choosing the number of vertices to be a highly composite number such as 5,040. A highly composite number has many factors. The following program computes the average distance between points on 28 grids that each contain 5,040 vertices. A line chart then displays the average distance as a function of the number of rows in the grid:
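The original code is SAS/IML; as a cross-check, the averages can be computed cheaply in Python by exploiting the fact that the L1 distance is a sum of a horizontal term and a vertical term, so the expected distance splits into two one-dimensional averages (the helper names here are mine):

```python
import numpy as np

def mean_abs_diff(n):
    """Average |i - j| over all ordered pairs (i, j) drawn from {1, ..., n}."""
    v = np.arange(n)
    return np.abs(v[:, None] - v[None, :]).mean()

def avg_l1(n, m):
    # Because the L1 distance separates by coordinate, the expected
    # distance for an n x m grid is the sum of two 1-D expectations.
    return mean_abs_diff(n) + mean_abs_diff(m)

N = 5040                       # a highly composite number: one grid per divisor
for n in (40, 70):             # two of the divisor grids, for illustration
    m = N // n
    print(f"{n} x {m} grid: average distance = {avg_l1(n, m):.1f}")
```

This decomposition avoids ever forming the 5,040 x 5,040 distance matrix, so even the largest grids are nearly instantaneous to evaluate.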

Each point on the graph represents the average distance for an N x M grid where NM = 5,040. The horizontal axis displays the value of N (rows). The graph shows that nearly square grids (rows greater than 60) have a much lower average distance than very short and wide grids (rows less than 10). The scale of the graph makes it seem like there is very little difference between the average distance in a grid with 40 versus 70 rows, but that is somewhat of an illusion: a 40 x 126 grid (aspect ratio = 3.15) has an average distance of 55.3, whereas a 70 x 72 grid (aspect ratio = 1.03) has an average distance of 47.3.

Applications and conclusions

In summary, you can use the DISTANCE function in SAS/IML to explicitly compute the expected L1 distance (the "city block" distance) between random points on a regular grid. You can minimize the average pairwise distance by making the grid as square as possible.

City planning provides a real-world application of the L1 distance. If you are tasked with designing a city with N buildings along a grid, then the average distance between buildings is smallest when the grid is square.
Of course, in practice, some buildings have a higher probability of being visited than others (such as a school or a grocery). You should position those buildings in the center of town to shorten the average distance that people travel.

I usually create very technical graphs, that just focus on conveying the information in a concise and straightforward manner (no wasted colors, and nothing to distract you from the data). But sometimes, depending on your audience and the purpose of the graph, you might need to create a graph that [...]

PC Magazine defines the broad industry term Software as a Service (SaaS) as, “Software that is rented rather than purchased. Instead of buying applications and paying for periodic upgrades, SaaS is subscription based, and upgrades are automatic during the subscription period.” SaaS, according to the same source, is ideally suited [...]

What is a continued fraction?

A continued fraction expresses a number x in the form x = a0 + 1/(a1 + 1/(a2 + 1/(a3 + ...))). In this continued fraction, a0 is the integer portion of the number and the ai for i > 0 are positive integers that represent the noninteger portion.
Because all of the fractions have 1 in the numerator, the continued fraction can be compactly represented by specifying only the integers: x = [a0; a1, a2, a3, ...].
Every rational number is represented by a finite sequence; every irrational number requires an infinite sequence. Notice that the decimal representation of a number does NOT have this nice property: 1/3 is rational but its decimal representation is infinite.

Convert a finite continued fraction to a rational number

My article about the continued fraction expansion of pi contains a few lines of SAS code that compute the decimal representation of a finite continued fraction. Since every finite continued fraction corresponds to a rational number, you can modify the algorithm to compute the rational number instead of a decimal approximation.

The algorithm is straightforward. Given a vector a = [a0; a1, a2, ..., ak], start at the right side and form the fraction sk = 1 / ak. Then move to the left: compute the sum ak-1 + sk and take its reciprocal to obtain sk-1 = 1 / (ak-1 + sk). Continue moving to the left until only a0 remains; the value of the continued fraction is a0 + s1.
In the SAS/IML language (or in the SAS DATA step), you can represent each fraction as a two-element vector where the first element is the numerator of the fraction and the second is the denominator of the fraction.
In the SAS/IML language (or in the SAS DATA step), you can represent each fraction as a two-element vector where the first element is the numerator of the fraction and the second is the denominator of the fraction.
This leads to the following algorithm that computes a rational number from a finite continued fraction expansion:
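The SAS/IML implementation is not shown here; the same backward pass can be sketched in Python, carrying the (numerator, denominator) pair as described above (the function name is my own):

```python
def cf_to_rational(a):
    """Convert a finite continued fraction [a0; a1, ..., ak] to num/den."""
    num, den = a[-1], 1                  # rightmost term a_k, as the fraction a_k/1
    for aj in reversed(a[:-1]):          # move left: t <- a_j + 1/t
        num, den = aj * num + den, num   # (a_j*num + den)/num equals a_j + den/num
    return num, den

print(cf_to_rational([3, 7, 15, 1]))    # (355, 113), a famous approximation of pi
```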

Rational numbers to continued fractions

You can program the inverse algorithm to produce the cfe from a rational number. The algorithm is called the Euclidean algorithm and consists of finding the integer part, subtracting the integer part from the original fraction to obtain a remainder (expressed as a fraction), and then repeating this process. The process terminates for every rational number. In the SAS/IML language, the Euclidean algorithm looks like this:
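In Python terms (a sketch, not the original SAS/IML code), the Euclidean algorithm is only a few lines, because divmod returns the integer part and the remainder in one call:

```python
def rational_to_cf(num, den):
    """Continued fraction expansion of num/den via the Euclidean algorithm."""
    cf = []
    while den != 0:
        q, r = divmod(num, den)   # integer part and fractional remainder r/den
        cf.append(q)
        num, den = den, r         # invert the remainder and repeat
    return cf

print(rational_to_cf(355, 113))   # [3, 7, 16]
```

Note that this returns [3, 7, 16] rather than [3, 7, 15, 1]; every rational number has exactly two continued fraction representations, which differ only in whether the final term is 1.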

Each column of the table shows the continued fraction representation of a rational number.
The output shows that the ContinuedFrac and EuclidAlgRational functions are inverse operations: each function undoes the action of the other.

The first property concerns the probability distribution of the ak for sufficiently large values of k. Notice that the continued fractions in this article mostly contain small integers such as 1, 2, 3, and 4. It turns out that this is generally true. For almost every irrational number in (0,1) and for k sufficiently large,
the kth term ak is more likely to be a small integer than a large integer. In fact, the Gauss-Kuzmin theorem states that the distribution of ak (thought of as a random variable) asymptotically approaches the probability mass function P(ak = t) = -log2(1 - 1/(1+t)^2), which is shown below for t ≤ 20. Thus for a random irrational number, although the value of, say, the 1000th term in the continued fraction expansion could be any positive integer, it is probably less than 20!
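You can check that claim numerically; the following Python sketch (not from the original post) evaluates the Gauss-Kuzmin probability mass function and sums it for t = 1, ..., 20:

```python
import math

def gauss_kuzmin_pmf(t):
    """Limiting probability P(a_k = t) for a positive integer t."""
    return -math.log2(1 - 1 / (1 + t) ** 2)

print(f"P(a_k = 1)   = {gauss_kuzmin_pmf(1):.4f}")   # about 0.415
print(f"P(a_k <= 20) = {sum(gauss_kuzmin_pmf(t) for t in range(1, 21)):.4f}")
```

About 93% of the limiting mass sits on t ≤ 20, so a term exceeds 20 only about 7% of the time.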

The second interesting mathematical property is Khinchin's Theorem, which states that for almost every irrational number in (0,1), the geometric mean of the first t digits in the continued fraction expansion converges to a constant called Khinchin's constant, which has the value 2.685452001.... For example, the first 97 terms in the continued fraction expansion of pi have a geometric mean of 2.685252.

Many of the most beautiful areas in the US are owned by the government, to preserve them and allow access for everyone to enjoy them. And most US schools are traditionally closed during the summer, which provides families a great opportunity to go visit state and federal lands (parks, forests, [...]