In every math book I’ve ever seen, the floor(⋯)
function rounds toward minus infinity (not toward zero). The same
goes for every non-spreadsheet computer language I’ve ever heard of
including imperative languages such as C++ and functional languages
such as lisp.

In contrast, alas, excel decided to
implement floor(⋯) as rounding toward zero. Other
spreadsheet programs have followed suit.

This is another huge trap for the unwary. I don’t know of any good
way to write code that is non-misleading. See also section 1.1.3.

In C++, the int(⋯) function (or simply assigning a
double into an int) rounds toward zero.

In lisp, it rounds toward minus infinity.

Spreadsheets round toward minus infinity, which makes them
consistent with lisp and inconsistent with C++. In particular, the
C++ int(⋯) function matches the spreadsheet
floor(⋯) ... and vice versa, so it is as confusing as it
could possibly be.

This topic is named in honor of the proverbial long straight fence
where there are N lintels held up by N+1 posts.

This is relevant in many different computing situations, especially
when an interval is divided into N subintervals (not necessarily all
the same size) demarcated by N+1 endpoints. In particular, this is
relevant to running sums, simple numerical integrals, and simple
numerical derivatives. Here are some suggestions:

Number the posts from 0 to N inclusive, for a total
of N+1 posts.

Number the lintels from 0 to N−1 inclusive, for a total of
N lintels. This is conventional in most modern computer languages.

As a corollary, this means that each lintel is named after the post at
the left end of the lintel, i.e. the lesser-numbered of the two posts
that hold that lintel.

Format all the lintel-values as subscripts, and all the post-values as
superscripts. This makes it easier to visualize the idea that the
lintels fall between the posts. Alas the contrast between subscripts
and superscripts is not as striking as one might have hoped, but it’s
better than nothing.

Beware that the ancient Romans mostly did not play by these rules.
For example, in music, there are only 7 notes per diatonic octave; the
eighth note is logically the beginning of the next octave. So
the name octave, which suggests eight, is inconsistent with modern
notions of how to count.

Similarly, there is no year zero. The year 1 AD immediately follows
the year 1 BC, even though this is inconsistent with modern notions of
how to construct a number line.

Beware that counting is different for a fence that forms a closed
loop. Then you have N lintels supported by only N posts. For
example, a circle can be divided into four quadrants, demarcated by
four points.

The foregoing applies to arrangements that are effectively
one-dimensional. In two or more dimensions, e.g. dividing a sphere
into octants, things get much more complicated.

Let’s assume you have column vectors, not row vectors. Put the
formula in the top cell of the result. Replace V$1 by the top of one
of your vectors (two places), and W$1 by the top of the other (two
places). Be sure to spell it with a $ sign in front of the
row-number. Then fill down into the remaining cells of the result.

In typical spreadsheets, curly braces are used to denote an array
constant, for example {1,2,3} is a horizontal array. Similarly,
{5;6;7} is a vertical array constant. Continuing down this road,
{11,12,13;21,22,23;31,32,33} is a 3×3 matrix constant. The
general rule is that commas are used to separate values on any given
row, while semicolons are used to separate rows.

Here are some example of how this can be used:

Cells: You can enter an array constant or array formula into an
array of cells. As an example of entering an array constant,
highlight cells A1:C1, then type ={1,2,3} and hit
<ctrl-shift-enter>. As an example of an array formula, highlight
cells A2:C2 an then type transpose(D1:D3) and hit
<ctrl-shift-enter>.

Linest: Consider using a spreadsheet for least-squares fitting
aka linear regression. Array constants can be used in connection with
the linest(...) function. This provides a concise way of
defining families of functions, such as the polynomial basis, the
Fourier basis, et cetera. This sometime results in a dramatic
reduction in the number of columns in the spreadsheet. An instance of
this can be found on the polynomial-fancy page of reference 1; for a discussion of this example see reference 2.

Range of numbers: There exists an idiomatic way of generating an
array of consecutive integers in an array formula. As an example,
here is a formula to calculate the sum of the first ten integers:

Hamming weight, Hamming distance, and parity: You might think
these functions would be built in, but they are not, so far as I can
tell. I wrote functions to do this. They make good use of the
aforementioned range-of-numbers idiom.

Integrals including simple convolutions: Suppose we have a
collection of 900 x-values, obtained perhaps by random sampling.
We we want to treat those as delta functions. We convolve them with
a Gaussian, to obtain a curve that is the sum of 900 Gaussians. We
do not want our spreadsheet to have 900 rows and 900 columns.
Indeed, it does not need to have 900 rows or 900 columns;
we can store all 900 x-values in a 30 by 30 region.

=sum(normdist(K10,AA$11:BD$40,K$4,0)*K$2)

Explanation: The normdist() function ordinarily returns a scalar. If
however you pass an array expression (constant or otherwise) as one
of its arguments, it returns an array. This is an ephemeral array,
not visible on the spreadsheet anywhere. In this example, the sum()
collapses the ephemeral array, producing a scalar that fits in a
single cell. Remember to enter the expression using
<ctrl-shift-enter>.

Hint: If you ever enter something as a scalar when you should have
entered it as an array, don’t panic. The formula is still there, and
can easily be promoted to array-formula status. Highlight the area
that is to be filled by the array, hit the <F2> key to move the focus to
the formula bar, and then hit <ctrl-shift-enter>.

As we have seen in the examples above, you can do arithmetic involving
array constants, for instance by highlighting cells A1:C1, then typing
in =10^{1,2,3} and hitting <ctrl-shift-enter>. This will put
the values 10, 100, 1000 into the three cells.

Obviously you can compute the maximum of these three cells using the
expression max(A1:C1), and store this result in a single cell
such as A3.

Logic suggests that you can do the same thing on the fly, by entering
the combined expression max(10^{1,2,3}) into a single cell such
as B3.

However, beware: Due to a bug in current versions of the gnumeric
spreadsheet program, and possibly other programs as well, if you try
in the normal way, you will get a numerically-wrong answer. See
reference 3.

As a workaround for this bug, you can get the right answer by entering
the formula =max(10^{1,2,3}) as an array formula using
<ctrl-shift-enter>. There is no good reason why <ctrl-shift-enter>
should be needed in this situation, because we are entering an
ordinary scalar-valued expression, equivalent to the expression in the
previous paragraph (which did not require <ctrl-shift-enter>).

This formula is meant to be filled-down and/or filled-across to cover
the destination region; it is not meant to be entered as an array
formula.

It would be super-nice to be able to express the reverse as an array
expression, but I have not found a way to do this. The index()
and offest() functions do not appear to tolerate array
expressions in their second or third arguments.

You can reverse rows and/or columns using the index() function
or the offset() function. For example, if you want a copy of
the data in columns I through K, with the columns in reverse order,
you can use the following. Note that index() starts counting
at 1, whereas offset() starts counting at 0. Also,
index requires an array as its first argument, while for
present purposes offset requires a single cell as its first
argument (since we want each call to offset to yield a single
cell).

To reverse columns:

=index($I2:$K2,1,1+column($K2)-column(I2))

or equivalently

=offset($I2,0,column($K2)-column(I2))

Fill down if there are multiple rows of interest in each column.

Similarly, to reverse rows:

=index(E$6:E$26,1+row(E$26)-row(E6),1)

or equivalently

=offset(E$6,row(E$26)-row(E6),0)

This turns out to be important in connection with convolutions.

Another important application arises in connection with a band plot
such as in figure 1. Ideally it would be
possible to make such a plot using the "fill to next series" feature
of the spreadsheet program, but in older versions of gnumeric that has
bugs. It is better to use the "fill to self" feature. This requires
tracing the top of each band in the normal order, then tracing the
bottom of each band in reverse order.

Suppose you have a row of data: 1, 2, 3, et cetera. You would like to
plot each data item twice. Usually the best way to do this is to put the
duplicates in adjacent rows as discussed in section 1.19.

However, sometimes you might want to keep everything in a single row,
and just have twice as many cells in the row, so that the row reads 1,
1, 2, 2, 3, 3, et cetera. This situation can easily arise if you want
to draw a stair-step function. The vertical risers require you to
duplicate the abscissas, and the horizontal treads require you to
duplicate the ordinates.

The index function can do this for you:

=index($I11:$W11,1,1+(column(I11)-column($I11))/2)

Note that stair-steps arise if you are plotting the cumulative
probability for a discrete distribution, such as in figure 2.

The column(M7) function returns the address of the given
column; in this example, “M” is column number 13. If however you
want the column letter, aka the column name formatted as a letter, you
have to work harder. Here’s the recommended expression:

=substitute(address(1,column(M7),4),"1","")

Explanation: The row number (“7”) here is irrelevant. The
column(M7) function returns 13, ignoring the row number. The
address(...,4) function returns an address, namely the text
string “M1”. The “4” here tells it to return a relative address
(i.e. without any $ signs). Lastly, we get rid of the “1”.

Here’s one rationale for doing something like this: Suppose you want
to format a message that refers to the column in which cell M7 was
found. Next, suppose you cut-and-paste cell M7 to a new location such
as N8. The result of the recommended expression will change
accordingly; the new result will be “N”. In other words, the
recommended expression is translation invariant – whereas a
simple string constant (in this case “M“) would not be.

In some cases there are simpler ways of achieving translation
invariance, as discussed in section 1.11.

Use the match(needle,haystack,0) to return a relative row
number. A result of 1 corresponds to the first row in the haystack
range. In particular, argmax is match(max(haystack),haystack,0) and argmin is match(min(haystack),haystack,0).

The result can then be used in the index(range,relRow,relCol)
function to fetch a value. Equivalently, it can be used in the
offset function, provided you subtract 1.

Usually the simplest and best approach to use the match()
function to create a reference to the cells you want.

It is a dubious practice to cobble up cell addresses by
concatenating non-constants such as the row-letter computed in
section 1.9. This can be made to work correctly, but it’s
usually messier than the previous suggestion.

It is an even worse practice to cobble up cell addresses by
concatenating string constants and then using the indirect()
function. The problem is that the constants do not “track” properly
if stuff gets relocated. That is to say, they are not translation
invariant.

Step 1: To get an array of indices, apply the following to the sort
key, to encode the key and the corresponding row number. Suppose the
input data starts at D$4 and we want the indices to start at A$4:

See section 1.14 for a discussion of the percentrank
function. Let N denote the number of valid elements. Multiplying
the prank by N−1 (and rounding) gives the rank as an integer (with
N possible values from zero to N−1 inclusive). We use the integer
rank as our sort key. The rest of the expression uses the fraction
part to encode the index, to remember where this element was before
sorting.

Step 2: Sort the indices. Let’s put the sorted results in column B:

=small(A$4:A$99,1+row(B4)-row(B$4))
or
=large(A$4:A$99,1+row(B4)-row(B$4))

Let N denote the number of valid elements in the haystack.
Blank cells and string-valued cells are ignored, and don’t contribute
to the value of N. See section 1.15 for a way of
evaluating N.

Suppose some element with value V is part of an M-way tie,
and suppose it corresponds to some prank we denote Rlow. Then
if the needle is infinitesimally larger than V, its prank is
Rhigh = Rlow + (M−1)/(N−1).

This makes sense insofar as it is 1/(N−1) less than the prank of
the next-larger element of the haystack, i.e. Rnext = Rlow +
M/(N−1)

If the needle falls between two elements of the haystack, the
prank is computed by linear interpolation, starting from Rhigh and
ending at Rnext, so as the needle varies over the interior of the
open interval, the prank changes by 1/(N−1). (The change is not
1/N, which would be a fencepost error.)

Therefore if there are no ties, the prank is a continuous
function of the needle-value. A stronger statement is: If and only
if there are no ties, not counting a possible tie for top rank, the
prank is a continuous function of the needle-value.

Sometimes you want to select a subset of the original data, and put it
into a new region without gaps. This is called a query. You
might want to run multiple queries against the same original data.

Suppose the original data is on page "x", in cells x!e6:j37. That’s a
total of N=32 rows. Columns A,B,C, and D are not needed on this page.
Optionally, they can be used for some unrelated purpose.

On the next page, use column A to keep a running sum of how many times
the selection criterion is met. There has to be a zero in cell A5.

=A5+1e-06+(left(x!I6,2)="**")

where the stuff in parentheses is the criterion; it evaluates to 1
whenever the criterion is met, and 0 otherwise. Note the 1e-6 which
is necessary to make sure the elements in this column are strictly
sorted, as required by vlookup(). This is not an array
formula. Fill down so there are N rows of this.

As another example, suppose you just want to select rows that
contain valid numerical data in column I:

=A5+1e-06+(count(x!I6))

Then in the next column, set up sequence numbers, starting with -1 in
cell B5. In gnumeric, this column is not needed, but if you want the
spreadsheet to be XL compatible it is needed.

Then in the next column, find where each successive match occurs.
This returns a 0-based index into the original data.

=1+vlookup(row()-row(B$6)+0.5,A$5:B$37,2,1)

The second argument to vlookup has the column to be searched and the
return value. The next argument says that the return value comes from
the second column, which in this case is the array of sequence
numbers.

Then in the next column, be defensive about running off the end of the
original data region. You can hide most of the rows with "xxx" in
them, although it might be smart to leave one of them, to demonstrate
that you are not inadvertently hiding any results.

=if($B6<rows(x!$D$6:$D$37),$B6,"xxx")

Finally, the offset() function can be used to grab the desired
subset of the data:

When plotting, in the simplest case, the ordinate of the plot is a
one-dimensional spreadsheet region, by which I mean a region that is
either N rows by 1 column, or 1 row by N columns. (Ditto
for the abscissa, if any.)

However, it is perfectly possible for the ordinate (and/or abscissa)
to be a rectangular region, with M rows and N columns. Each row
is read out before advancing to the next row. For example, the region
A1:C10 contains 30 elements, in the following order A1,
B1, C1, A2, ⋯ B10, C10.

It is reasonably straightforward to plot a set of arbitrary line
segments (disconnected from each other). Similarly it is reasonably
straightforward to plot a set of arbitrary triangles or other
polygons.

The basic idea is to plot them as an ordinary “XY lines” plot. To
separate one item from the next, put a non-numeric value in the list
of abscissas and/or ordinates. A blank will do, it is more readable
if you use a string such as “break” or “X”.

This idea combines nicely with the rectangular regions discussed in
section 1.17. To plot N line segments, use three columns
of N rows for the abscissas (tail.x, tip.x, and “break”) plus
three columns for the ordinates (tail.y, tip.y, and “break”).

Similarly, to plot N quadrilaterals, use six columns of N rows;
the first four define the corners, the fifth returns to the starting
point to close the last side, and the sixth is the “break”. An
example is shown in figure 3.

For a discrete distribution, such as we get from rolling a die,
cumulative probability is a stairstep function. Suppose there are N
steps. Depending on details of the desired appearance, this will
correspond to either N treads and N+1 risers, or perhaps N+1
treads and N risers. Either way, there is a nice representation
using N+1 rows of four columns: two columns for the abscissas, and
another two columns for the ordinates. Each row represents one tread:
It has two unequal abscissas (the left edge and the right edge of the
tread) and two equal ordinates (since the tread is horizontal).

In order to understand what gets plotted in front of what, it helps to
know some terminology. However, the terminology for graphical objects
is maddening. You might imagine that words like graph, chart, and
plot should mean pretty much the same thing ... but in this context
they don’t. I write these words in small caps to indicate that they
are codewords.

Gnumeric doesn’t even use its own terminology consistently. The popup
for customizing a graph and its contents is entitled «Customize
Chart» whether or not the graph contains zero, one, or many charts.

Here is a subset of what is going on, even when you try to do
something simple. Suppose we are making an XY plot of some data:

The top-level object is called a graph.

The graph can have zero or more graph-titles.

The graph can have zero or more charts.

Each chart can have zero or one backplane.

Each chart can have zero or more legends.

Each chart can have zero or more chart-titles.

Each chart that contains an XY plot can have one or more
X-axes. The first X-axis cannot be deleted, and controls the
X-scale factor for all plots (and their series) on this
chart. Additional X-axes, if any, are basically just
ornamental.

Each X-axis can have zero or one major grid.

Each X-axis can have zero or one minor grid.

Each X-axis can have zero or one axis-label.

Ditto for the Y-axes.

Each chart can have zero or more plots of
axis-compatible type. Axis-compatibility means you can add an
XY plot on top of an XY plot, but you cannot add a Radar
plot on top of an XY plot or vice versa. Remember that
one set of axes controls every plot within the chart. You
might imagine that the type of the chart would be a property
of the chart itself, but no, it is controlled by the type of
the first plot within the chart.

In contrast, the size and position of the plots is
controlled by properties of the parent chart – not by the
axes, and not by the plots themselves. Don’t ask me to
explain this.

Each plot can have zero or more series.

∗ Each series can display data using lines.

∗ Each series can display data using pointlike symbols.

∗ Each series can display data using shaded regions.

Now, you might imagine that the drawing routines would recurse through
the objects pretty much as shown above. In particular, you might
imagine that gnumeric would finish drawing one series before going on
to the next. However, that’s not how it actually works. The
innermost loops are done “inside out” as explained below. This
is completely undocumented, indeed contrary to the documentation.

In English, that means that within each plot, the lines (for all
the series) are drawn in order, and then the symbols (for all the
series) are drawn in order. All of the lines wind up behind all of
the symbols. The stacking of the lines (relative to other lines) is
controlled by the order of the series as they appear in the user
interface, and similarly for the stacking of the symbols (relative to
other symbols). However, within a plot, there is no way to draw a
line on top of a symbol ... even if you have one series that uses
only lines and another that uses only symbols. No amount of
re-ordering the series within a plot will change this.

This ordering may not be what you want. (An example arises when
plotting arrowheads on top of target symbols, as discussed in
section 1.21.) In such a case, the only way to make
progress is to use more than one plot, and order the plots
appropriately. You can achieve quite fine control over the
“stacking” – i.e. the order of plotting – by putting each series
in its own plot. Use one series per plot, i.e. one plot per
series.

For even finer control of the plot order, don’t rely on a single
series to draw both lines and symbols. Instead, split it into two
series looking at the same data. One series displays the lines, while
the other series displays the symbols.

This business of creating more than one plot would be a lot more
practical if it were possible to move a series from one plot to
another, but evidently that is not possible. The buttons that move a
series up and down in plot order get stuck at the boundary between
plots.

While we’re on the subject: What would help most of all is the ability
to copy-and-paste a series ... and also copy-and-paste an entire plot
(along with all its series). This includes being able to copy a
series from one plot and paste it into another. The ability to
copy-and-paste (not just cut-and-paste) has innumerable applications.
It is very common to want a new curve that is just like an old curve,
with minor modifications.

Data space is abstract. In (say) a plot of force versus time,
data space might have units of newtons in one direction and seconds
in the other direction.

Screen space is direct and concrete. It is the physical
distance on the screen. It might have units of cm (or pixels) in
both directions. Crucially, the units are the same in all
directions.

There are some things such as arrowheads that clearly need to be
plotted in screen space, not data space. You want the arrowhead shape
to be preserved, even if it is rotated by some odd angle. The idea of
rotation makes sense in screen space, but violates basic requirements
of dimensional analysis in data space, where there is not (in general)
any natural metric, i.e. not any natural notion of distance or angle.

In figure 4, the extent of the plot in the x direction
is more than fivefold larger than the extent in the y direction.
The arrowheads would be grossly distorted if we tried to draw them in
data space without appropriate correction factors.

Although typical spreadsheet apps do not provide easy access to screen
space, we can work out the required transformations. The first step
is to figure out the metric. We know the metric in screen space, and
we use this to construct a conversion metric for data space. Let’s
denote data space by (x, y) and screen space by (u, v).

It must be emphasized that the arc length ds is the same in both
equation 3 and equation 4. It is the real
physical arc length on the screen.

You can determine the coefficients a and b as follows: Lay out a
rectangle on the chart. Let the edges of the rectangle be
(Δu, Δv) in screen space, and (Δx, Δy)
in data space. Then the needed coefficients are:

The conversion metric defines a notion of length and angle in data
space. Given a line segment in data space, we can construct a tangent
unit vector with the same orientation ... where “unit” refers to its
length in screen space. We can also define a transverse unit
vector, perpendicular to the line segment ... where “perpendicular”
refers to its angle in screen space.

It is then straightforward to construct arrowheads and other objects
in terms of the tangent unit vector and the transverse unit vector.

If you change the size of the chart or rescale the axes within a
chart, you will have to recompute the a and b coefficients in
accordance with equation 5. This is a pain in the neck
if the axis bounds are being computed automatically.

Before you start plotting arrowheads, be sure to read section 1.20. You may want to use the one-plot-per-series construction
for your vectors and arrowheads.

The spreadsheet code to do this is cited in reference 4.
In this example, the plotting area is square, and the edge-length of
the plotting area is taken as the unit of distance in screen space.
The size of the arrowheads is 1/20th of this distance. This is of
course only an example; any other distance could be used in accordance
with equation 5.

There are several ways to see the formula aka expression in a
cell (as opposed to the value obtained by evaluating the expression).

The ctrl-‘ key toggles between ordinary show-value mode
and show-formula mode. Hold the control key while hitting
grave-accent key (which is typically just to the left of the “1”
key).

The get.formula(X1) function will display the formula
in cell X1, including the initial “=” sign. Note that if the
cell contains a constant such as 123 (as opposed to a
formula such as “=123”) the get.formula() displays
nothing.

I use this a lot. It is useful in pedagogical situations, where
you want people to be able to see the formula and the value at the
same time. It is also useful as a way to put formulas and/or
addresses into a .csv file, which (as the name suggests) nominally
only includes values. There are lots of good reasons why you might
want formulas and/or addresses in .csv files.

The expression(X1) function is the same, except that it
does not display the initial “=” sign.

Clicking on any given cell displays the cell’s formula
in the formula bar.

Double-clicking on any given cell displays the cell’s
formula in-place (as well as in the formula bar).

The default format that gnumeric uses for storing spreadsheet
documents is structured as XML. Often there is gzip compression on
top of that, but uncompressing it is super-easy.

(I have no idea what format excel uses.)

It is fairly easy to figure out what the XML means, and then to munge
it using a text editor, or using tools such as grep, awk, perl, et
cetera.

For example, suppose you want to have a plot with 35 curves on it.
Configuring all of those using the point-and-click interface is
unpleasant the first time, and even more unpleasant if you want to
make a small change and then propagate it to all the clones.
Therefore I wrote a little perl program that take one curve (aka
Series) as a template and then makes N clones.

The offset() function is used to select a subset of the data.
There are 1000 raw data points, and normally you would want to use
them all, but just for fun in this special case only 250 are used to
calculate the histograms.

The histogram in figure 6 uses yet
another trick: The nominal ordinate of the plot is zero. The
histogram bar you see is the upper error bar.
This technique seems cute at first glance, but it has a fatal flaw:
The plot uses no markers and no line-code, so the trace does not show
up in the legend in any reasonable way.

Therefore it is better to use the approach seen in figure 7 and figure 8. They
plotted the histogram as a bunch of quadrilaterals, using the
techniques mentioned in section 1.18.

I am quite aware that the spreadsheet program has a built-in
facility for producing histograms. It is OK in simple situations,
but I continually find myself wanting to do something it can’t
handle. The techniques in this example are just as easy to use, and
much more flexible.

The cumulative probability distribution is calculated using an
array formula, which is summed in place. This formula must be
entered using control-shift-enter.

For a continuous probability distribution, the right approach is to
sort the data. Then the index (suitably normalized) is the
cumulative probability, as a function of the data value. That is to
say, there is a riser at every data point, and a tread between each
two data points. See section 2.2 and
reference 7.

For discrete data, with relatively few categories, sorting is
inefficient and unnecessarily complicated.

For plotting the cumulative probability distributions, the following
statement comes in handy:

=if(J8<1,small(offset($E$8,0,0,G$4),1+row($E8)-row($E$8)),9e+99)

Start with the raw data, a bunch of x-values, randomly
drawn from some distribution.

In a new column, sort them in order of increasing x,
using the techniques set forth in section 1.12.

The cumulative probability distribution is a stairstep function,
in which event corresponds to a riser. For instance, the first event
in the list corresponds to the riser going from count=0 to count=1.
If there are N events, we normalize this so that the first riser
goes from P=0 to P=1/N.

Use an offset() statement so we can sort only
a subset of the data if we wish.

Use an if() statement so that we can achieve
the proper asymptote P(x) = 1 for large x.

There are no adjustable parameters. The empirical distribution
of the data converges to the ideal distribution with no help from us.

As for the probability-density histogram:

There is an adjustable parameter here, namely the bin-width, i.e. the
extent of the bin in the x-direction. If we have a huge number of
points, the bin-width can be smaller.

The histogram is a stairstep function, in which each bin
corresponds to a tread.

The support of each bin is a half-open interval of the form [L, R)
such that the bin contains all events such that x is greater than or
equal to L and strictly less than R. For continuous data, the
chance that any data point falls exactly on a bin boundary is zero,
but for discrete data this can become an issue, hence the careful
definition of the bin-boundaries.

If the bin-width gets too small and there are not enough data points,
the histogram becomes noisy. At some point a noisy histogram becomes
hard to interpret; among other things, the idea of half-width at
half-maximum becomes useless.

Here is a statement that produces a random integer according to a
binomial distribution. In this example, there are 60 trials. (The
number of trials is built into the statement.) The first argument to
the binomdist function is the outcome. Here we use an array
expression, using the range-of-numbers idiom mentioned in
section 1.4. The second argument is the number of
trials. The third argument is the probability of success per trial,
which we assume is stored in cell $D$3. We use the
sum function to count how many times the probability of a given
outcome was less than the threshold, where the threshold is some
random number stored in cell G7.

=sum(if(binomdist(row($A$1:$A$61)-1,60,$D$3,1)<G7,1,0))

Suppose we use that formula 200 times, using a different random
threshold each time. The results are shown in the following figures.
See reference 8 for details.