A Note on the Information-Theoretic Basis for Fitts' Law

Abstract:
Fitts' law is an information-theoretic view of human motor
behavior developed from Shannon's Theorem 17, a fundamental
theorem of communications systems. Using data from Fitts'
original experiment, we demonstrate that Fitts' choice of
an equation that deviates slightly from the underlying
principle is perhaps unfounded, and that the relationship
is improved using an exact adaptation of Shannon's
equation.

FITTS' RESEARCH into the information capacity of the human
motor system culminated in the publication of a paper (Fitts,
1954) that proposed a fundamental relationship. It expresses
movement time (MT), the time to complete a movement task, in
terms of the distance or amplitude of the move (A) and the
width of the region within which the move must terminate (W).
The mathematical relationship, known as Fitts' law, is

MT = a + b log2(2A / W)

(1)

The so-called index of difficulty (ID ) of a movement task is
expressed as

ID = log2(2A / W)

(2)

and allows Fitts' law to be restated as

MT = a + b ID.

(3)

Equations 1 and 3 are linear in ID, with empirically determined
constants for the intercept (a) and slope (b). Although the
base of the log term is arbitrary, practice dictates the use
of base 2; thus ID carries the units bits. The number of bits
is the information content of the positioning task, or, stated
another way, the information transmitted in carrying out the
task.

Fitts' paper provided data for four movement task experiments
that substantially verified the model's appropriateness.
Numerous other motor behavior experiments have also demonstrated
a high correlation between Fitts' relationship and observed data.

This paper is not a thorough examination of Fitts' law. For
that the reader is referred to Welford (1968, pp. 145-160) or
to Crossman and Goodeve (1983), Kvalseth (1979, 1981), and
Sheridan (1979) for more critical reviews.

Fitts' law was developed from an analogy with physical
communication systems. In such systems, the amplitude of a
transmitted signal is described as perturbed by noise that
results in amplitude uncertainty. The effect is to limit
the information capacity of a communications channel to some
value less than its theoretical bandwidth. Shannon's
Theorem 17 expresses the effective information capacity C
(in bits × s⁻¹) of a communications channel of bandwidth B (in
s⁻¹) as

C = B log2((P + N) / N)

(4)

where P is the signal power and N is the noise power (Shannon
& Weaver, 1949, pp. 100-103).

It is the purpose of this note to suggest that Fitts' model
contains an unnecessary deviation from Shannon's Theorem 17
and that a model based on an exact adaptation provides a
better fit with empirical data. The variation of Fitts'
law suggested by direct analogy with Shannon's Theorem 17
is

MT = a + b log2((A + W) / W)

(5)

It is revealing to examine the source Fitts cites in his
paper at the point where he introduces the relationship
(Fitts, 1954, p. 368). His derivation is based on Goldman's
Equation 39 (Goldman, 1953), which is similar to Fitts' law
except in its use of the terminology of communications
systems:

C = B log2(P / N)

(6)

Goldman (1953) offers this equation as an "approximation" of
Shannon's theorem, adding that it is useful "if the transmitted
power is large in comparison with the noise" (p. 157).
Indeed, this is so. When a substantial signal is transmitted
along a low-noise channel, Equation 6 is an accurate and less
cumbersome substitute for Equation 4. As the signal decreases
or the noise increases, however, Equation 6 becomes increasingly
inaccurate and Equation 4, Shannon's Theorem 17, must be used.
Psychomotor experiments employing Fitts' law commonly used
conditions in which the signal (movement amplitude) is very
small in comparison with the noise (target width). In fact,
two of Fitts' experiments used conditions extending down to
an A:W ratio of 1:1! These are precisely the conditions under
which Goldman cautions that his equation (and, we might
conjecture, Fitts' law) is inaccurate.
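The size of the error Goldman warns about can be illustrated numerically. The following is a minimal sketch, not part of the original analysis: the function names are illustrative, the bandwidth B is set to 1, and the signal-to-noise ratios are hypothetical values chosen to run from a strong signal down to 1:1.

```python
import math

def capacity_exact(p, n, b=1.0):
    """Equation 4 (Shannon's Theorem 17): C = B log2((P + N) / N)."""
    return b * math.log2((p + n) / n)

def capacity_approx(p, n, b=1.0):
    """Equation 6 (Goldman's approximation): C = B log2(P / N)."""
    return b * math.log2(p / n)

# Hypothetical signal-to-noise ratios, with the noise power fixed at 1.
for ratio in (64, 16, 4, 2, 1):
    exact = capacity_exact(ratio, 1)
    approx = capacity_approx(ratio, 1)
    print(f"P/N = {ratio:>2}: exact = {exact:.3f}, "
          f"approx = {approx:.3f}, shortfall = {exact - approx:.3f}")
```

At a ratio of 64:1 the two expressions differ by only a few hundredths of a bit, whereas at 1:1 the approximation yields zero while Equation 4 yields one bit per unit of bandwidth.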

Fitts recognized that his analogy was imperfect. The "2"
was added (see Equation 1) to avoid a negative ID when A =
W; however, log2(2A / W) is zero when A = (W / 2) and
negative when A < (W / 2). These conditions could never
occur in the experiments Fitts devised. Other researchers,
however, have reported experimental conditions with ID less
than 1 bit (Drury, 1975), or with a negative ID (Crossman
& Goodeve, 1983; Ware & Mikaelian, 1987). It is noteworthy
that, in the model based on Shannon's theorem (see Equation 5),
ID cannot be negative.
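The behavior of the two formulations near and below A = W / 2 can be checked directly. This is a small sketch under stated assumptions: the function names are illustrative, W is fixed at 1, and the amplitudes are hypothetical values chosen to straddle the point where Fitts' ID changes sign.

```python
import math

def id_fitts(a, w):
    """Index of difficulty from Equation 2: log2(2A / W)."""
    return math.log2(2 * a / w)

def id_shannon(a, w):
    """Index of difficulty implied by Equation 5: log2((A + W) / W)."""
    return math.log2((a + w) / w)

w = 1.0  # hypothetical target width
for a in (2.0, 1.0, 0.5, 0.25):  # hypothetical amplitudes
    print(f"A/W = {a / w:4.2f}: Fitts ID = {id_fitts(a, w):5.2f}, "
          f"Shannon ID = {id_shannon(a, w):5.2f}")
```

Fitts' ID reaches zero at A = W / 2 and is −1 bit at A = W / 4. The Shannon form stays positive for any A > 0, because (A + W) / W always exceeds 1.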

The reason Fitts did not use Shannon's original equation
was not stated. It may lie, however, in the greater facility
in working with Equation 1. Fitts' law may be recast as

MT = a + b1 log2(A) − b2 log2(W).

(7)

This appealing and accurate transformation separates A and
W and offers an extra degree of fine tuning for the prediction
model by using three empirically determined constants. This
approach has been studied by Kerr (1974), Sheridan (1979),
and Welford (1968, p. 153). However, it may be inappropriate
from an information-theoretic perspective because similar
recasting is not possible with Equation 5, which mimics
Shannon's original equation. Indeed, the difference between
Equations 1 and 5 is most apparent for small values of ID (i.e.,
as the ratio A:W decreases); and it is for small values of ID
that Fitts' law has been demonstrated to fail (Buck, 1986;
Crossman & Goodeve, 1983; Klapp, 1975; Langolf & Foulke, 1976;
Welford, 1968, pp. 145-146).
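The algebra behind the recasting argument can be written out. Expanding the logarithm in Equation 1 separates A and W, which is what permits Equation 7 once the two slopes are allowed to differ. The same expansion applied to Equation 5 leaves A and W coupled inside a single term:

```latex
\begin{align*}
\log_2\!\left(\frac{2A}{W}\right) &= 1 + \log_2 A - \log_2 W, \\
\log_2\!\left(\frac{A + W}{W}\right) &= \log_2(A + W) - \log_2 W .
\end{align*}
```

In the first line the constant term is absorbed into the intercept a. In the second, log2(A + W) cannot be split into separate functions of A and W.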

It has not been proposed in the literature subsequent to the
appearance of Fitts' original paper that an exact adaptation
of Shannon's equation may be more appropriate. Numerous other
variations of Fitts' law have been proposed over the years,
however, including those by Welford (1960); Beggs, Graham,
Monk, Shaw, and Howarth (1972); Kvalseth (1980); Kantowitz
and Knight (1978, pp. 222-223); and Jagacinski, Repperger,
Ward, and Knight (1980). Welford's proposal has been the
most favorably received, and usually takes the form

MT = a + b log2((A / W) + 0.5).

(8)

Equation 8 goes "half way" to the proposed variation and
can be restated as

MT = a + b log2((A + 0.5W) / W)

(9)

to more closely illustrate the similarity with Shannon's
original theorem.

Table 1 contains a comparison of the correlation coefficients
and regression line intercepts resulting from a least squares
regression analysis using the Fitts, Welford, and Shannon models.
The comparison employed the data from Fitts' reciprocal tapping
experiments (1 oz stylus and 1 lb stylus), disc transfer experiment,
and pin transfer experiment. The data provided in Fitts' original
experiments are thought to be particularly valid since 16 subjects
were used and were tested over 16 IDs. Each movement time recorded
was the average of more than 600 observations (Fitts, 1954).

TABLE 1
Correlation Coefficients and Regression Line Intercepts (ms) for
Three Variations of Fitts' Law Based on Fitts' (1954) Experiments

                    Tapping (1 oz)     Tapping (1 lb)     Disc transfer      Pin transfer
Model     Equation  r^a     Intercept  r^a     Intercept  r^a     Intercept  r^a     Intercept
Fitts     1         .9831   +12.8      .9796   −6.2       .9186   +150.0     .9432   +22.3
Welford   8         .9900   +65.3      .9874   +51.7      .9191   +231.8     .9443   +96.1
Shannon   5         .9936   +27.7      .9916   +9.7       .9195   +223.4     .9452   +84.4

^a p < .001

Examining first the tapping experiment using a 1 oz stylus, a high
correlation existed using Fitts' relationship (r = .9831). It was
improved, however, by using Welford's equation (r = .9900), and
improved still further using Equation 5 (r = .9936), which was
based directly on Shannon's Theorem 17. The trend was similar
with the other three experiments.
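The comparison procedure (compute ID under each model, regress MT on ID, and report r and the intercept) can be sketched as follows. This is an illustration only: Fitts' data are not reproduced here, so the amplitudes, widths, and movement times are hypothetical placeholders generated from the Shannon form itself, and the function names are assumptions rather than anything used in the original analysis.

```python
import math

def linear_fit(x, y):
    """Ordinary least squares of y on x; returns (intercept, slope, r)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r

# Index of difficulty under each of the three models compared in Table 1.
ID_MODELS = {
    "Fitts (Eq. 1)": lambda a, w: math.log2(2 * a / w),
    "Welford (Eq. 8)": lambda a, w: math.log2(a / w + 0.5),
    "Shannon (Eq. 5)": lambda a, w: math.log2((a + w) / w),
}

def compare_models(amplitudes, widths, movement_times):
    """Regress MT on ID for each model; returns r, intercept, slope per model."""
    results = {}
    for name, id_fn in ID_MODELS.items():
        ids = [id_fn(a, w) for a, w in zip(amplitudes, widths)]
        intercept, slope, r = linear_fit(ids, movement_times)
        results[name] = {"r": r, "intercept": intercept, "slope": slope}
    return results

# Hypothetical placeholder conditions, NOT Fitts' data. Movement times are
# generated exactly from the Shannon form so that the fit can be verified.
amps = [1, 2, 4, 8, 16]
wids = [1, 1, 1, 1, 1]
mts = [50 + 100 * math.log2((a + w) / w) for a, w in zip(amps, wids)]

for name, fit in compare_models(amps, wids, mts).items():
    print(f"{name:16s} r = {fit['r']:.4f}, intercept = {fit['intercept']:+.1f} ms")
```

Because the placeholder times were built from Equation 5, the Shannon fit is perfect by construction. The sketch shows only the mechanics of the comparison and says nothing about which model fits real data better.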

Ideally, the intercept should have been (0, 0), predicting 0 ms to
complete a task of zero difficulty. As evident in Table 1, Fitts'
relationship yielded the intercept closest to the origin in each
experiment, with the Shannon and Welford models ranked second and
third in each experiment. A possible explanation for the nonzero
intercepts stems from Crossman's observation (Crossman & Goodeve,
1983, p. 253) that movement time appears to approach a constant as
ID gets small. As for the one negative intercept in Table 1,
sampling error was perhaps the cause.

A proper information analysis of a Fitts' law experiment must
investigate the extent of noise in the subjects' execution of
the positioning tasks. A standard method was provided by Crossman
and given by Welford (1968, pp. 147-148) to convert target width
(W) to "effective" target width (We) in experiments such as Fitts'
that record errors. In the long run, the subjects' dispersion of
hits forms a Gaussian or normal distribution (Fitts, 1954; Crossman
& Goodeve, 1983). The effective target width, analogous to "noise",
is the width corresponding to the central 96% of the distribution.

Since Fitts reported percentage errors (for the tapping experiments),
a simple transformation of W to We was obtained by multiplying W by
a ratio of z scores. Table 2 shows the results of a reanalysis of
Fitts' tapping experiments in which We was substituted for W. The
correlations were in the same order as earlier, with the Shannon
model providing the highest correlation for both experiments. The
intercepts were generally negative, with the Welford model
providing the intercepts closest to the origin in each case, and
the Fitts model the one farthest from the origin.
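One reading of the "ratio of z scores" transformation is sketched below. The text does not give the exact z values used, so the details here are assumptions: the nominal error rate is taken as 4% (matching the central 96% of the distribution), misses are assumed to fall symmetrically on either side of the target, and the function name and parameters are illustrative.

```python
from statistics import NormalDist

def effective_width(w, error_rate, nominal_error=0.04):
    """Rescale W so that the observed error rate maps onto the nominal one.

    Assumes normally distributed end points with misses split evenly
    between the two sides of the target (an assumption of this sketch).
    """
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be strictly between 0 and 1")
    std_normal = NormalDist()
    # z score bounding the central (1 - error) share of a normal distribution.
    z_nominal = std_normal.inv_cdf(1.0 - nominal_error / 2.0)
    z_observed = std_normal.inv_cdf(1.0 - error_rate / 2.0)
    return w * z_nominal / z_observed

# Hypothetical condition: a 2-unit target with 1% observed errors.
print(f"We = {effective_width(2.0, 0.01):.3f}")
```

Under these assumptions, an error rate below 4% gives We < W (the subjects used less of the target than was available), and an error rate above 4% gives We > W. An error rate of exactly zero cannot be converted this way and is rejected.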

An important question to raise is whether the difference between
the correlations was statistically significant: Was the higher
correlation demonstrated with Shannon's Theorem 17 due to chance?
This was tested with Hotelling's t test for the difference between
coefficients of correlation for correlated samples (e.g., Guilford
& Fruchter, 1978, p. 164; Bruning & Kintz, 1977, pp. 215-217).
The t statistic was calculated using the Fitts and Shannon
correlations of movement time with ID (r12 and r13), the
intercorrelation between the Fitts and Shannon IDs (r23),
and the sample size n. The data in Table 3 indicated a
statistically significant difference between the Fitts and
Shannon correlations for the tapping experiments when using
W (p < .001) or We (p < .05, p < .02) but no significant
difference between the correlations in the disc and pin
transfer experiments.
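The test statistic can be sketched in code. The formula is not printed in this note, so the version below is the commonly cited form of Hotelling's t for two correlations sharing one variable, offered as an assumption about what was computed. The function name is illustrative, and the inputs are the rounded values from the first row of Table 3.

```python
import math

def hotelling_t(r12, r13, r23, n):
    """Hotelling's t for the difference between dependent correlations.

    r12, r13: correlations of the shared variable (MT) with each predictor.
    r23: intercorrelation of the two predictors. Returns (t, df), df = n - 3.
    """
    # Determinant of the 3 x 3 correlation matrix.
    det = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    t = (r12 - r13) * math.sqrt((n - 3) * (1 + r23) / (2 * det))
    return t, n - 3

# Rounded correlations for the 1 oz tapping experiment using W (Table 3).
t, df = hotelling_t(0.9831, 0.9936, 0.9966, 16)
print(f"|t| = {abs(t):.2f}, df = {df}")
```

The determinant in the denominator is very small when all three correlations are close to 1, so the result is sensitive to rounding. With the four-decimal values from Table 3, the sketch gives a value near, but not identical to, the tabulated t.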

The statistical insignificance in the difference between the rs
in the latter cases was perhaps due to the nature of the tasks.
The tapping tasks were highly ballistic, whereas the disc and
pin transfer tasks were more regulated by feedback mechanisms.
The variations in the terminating positions for the tapping
tasks were analogous to noise and were recorded as percentage
errors when the subjects missed the targets. Errors, however,
could not occur in the disc or pin transfer task; the subjects
simply took the time necessary to guide the disc or pin until
it was secured in place. The extra time spent in the final
placement of the disc or pin may also explain the generally
higher intercepts for those experiments (see Table 1).

TABLE 3
Test of Statistical Significance Between Correlations
for the Fitts and Shannon Models

                                 Correlation coefficients^a    Range for
                                 Fitts    Shannon  Inter-corr  Fitts' ID    Hotelling's difference test^b
Experiment                       r12      r13      r23         (bits)       t      n    p
Tapping, 1 oz stylus, using W    .9831    .9936    .9966       1-7          6.47   16   .001
Tapping, 1 lb stylus, using W    .9796    .9916    .9966       1-7          6.94   16   .001
Disc transfer                    .9185    .9195    .9999       4-10         .64    16   -
Pin transfer                     .9432    .9452    .9997       3-10         1.07   20   -
Tapping, 1 oz stylus, using We   .9904    .9937    .9987       1-7          2.20   16   .05
Tapping, 1 lb stylus, using We   .9882    .9925    .9987       1-7          2.83   16   .02

^a p < .001
^b df = n - 3, two-tailed test

Another possible explanation, which is particularly important
in establishing the viability of the Shannon model over the
Fitts model, stems from the range of experimental conditions
employed. The tapping experiments had an ID range of 1-7 bits,
whereas the disc transfer experiment had an ID range of 4-10
bits and the pin transfer experiment 3-10 bits (see Table 3).
As mentioned earlier, the difference between the two models is
likely to surface only when experimental conditions include small
values of ID.

These results are thought to be significant because the proposed
variation of Fitts' law is one that simplifies the law, returning
it to the underlying theory. By maintaining the premise that
human motor behavior can be modeled from an information-theoretic
perspective, Shannon's original and unaltered theorem, as expressed
in Equation 5, provides an appealing and perhaps appropriate model.
A reanalysis of data from Fitts' original experiment appears to
support this suggestion.

ACKNOWLEDGMENT

The paper is based on research conducted at the Ontario Institute
for Studies in Education, Toronto, Ontario.